Q1: What is Clustering?
Clustering finds k groups in a set of data and gives every data point one of k labels, so the data is divided into k groups by label.
See the T-shirt size example in the documentation.
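To make "divide the data into k groups by label" concrete, here is a minimal sketch of the labeling step only. The heights and the three centers are made-up values for illustration (they are not taken from the documentation), and `nearest_label` is a hypothetical helper name.

```python
# Hypothetical customer heights in cm and three hand-picked centers
# standing in for small, medium and large (k = 3).
heights = [150, 152, 155, 165, 168, 170, 180, 183, 185]
centers = [152.0, 168.0, 183.0]

def nearest_label(x, centers):
    """Return the index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

# Each data point gets the label of its nearest center.
labels = [nearest_label(h, centers) for h in heights]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

A real clustering algorithm also has to find the centers; here they are given so that only the labeling is shown.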
Q2: Why do we need clustering?
It gives labels to raw data.
Consider the training stage of kNN or SVM: we need responses (labels) that tell us what each data point is. How can we get those responses?
- assign labels to the data one by one manually
- let an algorithm categorize the data set automatically
Clustering is the second option.
Q3: Why does k-means sound so familiar?
In the video chapter, we used mean-shift to track a moving object. Mean-shift and k-means clustering actually have several points in common:
- both calculate a mean value and repeat
- both need a criterion to stop repeating
- both shift the center in every iteration
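The three shared steps can be seen in a minimal k-means sketch written in plain Python. This is an illustration under stated assumptions, not any library's implementation: the function name `kmeans`, the parameters `max_iter` and `eps`, and the sample points are all hypothetical.

```python
import math

def kmeans(points, centers, max_iter=10, eps=1e-4):
    """Plain k-means on 2-D points; returns (centers, labels)."""
    labels = []
    for _ in range(max_iter):                  # stop criterion 1: iteration cap
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        # Calculate the mean of each cluster's members.
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)          # an empty cluster keeps its center
        # Shift the centers and measure how far the largest move was.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:                        # stop criterion 2: centers barely moved
            break
    return centers, labels

# Made-up sample: two obvious groups and two rough initial centers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Mean-shift follows the same loop shape, but it moves a single window toward the mean of the points inside it instead of updating k centers at once.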
Q4: What’s the main problem of k-means clustering?
There are two main problems with k-means clustering:
- how to choose the category number k
- how to choose the initial centroids of each category
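Both problems can be reproduced on a tiny example. The sketch below is a hypothetical 1-D k-means (`kmeans_1d` and the data are made up for illustration) that is run three times on the same data: with well-placed initial centroids, with badly placed ones, and with a k that does not match the number of groups. The quality measure is the sum of squared distances from each value to its nearest center (SSE), where lower is better.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Run 1-D k-means from the given initial centers; return (centers, sse)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Group every value with its nearest center (ties go to the lower index).
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Move each center to the mean of its group; an empty group keeps its center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:   # nothing moved, so the result is stable
            break
        centers = new_centers
    sse = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, sse

data = [1, 2, 3, 11, 12, 13, 21, 22, 23]   # made-up values forming three groups

good_centers, good_sse = kmeans_1d(data, [1, 12, 23])  # k=3, one seed per group
bad_centers, bad_sse = kmeans_1d(data, [1, 2, 3])      # k=3, all seeds in one group
k2_centers, k2_sse = kmeans_1d(data, [1, 23])          # k=2 on three-group data

print(good_centers, good_sse)  # [2.0, 12.0, 22.0] 6.0
print(bad_centers, bad_sse)    # [1.0, 2.5, 17.0] 154.5
print(k2_centers)
```

With the same k, the badly seeded run settles on a much worse grouping and never recovers, and with k=2 the middle group is split between the two clusters. A common way to reduce the first problem is to run the algorithm several times from different initial centroids and keep the result with the lowest SSE.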
Q5: Do we have other clustering algorithms?
Yes, for example:
- Mean-Shift Clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
- Agglomerative Hierarchical Clustering