Q1: What is kNN (k-Nearest Neighbor)?
from the document
- “kNN is one of the simplest classification algorithms available for supervised learning.”
Q2: What problem does kNN solve?
- “classification algorithms” is the key phrase
Classification means assigning an unclassified input to one of a set of predefined classes.
Q3: What are the key concepts in kNN?
train data:
kNN needs predefined (sample, class) pairs to compare against;
the samples in those pairs are the training data
responses:
the known classes of the training samples; there must be exactly one response per training sample
test data:
a list of inputs to classify; each input is assigned one of the classes that appear in responses
labels:
the predicted classes for the test data; the same kind of value as responses, under a different name
distance:
the number used to judge how close a neighbor is; the k nearest neighbors are the k training samples with the smallest distances
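
The concepts above can be tied together in a short sketch. This is a minimal pure-Python illustration, not the implementation used by the quoted document: the function name `knn_classify`, the choice of Euclidean distance, the plain majority vote, and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train_data, responses, test_data, k):
    """Label each test point by majority vote among its k nearest training samples."""
    if len(train_data) != len(responses):
        raise ValueError("responses must have one entry per training sample")

    labels = []
    for point in test_data:
        # Distance from this test point to every training sample,
        # paired with that sample's known response.
        distances = sorted(
            (math.dist(point, sample), response)
            for sample, response in zip(train_data, responses)
        )
        # "k nearest" means the k smallest distances.
        nearest = [response for _, response in distances[:k]]
        # The most frequent response among the neighbors becomes the label.
        labels.append(Counter(nearest).most_common(1)[0][0])
    return labels


# Hypothetical 2-D data: three "red" samples and three "blue" samples.
train_data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
responses = ["red", "red", "red", "blue", "blue", "blue"]
test_data = [(1.2, 1.4), (6.4, 6.2)]

labels = knn_classify(train_data, responses, test_data, k=3)
print(labels)
```

Here `labels` has one entry per test point, and every entry is a value taken from `responses`, which is why the two are described as the same thing under different names.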
Q4: What’s the main problem with kNN?
from the document
- “But there is a problem with that. Red Triangle may be the nearest. But what if there are a lot of Blue Squares near to him?”
Even if the red and blue neighbors are equal in number, they may be distributed differently around the new point. Does their location matter?
- How should the important parameter k be chosen?
- Are we assuming that all k neighbors are equally important? Is that fair?
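
The questions above can be made concrete with a small experiment. The sketch below compares a plain majority vote with a distance-weighted vote, where each neighbor's vote counts as 1 / distance. That weighting is one common option chosen here as an assumption, not something the notes prescribe; the function names, the `eps` guard against division by zero, and the sample points are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def k_nearest(train_data, responses, point, k):
    """Return (distance, response) pairs for the k training samples closest to point."""
    pairs = sorted(
        (math.dist(point, sample), response)
        for sample, response in zip(train_data, responses)
    )
    return pairs[:k]


def majority_label(train_data, responses, point, k):
    """Plain kNN: every one of the k neighbors gets one equal vote."""
    neighbors = k_nearest(train_data, responses, point, k)
    return Counter(response for _, response in neighbors).most_common(1)[0][0]


def weighted_label(train_data, responses, point, k, eps=1e-9):
    """Weighted kNN: closer neighbors get larger votes (assumed weight: 1 / distance)."""
    votes = defaultdict(float)
    for distance, response in k_nearest(train_data, responses, point, k):
        votes[response] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# One red sample very close to the query point, two blue samples farther away.
train_data = [(0.1, 0.0), (2.0, 0.0), (0.0, 2.0)]
responses = ["red", "blue", "blue"]
point = (0.0, 0.0)

print(majority_label(train_data, responses, point, k=1))
print(majority_label(train_data, responses, point, k=3))
print(weighted_label(train_data, responses, point, k=3))
```

With this data the plain vote changes from red at k=1 to blue at k=3, while the weighted vote at k=3 stays red because the single red neighbor is much closer. The result therefore depends both on the choice of k and on whether neighbors are treated as equally important.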