Adelo Vieira (talk | contribs)
Revision as of 17:27, 16 January 2021
K-Nearest Neighbour
- 15/06: Recorded class - K-Nearest Neighbour
- StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI
KNN is a model that classifies a new data point based on the points that are closest in distance to it. The principle behind nearest neighbour methods is to find a predefined number of training samples (K) closest in distance to the new data point; the class of the new data point is then the most common class among those K training samples.
https://scikit-learn.org/stable/modules/neighbors.html [Adelo]
In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k-nearest labeled observations to it.
This is a simple but extremely powerful method.
KNN can be used for both classification and regression predictive problems. However, it is more widely used for classification problems in industry.
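As a quick illustration of the classification case described above, here is a minimal sketch using scikit-learn's `KNeighborsClassifier` (the library cited earlier); the training data and class names are made up for the example.

```python
# Minimal sketch: classifying a new point with scikit-learn's
# KNeighborsClassifier. The toy data and labels are illustrative.
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: two features, two classes
X_train = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]]
y_train = ["A", "A", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# The new point (4.9, 5.0) lies next to the two "B" samples, so with
# K=3 the majority class among its nearest neighbours is "B".
print(knn.predict([[4.9, 5.0]]))  # ['B']
```

Setting `n_neighbors` is the K from the definition: the classifier votes over the K closest training samples.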
k-NN is ideal for classification tasks where relationships among the attributes and target classes are:
- numerous
- complex
- difficult to interpret and
- where instances of a class are fairly homogeneous
Applications of this learning method include:
- Computer vision applications:
- Optical character recognition
- Face recognition
- Recommendation systems
- Pattern detection in genetic data
Basic Implementation:
- Training Algorithm:
- Simply store the training examples
- Prediction Algorithm:
- Calculate the distance from x to all points in your data (Udemy Course)
- Sort the points in your data by increasing distance from x (Udemy Course)
- Predict the majority label of the "k" closest points (Udemy Course)
- Find the training examples that are nearest to the test example (Noel)
- Predict the most frequent class among those. (Noel)
- Improvements:
- Weighting training examples based on their distance
- Alternative measures of "nearness"
- Finding "close" examples in a large training set quickly
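The prediction steps above can be sketched directly in plain Python (Euclidean distance assumed; the data points and labels are illustrative):

```python
# From-scratch sketch of the prediction algorithm: distance, sort,
# majority vote. Uses only the standard library.
import math
from collections import Counter

def knn_predict(train, labels, x, k):
    """Predict the majority label of the k points closest to x."""
    # 1. Calculate the distance from x to all points in the data
    dists = [math.dist(p, x) for p in train]
    # 2. Sort the point indices by increasing distance from x
    order = sorted(range(len(train)), key=lambda i: dists[i])
    # 3. Predict the majority label of the k closest points
    k_labels = [labels[i] for i in order[:k]]
    return Counter(k_labels).most_common(1)[0][0]

train = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels = ["red", "red", "blue", "blue"]
print(knn_predict(train, labels, (8.5, 8.2), k=3))  # blue
```

The "training" phase is just storing `train` and `labels`, which is why training is fast and classification is the slow part; the improvements listed above (distance weighting, alternative distance measures, fast neighbour search) would all modify steps 1–3.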
Strengths and Weaknesses:
| Strengths | Weaknesses |
|---|---|
| The algorithm is simple and effective | The method does not produce a model, which limits the potential for insight into the relationships between features |
| Fast training phase | Slow classification phase; requires a lot of memory |
| Capable of reflecting complex relationships | Cannot handle nominal features or missing data without additional pre-processing |
| Unlike many other methods, no assumptions about the distribution of the data are made | |
- Classifying a new example: