Difference between revisions of "Página de pruebas"
Adelo Vieira (talk | contribs)
Revision as of 17:53, 16 January 2021
K-Nearest Neighbour
- Recorded Noel class (15/06):
- StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI
KNN is a model that classifies a new data point based on the points that are closest in distance to the new point. The principle behind nearest neighbour methods is to find a predefined number of training samples (K) closest in distance to the new data point. Then, the class of the new data point will be the most common class in the K training samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo]

In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k nearest labeled observations to it. This is a simple method, but extremely powerful.

| Regression/Classification | Applications | Strengths | Weaknesses | Comments | Improvements |
|---|---|---|---|---|---|
| KNN can be used for both classification and regression predictive problems. However, it is more widely used for classification problems in industry. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/ | Computer vision applications (optical character recognition, face recognition); recommendation systems; pattern detection in genetic data | | | k-NN is ideal for classification tasks where relationships among the attributes and target classes are numerous, complex, difficult to interpret, and where instances of a class are fairly homogeneous | Weighting training examples based on their distance; alternative measures of "nearness"; finding "close" examples in a large training set quickly |
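One improvement listed above, weighting training examples based on their distance, can be sketched in plain Python. This is a minimal illustration, not code from the course; the function name `weighted_knn_predict` and the `1/(distance + eps)` weighting scheme are my own choices (other weighting functions are possible):

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, test_point, k=3):
    """Distance-weighted k-NN: closer neighbours get a larger vote.

    train: list of (features, label) pairs; features is a tuple of numbers.
    Each of the k nearest neighbours votes with weight 1/(distance + eps),
    so a single very close example can outvote several distant ones.
    """
    eps = 1e-9  # avoid division by zero if a training point coincides exactly
    # Sort all training examples by distance to the test point
    distances = sorted((math.dist(x, test_point), y) for x, y in train)
    # Accumulate weighted votes from the k nearest neighbours
    votes = defaultdict(float)
    for dist, label in distances[:k]:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)
```

For example, with training points (0,0) and (0,1) labelled 'A' and (3,0) labelled 'B', the test point (2.5, 0) is classified 'B' under distance weighting even though a plain majority vote over all three neighbours would say 'A', because the single 'B' example is much closer.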
Basic Implementation:
- Training Algorithm:
- Simply store the training examples
- Prediction Algorithm:
- Calculate the distance from x to all points in your data (Udemy Course)
- Sort the points in your data by increasing distance from x (Udemy Course)
- Predict the majority label of the "k" closest points (Udemy Course)
- Find the k training examples (x1, y1), ..., (xk, yk) that are nearest to the test example x (Noel)
- Predict the most frequent class among those yi's. (Noel)
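The prediction steps above can be sketched in plain Python. This is a minimal illustration under my own naming (`knn_predict` is not from the course or the lecture), using Euclidean distance and an unweighted majority vote:

```python
import math
from collections import Counter

def knn_predict(train, test_point, k=3):
    """Predict the majority label among the k nearest training examples.

    train: list of (features, label) pairs; features is a tuple of numbers.
    """
    # 1. Calculate the distance from the test point to all points in the data
    distances = [(math.dist(x, test_point), y) for x, y in train]
    # 2. Sort the points by increasing distance and keep the k closest
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # 3. Predict the most frequent class among those k labels
    return Counter(nearest_labels).most_common(1)[0][0]
```

Note that "training" really is just storing the examples, as the training algorithm above says; all the work happens at prediction time. In practice k is often chosen odd for binary problems to avoid ties in the vote.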
Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/