Difference between revisions of "Página de pruebas"

From Sinfronteras
  
  
{| class="wikitable"
|+
! colspan="6" |KNN is a model that classifies a new data point based on the points that are closest in distance to the new point. The principle behind nearest neighbor methods is to find a predefined number of training samples (''K'') closest in distance to the new data point. Then, the class of the new data point will be the most common class among the ''K'' training samples. <nowiki>https://scikit-learn.org/stable/modules/neighbors.html</nowiki> [Adelo]

In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k nearest labeled observations to it.

This is a simple method, but extremely powerful.
|-
!Regression/Classification
!Applications
!Strengths
!Weaknesses
!Comments
!Improvements
|-
|KNN can be used for both classification and regression predictive problems. However, it is more widely used for classification problems in industry. <nowiki>https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/</nowiki>
|Applications of this learning method include:
* Computer vision applications:
:* Optical character recognition
:* Face recognition
* Recommendation systems
* Pattern detection in genetic data
|
|
|k-NN is ideal for classification tasks where the relationships among the attributes and the target classes are:
* numerous
* complex
* difficult to interpret, and
* where instances of a class are fairly homogeneous
|
:* Weighting training examples based on their distance
:* Alternative measures of "nearness"
:* Finding "close" examples in a large training set quickly
|}
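As a concrete illustration of the description above, the scikit-learn library cited in the table provides this model as <code>KNeighborsClassifier</code>. A minimal sketch, with made-up data (two features, two classes):

```python
# Minimal k-NN classification sketch with scikit-learn.
# The training data here is invented purely for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]]  # training points
y_train = ["A", "A", "B", "B"]                               # their classes

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn.fit(X_train, y_train)                  # "training" just stores the data

# A new point near the "A" cluster is assigned the most common class
# among its 3 nearest training samples (two "A"s, one "B" -> "A").
print(knn.predict([[1.1, 1.0]]))  # -> ['A']
```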
  
  
 
 
 
'''Strengths and Weaknesses:'''
 
{| class="wikitable"
 
|+
 
!Strengths
 
!Weaknesses
 
|-
 
|The algorithm is simple and effective
 
|The method does not produce a model, which limits the ability to understand how the features are related to the class
 
|-
 
|Fast training phase
 
|Slow classification phase. Requires lots of memory
 
|-
 
|Capable of reflecting complex relationships
 
|Cannot handle nominal features or missing data without additional pre-processing
 
|-
 
|Unlike many other methods, no assumptions about the distribution of the data are made
 
|
 
|}
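One of the improvements listed earlier, weighting training examples based on their distance, can be sketched in plain Python. The toy data and the inverse-distance weighting scheme are illustrative assumptions, not part of the original notes:

```python
# Sketch of distance-weighted k-NN: closer training examples count
# more in the vote. Training data is invented for illustration.
import math
from collections import defaultdict

train = [((1.0, 1.0), "A"), ((1.5, 1.2), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.9), "B")]

def weighted_knn(x, train, k=3):
    # Keep the k training examples nearest to x.
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = math.dist(x, point)
        votes[label] += 1.0 / (d + 1e-9)  # inverse-distance weight
    # Return the class with the largest total weight.
    return max(votes, key=votes.get)

print(weighted_knn((1.2, 1.1), train))  # -> A
```

With plain majority voting the lone "B" neighbour would still be outvoted here, but weighting matters when the k nearest neighbours are split evenly and some are much closer than others.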
 
  
  

Revision as of 17:53, 16 January 2021

K-Nearest Neighbour

* Recorded Noel class (15/06):





'''Basic Implementation:'''

* '''Training Algorithm:'''
:* Simply store the training examples

* '''Prediction Algorithm:'''
:# Calculate the distance from <math>x</math> to all points in your data (Udemy Course)
:# Sort the points in your data by increasing distance from <math>x</math> (Udemy Course)
:# Predict the majority label of the <math>k</math> closest points (Udemy Course)
:* Find the <math>k</math> training examples <math>(x_{1},y_{1}),...(x_{k},y_{k})</math> that are '''nearest''' to the test example <math>x</math> (Noel)
:* Predict the most frequent class among those <math>y_{i}'s</math>. (Noel)
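The three prediction steps above can be sketched directly in plain Python (the toy data is invented for illustration):

```python
# k-NN prediction from scratch, following the steps above:
#   1. compute distances, 2. sort by distance, 3. majority vote.
import math
from collections import Counter

def knn_predict(x, examples, k):
    """examples is a list of (point, label) pairs; x is the point to classify."""
    # 1. Calculate the distance from x to all points in the data.
    distances = [(math.dist(x, point), label) for point, label in examples]
    # 2. Sort the points by increasing distance from x.
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points.
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

examples = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict((1, 1), examples, k=3))  # -> A
```

Note that, as the table above says, all the work happens at prediction time: "training" stores the examples and nothing else, which is why the classification phase is the slow one.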