Difference between revisions of "Página de pruebas"

From Sinfronteras

Revision as of 18:10, 16 January 2021

K-Nearest Neighbour

  • Noel's recorded class (15/06):


KNN is a model that classifies a new data point based on the points that are closest in distance to the new point. The principle behind nearest neighbor methods is to find a predefined number of training samples (K) closest in distance to the new data point. Then, the class of the new data point will be the most common class in the k training samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo] In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k-nearest labeled observations to it.

This is a simple but extremely powerful method.
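The principle described above can be seen in action with scikit-learn (referenced above); a minimal sketch, with toy data points made up for the illustration:

```python
# Minimal k-NN classification sketch with scikit-learn
# (toy data invented for illustration)
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: two features per point, two classes (0 and 1)
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class-0 cluster
           [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]   # class-1 cluster
y_train = [0, 0, 0, 1, 1, 1]

# "Training" just stores the examples; K = 3 neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# A new point near the first cluster is assigned the most common
# class among its 3 nearest labeled neighbours
print(knn.predict([[1.1, 0.9]]))  # → [0]
```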

Regression/Classification

KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in industry. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

Applications

  • Computer vision applications:
      • Optical character recognition
      • Face recognition
  • Recommendation systems
  • Pattern detection in genetic data

Strengths

  • The algorithm is simple and effective
  • Fast training phase
  • Capable of reflecting complex relationships
  • Unlike many other methods, it makes no assumptions about the distribution of the data

Weaknesses

  • The method does not produce a model, which limits potential insights into the relationships between features
  • Slow classification phase; requires a lot of memory
  • Cannot handle nominal features or missing data without additional pre-processing

Comments

k-NN is ideal for classification tasks where relationships among the attributes and target classes are:

  • numerous
  • complex
  • difficult to interpret, and
  • where instances of a class are fairly homogeneous

Improvements

  • Weighting training examples based on their distance
  • Alternative measures of "nearness"
  • Finding "close" examples in a large training set quickly
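The first improvement listed above, weighting training examples by distance, is exposed directly in scikit-learn's KNeighborsClassifier through its weights parameter. A sketch with one-dimensional toy data (invented for the example) where the uniform and distance-weighted votes actually disagree:

```python
# Distance-weighted k-NN: closer neighbours count more in the vote
# (toy 1-D data invented for illustration)
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0], [5.0], [6.0]]
y_train = [0, 1, 1]

# weights="uniform" is a plain majority vote;
# weights="distance" weights each neighbour's vote by 1/distance
uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
weighted = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

# Query point 1.0: two class-1 neighbours outvote one class-0 neighbour
# under the uniform rule, but the class-0 point is so much closer
# (weight 1/1 vs 1/4 + 1/5) that distance weighting flips the prediction
print(uniform.predict([[1.0]]))   # → [1]
print(weighted.predict([[1.0]]))  # → [0]
```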



Basic Implementation:

  • Training Algorithm:
  • Simply store the training examples


  • Prediction Algorithm:
  1. Calculate the distance from x to all points in your data (Udemy Course)
  2. Sort the points in your data by increasing distance from x (Udemy Course)
  3. Predict the majority label of the "k" closest points (Udemy Course)
  • Find the training examples that are nearest to the test example (Noel)
  • Predict the most frequent class among those (Noel)
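The prediction steps above can be sketched in plain Python (function and variable names are invented for the example; Euclidean distance is assumed):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of point x by majority vote of its k nearest neighbours."""
    # 1. Calculate the distance from x to all points in the data
    distances = [(math.dist(p, x), label) for p, label in zip(X_train, y_train)]
    # 2. Sort the points by increasing distance from x
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data invented for illustration
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, [2, 2]))  # → a
```

Note the "training algorithm" is exactly as described above: the examples are simply stored, and all the work happens at prediction time (which is why the classification phase is the slow one).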