Difference between revisions of "Página de pruebas"

From Sinfronteras
Jump to: navigation, search
Line 72: Line 72:
 
<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg" style="display: block; margin-left: auto; margin-right: auto; width: 300pt;" />
 
<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg" style="display: block; margin-left: auto; margin-right: auto; width: 300pt;" />
  
<div style="text-align: center; width:500pt">Example of k-NN classification. The test sample (green dot) should be classified either to blue squares or to red triangles. If k = 3 (solid line circle) it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle). Taken from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm</div>
+
<div style="text-align: left; display:block; margin-right: auto; margin-left: auto; width:500pt">Example of k-NN classification. The test sample (green dot) should be classified either to blue squares or to red triangles. If k = 3 (solid line circle) it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle). Taken from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm</div>
  
  

Revision as of 19:32, 16 January 2021

K-Nearest Neighbour

  • Recorded Noel class (15/06):


KNN classifies a new data point based on the points that are closest in distance to the new point. The principle behind KNN is to find a predefined number of training samples (K) closest in distance to the new data point. Then, the class of the new data point will be the most common class in the k nearest training samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo] In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k-nearest labeled observations to it.

This is a simple method, but extremely powerful.

Regression/Classification Applications Strengths Weaknesses Comments Improvements

KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in the industry. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

  • Face recognition
  • Optical character recognition
  • Recommendation systems
  • Pattern detection in genetic data
  • The algorithm is simple and effective
  • Fast training phase
  • Capable of reflecting complex relationships
  • Unlike many other methods, no assumptions about the distribution of the data are made
  • Slow classification phase. Requires lots of memory
  • The method does not produce any model which limits potential insights about the relationship between features
  • Can not handle nominal feature or missing data without additional pre-processing

k-NN is ideal for classification tasks where relationships among the attributes and target classes are:

  • numerous
  • complex
  • difficult to interpret and
  • where instances of a class are fairly homogeneous
  • Weighting training examples based on their distance
  • Alternative measures of "nearness"
  • Finding "close" examples in a large training set quickly



Basic Implementation:

  • Training Algorithm:
  • Simply store the training examples


  • Prediction Algorithm:
  1. Calculate the distance from the new data point to all points in the data.
  2. Sort the points in your data by increasing the distance from the new data point.
  3. Determine the most frequent class among the k nearest points</math>.



Example of k-NN classification. The test sample (green dot) should be classified either to blue squares or to red triangles. If k = 3 (solid line circle) it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle). Taken from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm