Difference between revisions of "Página de pruebas"

From Sinfronteras
Revision as of 17:36, 16 January 2021

K-Nearest Neighbour

  • Recorded Noel class (15/06): https://drive.google.com/drive/folders/1BaordCV9vw-gxLdJBMbWioX2NW7Ty9Lm
  • StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI


KNN is a model that classifies a new data point based on the points that are closest in distance to it. The principle behind nearest-neighbour methods is to find a predefined number (K) of training samples closest in distance to the new data point; the class of the new point is then the most common class among those K samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo]

In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the K labeled observations nearest to it.

This is a simple method, but extremely powerful.


<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg" class="center" style="display: block; margin-left: auto; margin-right: auto; width: 300pt;" />

[[File:KNearest_Neighbors_from_the_Udemy_course_Pierian_data1.mp4|800px|thumb|center|Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/]]

KNN can be used for both classification and regression predictive problems. However, it is more widely used for classification problems in industry. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
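Both uses can be sketched with scikit-learn's KNeighborsClassifier and KNeighborsRegressor. This is a minimal sketch, assuming scikit-learn is installed; the toy data is invented purely for illustration.

```python
# Minimal KNN sketch with scikit-learn (toy data invented for illustration).
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# --- Classification: a new point takes the majority class of its K neighbours
X_train = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # cluster of class 0
           [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]   # cluster of class 1
y_train = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)  # K = 3
clf.fit(X_train, y_train)                  # "training" just stores the examples
print(clf.predict([[1.0, 1.0]]))           # → [0]

# --- Regression: the prediction is the mean target of the K nearest points
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit([[1.0], [2.0], [3.0], [10.0]], [1.0, 2.0, 3.0, 10.0])
print(reg.predict([[2.0]]))                # mean of targets 1, 2, 3 → [2.]
```

Note that `fit` does essentially no computation: KNN defers all the work to prediction time, which is why the table below lists a fast training phase but a slow classification phase.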



Applications of this learning method include:

  • Computer vision applications:
      • Optical character recognition
      • Face recognition
  • Recommendation systems
  • Pattern detection in genetic data



k-NN is ideal for classification tasks where the relationships among the attributes and the target classes are:

  • numerous
  • complex
  • difficult to interpret

and where instances of a class are fairly homogeneous.



Basic Implementation:

  • Training Algorithm:
      • Simply store the training examples


  • Prediction Algorithm:
      1. Calculate the distance from x to all points in your data (Udemy Course)
      2. Sort the points in your data by increasing distance from x (Udemy Course)
      3. Predict the majority label of the "k" closest points (Udemy Course)
      • Find the k training examples (x1, y1), ..., (xk, yk) that are nearest to the test example x (Noel)
      • Predict the most frequent class among those yi's (Noel)
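The prediction steps above can be sketched from scratch with only the standard library. This is a minimal sketch; the toy data is invented for illustration.

```python
# From-scratch sketch of the three KNN prediction steps (toy data).
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # 1. Calculate the distance from x to all points in the data
    distances = [math.dist(p, x) for p in X_train]
    # 2. Sort the points by increasing distance from x
    order = sorted(range(len(X_train)), key=lambda i: distances[i])
    # 3. Predict the majority label of the k closest points
    k_labels = [y_train[i] for i in order[:k]]
    return Counter(k_labels).most_common(1)[0][0]

X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, [1.5, 1.5]))  # → A
```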


  • Improvements:
      • Weighting training examples based on their distance
      • Alternative measures of "nearness"
      • Finding "close" examples in a large training set quickly
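The first improvement, weighting training examples by their distance, can be sketched as follows. The inverse-distance weighting scheme and the toy data here are assumptions chosen for illustration, not the only possible choice.

```python
# Sketch of distance-weighted KNN: each neighbour's vote is weighted by the
# inverse of its distance, so closer points count more (assumed scheme; toy data).
import math
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=3):
    nearest = sorted((math.dist(p, x), y) for p, y in zip(X_train, y_train))
    votes = defaultdict(float)
    for d, label in nearest[:k]:
        votes[label] += 1.0 / (d + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)

X_train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
y_train = ["near", "near", "far"]
print(weighted_knn_predict(X_train, y_train, [0.05, 0.0], k=3))  # → near
```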


Strengths and Weaknesses:

  Strengths                                      | Weaknesses
  ---------------------------------------------- | ----------------------------------------------
  The algorithm is simple and effective          | Does not produce a model, which limits potential insight into the relationships between features
  Fast training phase                            | Slow classification phase; requires a lot of memory
  Capable of reflecting complex relationships    | Cannot handle nominal features or missing data without additional pre-processing
  Makes no assumptions about the distribution of the data, unlike many other methods |
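Regarding the nominal-feature weakness above, a common pre-processing step is one-hot encoding, which turns a categorical attribute into numeric columns so that distances become meaningful. A minimal sketch with invented data:

```python
# One-hot encoding a nominal feature before KNN (toy data invented for
# illustration; pandas and scikit-learn provide the same idea ready-made).
colours = ["red", "green", "blue", "green"]

categories = sorted(set(colours))  # ['blue', 'green', 'red']
encoded = [[1.0 if c == cat else 0.0 for cat in categories] for c in colours]
print(encoded[0])  # 'red' → [0.0, 0.0, 1.0]
```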