Difference between revisions of "Página de pruebas"

From Sinfronteras

Revision as of 18:10, 16 January 2021

K-Nearest Neighbour

  • Noel's recorded class (15/06):


KNN is a model that classifies a new data point based on the points that are closest in distance to the new point. The principle behind nearest neighbor methods is to find a predefined number of training samples (K) closest in distance to the new data point. Then, the class of the new data point will be the most common class in the k training samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo] In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k-nearest labeled observations to it.

This is a simple but extremely powerful method.
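The principle described above can be seen in action with scikit-learn (referenced above); a minimal sketch, with toy data points made up for the illustration:

```python
# Minimal k-NN classification sketch with scikit-learn
# (toy data invented for illustration)
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: two features per point, two classes (0 and 1)
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class-0 cluster
           [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]   # class-1 cluster
y_train = [0, 0, 0, 1, 1, 1]

# "Training" just stores the examples; K = 3 neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# A new point near the first cluster is assigned the most common
# class among its 3 nearest labeled neighbours
print(knn.predict([[1.1, 0.9]]))  # → [0]
```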

Regression/Classification

KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in industry. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

Applications

  • Computer vision applications:
      • Optical character recognition
      • Face recognition
  • Recommendation systems
  • Pattern detection in genetic data

Strengths

  • The algorithm is simple and effective
  • Fast training phase
  • Capable of reflecting complex relationships
  • Unlike many other methods, it makes no assumptions about the distribution of the data

Weaknesses

  • The method does not produce a model, which limits potential insights into the relationships between features
  • Slow classification phase; requires a lot of memory
  • Cannot handle nominal features or missing data without additional pre-processing

Comments

k-NN is ideal for classification tasks where relationships among the attributes and target classes are:

  • numerous
  • complex
  • difficult to interpret, and
  • where instances of a class are fairly homogeneous

Improvements

  • Weighting training examples based on their distance
  • Alternative measures of "nearness"
  • Finding "close" examples in a large training set quickly
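The first improvement listed above, weighting training examples by distance, is exposed directly in scikit-learn's KNeighborsClassifier through its weights parameter. A sketch with one-dimensional toy data (invented for the example) where the uniform and distance-weighted votes actually disagree:

```python
# Distance-weighted k-NN: closer neighbours count more in the vote
# (toy 1-D data invented for illustration)
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0], [5.0], [6.0]]
y_train = [0, 1, 1]

# weights="uniform" is a plain majority vote;
# weights="distance" weights each neighbour's vote by 1/distance
uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
weighted = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

# Query point 1.0: two class-1 neighbours outvote one class-0 neighbour
# under the uniform rule, but the class-0 point is so much closer
# (weight 1/1 vs 1/4 + 1/5) that distance weighting flips the prediction
print(uniform.predict([[1.0]]))   # → [1]
print(weighted.predict([[1.0]]))  # → [0]
```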



Basic Implementation:

  • Training Algorithm:
  • Simply store the training examples


  • Prediction Algorithm:
  1. Calculate the distance from x to all points in your data (Udemy Course)
  2. Sort the points in your data by increasing distance from x (Udemy Course)
  3. Predict the majority label of the "k" closest points (Udemy Course)
  • Find the training examples that are nearest to the test example (Noel)
  • Predict the most frequent class among those (Noel)
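The prediction steps above can be sketched in plain Python (function and variable names are invented for the example; Euclidean distance is assumed):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of point x by majority vote of its k nearest neighbours."""
    # 1. Calculate the distance from x to all points in the data
    distances = [(math.dist(p, x), label) for p, label in zip(X_train, y_train)]
    # 2. Sort the points by increasing distance from x
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data invented for illustration
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, [2, 2]))  # → a
```

Note the "training algorithm" is exactly as described above: the examples are simply stored, and all the work happens at prediction time (which is why the classification phase is the slow one).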