Difference between revisions of "Página de pruebas"

From Sinfronteras
==K-Nearest Neighbour==
 
  
* Recorded Noel class (15/06):
 
 
:* https://drive.google.com/drive/folders/1BaordCV9vw-gxLdJBMbWioX2NW7Ty9Lm
 
 
 
 
* StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI
 
 
 
{| class="wikitable"
 
|+
 
! colspan="6" style="text-align: left; font-weight: normal" |
 
KNN classifies a new data point based on the training points that are closest to it. The principle is to find a predefined number (''k'') of training samples nearest in distance to the new point, and then assign the new point the most common class among those ''k'' samples. https://scikit-learn.org/stable/modules/neighbors.html [Adelo]
 
In other words, KNN determines the class of a given unlabeled observation by identifying the most common class among the k-nearest labeled observations to it.
 
 
The method is simple, yet it is often very effective in practice.
 
|-
 
!style="width: 17%"|Regression/Classification
 
!style="width: 17%"|Applications
 
!style="width: 17%"|Strengths
 
!style="width: 17%"|Weaknesses
 
!style="width: 17%"|Comments
 
!style="width: 15%"|Improvements
 
|-style="vertical-align: text-top;"
 
|
 
KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in the industry. <nowiki>https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/</nowiki>
 
|
 
* Face recognition
 
* Optical character recognition
 
 
* Recommendation systems
 
* Pattern detection in genetic data
 
|
 
* The algorithm is simple and effective
 
* Fast training phase
 
* Capable of reflecting complex relationships
 
* Unlike many other methods, no assumptions about the distribution of the data are made
 
|
 
* Slow classification phase. Requires lots of memory
 
* The method does not produce a model, which limits potential insight into the relationships between features
 
* Cannot handle nominal features or missing data without additional pre-processing
 
|
 
k-NN is well suited to classification tasks in which instances of a class are fairly homogeneous and the relationships among the attributes and target classes are:
 
 
* numerous
 
* complex
 
* difficult to interpret
 
|
 
:* Weighting training examples based on their distance
 
:* Alternative measures of "nearness"
 
:* Finding "close" examples in a large training set quickly
 
|}
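
Two of the improvements listed in the table, weighting neighbours by distance and using an alternative measure of "nearness", can be illustrated with a minimal sketch. The function names (<code>minkowski</code>, <code>weighted_knn_predict</code>), the inverse-distance weighting scheme, the <code>eps</code> guard and the sample points are all assumptions made for illustration; they are not taken from the sources cited above.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def weighted_knn_predict(points, labels, query, k=3, p=2, eps=1e-9):
    """Vote among the k nearest points, weighting each vote by 1 / distance."""
    # Rank the stored training points by the chosen distance, keep the k nearest.
    ranked = sorted(
        ((minkowski(pt, query, p), lab) for pt, lab in zip(points, labels)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, lab in ranked:
        # eps (an illustrative choice) avoids division by zero for exact matches.
        votes[lab] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# Hypothetical data: one close "red" point and two distant "blue" points.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
labels = ["red", "blue", "blue"]
result = weighted_knn_predict(points, labels, (0.5, 0.0), k=3)
```

With these points an unweighted vote over k = 3 would favour "blue" (two votes to one), whereas the inverse-distance weights let the single nearby "red" point dominate. The third improvement, fast neighbour search in large training sets, is not covered by this sketch.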
 
 
 
<br />
 
'''Basic Implementation:'''
 
 
* Training Algorithm:
 
:* Simply store the training examples
 
 
 
* Prediction Algorithm:
 
:# Calculate the distance from the new data point to every point in the training data.
 
:# Sort the training points by increasing distance from the new data point.
 
:# Determine the most frequent class among the ''k'' nearest points.
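
The training and prediction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>knn_predict</code>, the choice of Euclidean distance and the toy data are assumptions, and ties are simply resolved in favour of the class encountered first among the nearest points.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every stored training point
    # (Euclidean distance is an assumption; other metrics are possible).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: most frequent class among the k nearest points.
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]


# "Training" amounts to storing the examples (hypothetical toy data).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)
```

Because every prediction scans and sorts the whole training set, the sketch also shows why the classification phase is slow and memory-hungry compared with the trivial training phase.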
 
 
 
<br />
 
<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg" style="display: block; margin-left: auto; margin-right: auto; width: 300pt;" />
 
 
<div style="text-align: left; display:block; margin-right: auto; margin-left: auto; width:500pt">Example of k-NN classification. The test sample (green dot) should be classified either to blue squares or to red triangles. If k = 3 (solid line circle) it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle). Taken from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm</div>
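
The effect described in the caption, where the predicted class flips when ''k'' changes from 3 to 5, can be reproduced with a small sketch. The coordinates below are hypothetical and were chosen only to mimic the layout of the figure (two triangles and one square close to the query, two more squares slightly farther away); the helper name <code>classify</code> is likewise an assumption.

```python
from collections import Counter
import heapq
import math

# Hypothetical coordinates mimicking the figure; the query sits at the origin.
samples = [
    ((0.5, 0.5), "red triangle"),
    ((-0.6, 0.4), "red triangle"),
    ((0.3, -0.9), "blue square"),
    ((1.5, 1.0), "blue square"),
    ((-1.6, -0.8), "blue square"),
    ((3.0, 3.0), "red triangle"),
    ((-3.0, 2.5), "blue square"),
]
query = (0.0, 0.0)


def classify(k):
    """Majority class among the k samples nearest to the query."""
    nearest = heapq.nsmallest(k, samples, key=lambda s: math.dist(s[0], query))
    return Counter(label for _, label in nearest).most_common(1)[0][0]


label_k3 = classify(3)  # inner circle: 2 triangles vs. 1 square
label_k5 = classify(5)  # outer circle: 3 squares vs. 2 triangles
```

With these coordinates <code>label_k3</code> is "red triangle" and <code>label_k5</code> is "blue square", which matches the behaviour shown in the figure and illustrates how sensitive the prediction can be to the choice of ''k''.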
 
 
 
[[File:KNearest_Neighbors_from_the_Udemy_course_Pierian_data1.mp4|800px|thumb|center|Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/]]
 
 
 
<br />
 

Latest revision as of 22:25, 23 February 2026