==K-Nearest Neighbour==

* 15/06: Recorded class - K-Nearest Neighbour
:* https://drive.google.com/drive/folders/1BaordCV9vw-gxLdJBMbWioX2NW7Ty9Lm

* StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI

<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg" class="center" style="width: 200pt" />

<br />
KNN determines the class of a given unlabeled observation by identifying the k nearest labeled observations to it and assigning it the class most common among those neighbours. In other words, the algorithm assigns an unlabeled observation to the class whose labeled instances it most resembles. This is a simple method, but a very powerful one.

[[File:KNearest_Neighbors_from_the_Udemy_course_Pierian_data1.mp4|800px|thumb|center|Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/]]
k-NN is ideal for classification tasks where relationships among the attributes and target classes are:
* numerous
* complex
* difficult to interpret
and where instances of a class are fairly homogeneous.
<br />
'''Applications of this learning method include:'''
* Computer vision applications:
:* Optical character recognition
:* Face recognition
* Recommendation systems
* Pattern detection in genetic data
<br />
'''Basic Implementation:'''

* Training Algorithm:
:* Simply store the training examples

* Prediction Algorithm:
:# Calculate the distance from x to all points in your data (Udemy Course)
:# Sort the points in your data by increasing distance from x (Udemy Course)
:# Predict the majority label of the "k" closest points (Udemy Course)

:* Find the <math>k</math> training examples <math>(x_{1},y_{1}),\ldots,(x_{k},y_{k})</math> that are '''nearest''' to the test example <math>x</math> (Noel)
:* Predict the most frequent class among those <math>y_{i}</math>'s (Noel)
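The prediction steps above can be sketched in plain Python. This is a minimal illustration, not the course's code; the function and variable names are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training examples."""
    # 1. Calculate the distance from x to all points in the training data
    distances = [math.dist(p, x) for p in train_X]
    # 2. Sort the training points by increasing distance from x
    order = sorted(range(len(train_X)), key=lambda i: distances[i])
    # 3. Predict the majority label of the k closest points
    k_labels = [train_y[i] for i in order[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Tiny illustrative dataset: two clusters of 2-D points
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # near the first cluster → "a"
```

Note that all the work happens at prediction time; "training" really is just storing `train_X` and `train_y`, as described above.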
* '''Improvements:'''
:* Weighting training examples based on their distance
:* Alternative measures of "nearness"
:* Finding "close" examples in a large training set quickly
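The first improvement, weighting training examples by distance, can be sketched as follows. Inverse-distance weighting is one common scheme, assumed here for illustration; the names are made up.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_X, train_y, x, k=3, eps=1e-9):
    """Vote among the k nearest neighbours, each weighted by 1/distance."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], x))[:k]
    votes = defaultdict(float)
    for point, label in neighbours:
        # Closer neighbours get larger votes; eps avoids division by zero
        votes[label] += 1.0 / (math.dist(point, x) + eps)
    return max(votes, key=votes.get)

train_X = [(0, 0), (1, 1), (4, 4)]
train_y = ["a", "a", "b"]
# Two of the three neighbours are "a", but the single "b" point is much
# closer to (3, 3), so its weighted vote wins:
print(weighted_knn_predict(train_X, train_y, (3, 3), k=3))  # "b"
```

This shows how weighting changes the outcome: a plain majority vote over the same three neighbours would have returned "a".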
'''Strengths and Weaknesses:'''
{| class="wikitable"
|+
!Strengths
!Weaknesses
|-
|The algorithm is simple and effective
|The method does not produce a model, which limits potential insights about the relationships between features
|-
|Fast training phase
|Slow classification phase; requires a lot of memory
|-
|Capable of reflecting complex relationships
|Cannot handle nominal features or missing data without additional pre-processing
|-
|Unlike many other methods, no assumptions about the distribution of the data are made
|
|}
* Classifying a new example:

<br />