Difference between revisions of "Página de pruebas"
Adelo Vieira (talk | contribs)
==K-Nearest Neighbour==

* 15/06: Recorded class - K-Nearest Neighbour
:* https://drive.google.com/drive/folders/1BaordCV9vw-gxLdJBMbWioX2NW7Ty9Lm


* StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI


<br />
KNN determines the class of a given unlabeled observation by identifying the k nearest labeled observations to it. In other words, the algorithm assigns an unlabeled observation to the class with the most similar labeled instances. It is a simple but very powerful method.

[[File:KNearest_Neighbors_from_the_Udemy_course_Pierian_data1.mp4|800px|thumb|center|Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/]]

k-NN is ideal for classification tasks where relationships among the attributes and target classes are:
* numerous
* complex
* difficult to interpret, and
* where instances of a class are fairly homogeneous

<br />
'''Applications of this learning method include:'''
* Computer vision applications:
:* Optical character recognition
:* Face recognition
* Recommendation systems
* Pattern detection in genetic data

<br />
'''Basic Implementation:'''

* Training Algorithm:
:* Simply store the training examples

* Prediction Algorithm:
:# Calculate the distance from x to all points in your data (Udemy Course)
:# Sort the points in your data by increasing distance from x (Udemy Course)
:# Predict the majority label of the "k" closest points (Udemy Course)

:* Find the <math>k</math> training examples <math>(x_{1},y_{1}),\dots,(x_{k},y_{k})</math> that are '''nearest''' to the test example <math>x</math> (Noel)
:* Predict the most frequent class among those <math>y_{i}</math>'s (Noel)

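The three prediction steps can be sketched as a short Python function (a minimal sketch; `knn_predict`, `train`, and `query` are illustrative names, not from the course):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training examples.

    `train` is a list of (features, label) pairs.
    """
    # 1. Calculate the distance from the query point to all labeled points
    distances = [(math.dist(x, query), y) for x, y in train]
    # 2. Sort the points by increasing distance from the query
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]
```

For example, with `train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]`, `knn_predict(train, (1.5, 1.5))` returns `"A"`, since two of the three nearest neighbours carry that label.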

* '''Improvements:'''
:* Weighting training examples based on their distance
:* Alternative measures of "nearness"
:* Finding "close" examples in a large training set quickly

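A minimal sketch of the first two improvements, assuming a 1/distance vote weight and a Minkowski distance parameter (the names and the weighting scheme are illustrative choices, not prescribed by the course):

```python
from collections import defaultdict

def knn_predict_weighted(train, query, k=3, p=2):
    """k-NN with distance-weighted votes and a configurable Minkowski
    distance: p=2 is Euclidean, p=1 is Manhattan (an alternative
    measure of "nearness")."""
    def minkowski(a, b):
        return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

    # Keep the k nearest neighbours, as in the basic algorithm
    nearest = sorted(((minkowski(x, query), y) for x, y in train),
                     key=lambda pair: pair[0])[:k]
    # Closer neighbours get a larger vote: weight = 1 / (distance + eps)
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)
```

With plain majority voting and k = 3, a single nearby "A" can be outvoted by two distant "B"s; the distance weights let the close neighbour dominate instead.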
'''Strengths and Weaknesses:'''
{| class="wikitable"
!Strengths
!Weaknesses
|-
|The algorithm is simple and effective
|The method does not produce a model, which limits the insight it can give into how the features relate to the class
|-
|Fast training phase
|Slow classification phase, and requires a lot of memory
|-
|Capable of reflecting complex relationships
|Cannot handle nominal features or missing data without additional pre-processing
|-
|Unlike many other methods, it makes no assumptions about the distribution of the data
|
|}

* Classifying a new example:

<br />
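A worked illustration of classifying a new example with k = 3; the sweetness/crunchiness dataset below is made up for demonstration:

```python
from collections import Counter

# Toy labeled data: (sweetness, crunchiness) -> food type.  Made-up values.
training = [
    ((10, 9), "fruit"), ((10, 1), "fruit"),
    ((7, 10), "vegetable"), ((3, 10), "vegetable"),
    ((1, 4), "protein"), ((1, 1), "protein"),
]

new_example = (9, 5)   # the unlabeled observation to classify
k = 3

# Rank training points by (squared) Euclidean distance to the new example;
# squared distance preserves the ordering, so the square root can be skipped
by_distance = sorted(
    training,
    key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], new_example)),
)
k_labels = [label for _, label in by_distance[:k]]
prediction = Counter(k_labels).most_common(1)[0][0]   # -> "fruit"
```

Two of the three nearest neighbours are labeled <code>fruit</code>, so the majority vote assigns that class to the new example.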
Revision as of 00:39, 16 January 2021