==Data Mining - Machine Learning Algorithms==

===Boosting===

====Gradient boosting====

https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d

https://freakonometrics.hypotheses.org/tag/gradient-boosting

https://en.wikipedia.org/wiki/Gradient_boosting

https://www.researchgate.net/publication/326379229_Exploring_the_clinical_features_of_narcolepsy_type_1_versus_narcolepsy_type_2_from_European_Narcolepsy_Network_database_with_machine_learning

http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html

Boosting is a machine learning ensemble meta-algorithm for primarily reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Boosting is based on the question posed by Kearns and Valiant (1988, 1989): "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well correlated with the true classification. https://en.wikipedia.org/wiki/Boosting_(machine_learning)

Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, like other boosting methods, and generalizes them by allowing optimization of an arbitrary differentiable loss function.
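
To make the stage-wise idea concrete, here is a minimal sketch for the squared-error loss, where the negative gradient at each stage is simply the residual. The toy data, the number of stages, the shrinkage value, and the tree depth are illustrative assumptions, not part of the article; scikit-learn's DecisionTreeRegressor stands in as the weak learner.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (assumed for illustration only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

n_stages = 100       # number of boosting stages (assumed)
learning_rate = 0.1  # shrinkage applied to each stage's contribution (assumed)

# Stage 0: the constant prediction that minimizes squared error is the mean
F = np.full(len(y), y.mean())
trees = []

for _ in range(n_stages):
    # For squared-error loss, the negative gradient is the residual y - F(x)
    residuals = y - F
    tree = DecisionTreeRegressor(max_depth=2)  # weak learner: a shallow tree
    tree.fit(X, residuals)
    # Stage-wise update: add a shrunken copy of the new tree's predictions
    F += learning_rate * tree.predict(X)
    trees.append(tree)

def predict(X_new):
    """Ensemble prediction: initial constant plus all shrunken tree outputs."""
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)
</syntaxhighlight>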

Boosting is a sequential process: trees are grown one after another, each using information from the previously grown trees. The model learns slowly from the data and tries to improve its predictions in subsequent iterations. Let's look at a classic classification example:

[[File:How_does_boosting_work.png|600px|thumb|center|]]

Four classifiers (in four boxes), shown above, are trying hard to classify the + and - classes as homogeneously as possible. Let's understand this picture well:

*Box 1: The first classifier creates a vertical line (split) at D1. It says anything to the left of D1 is + and anything to the right of D1 is -. However, this classifier misclassifies three + points.
*Box 2: The next classifier says, "Don't worry, I will correct your mistakes." It therefore gives more weight to the three misclassified + points (see the bigger size of +) and creates a vertical line at D2. Again it says anything to the right of D2 is - and anything to the left is +. Still, it makes mistakes by incorrectly classifying three - points.
*Box 3: The next classifier continues to lend support. Again, it gives more weight to the three misclassified - points and creates a horizontal line at D3. Still, this classifier fails to classify the circled points correctly.
*Remember that each of these classifiers has a misclassification error associated with it.
*Boxes 1, 2, and 3 are weak classifiers. These classifiers will now be used to create a strong classifier, Box 4.
*Box 4: This is a weighted combination of the three weak classifiers. As you can see, it does a good job of classifying all the points correctly.

That's the basic idea behind boosting algorithms: each new model capitalizes on the misclassification error of the previous model and tries to reduce it.
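
The reweighting-and-voting scheme illustrated in Boxes 1-4 matches AdaBoost-style boosting. Below is a minimal sketch of that idea, assuming a toy dataset with labels in {-1, +1}; the data, the three rounds, and the use of scikit-learn decision stumps as weak learners are illustrative assumptions, not part of the original example.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy two-class problem with labels in {-1, +1} (assumed for illustration)
rng = np.random.RandomState(1)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

n_rounds = 3  # three weak classifiers, mirroring Boxes 1-3
weights = np.full(len(y), 1.0 / len(y))  # start with equal sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a one-split decision stump trained on the weighted sample
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted misclassification error of this round's classifier
    err = weights[pred != y].sum()
    alpha = 0.5 * np.log((1 - err) / err)  # this classifier's vote

    # Increase the weight of misclassified points (the "bigger +" in Box 2)
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

def strong_classify(X_new):
    """Box 4: the strong classifier is a weighted vote of the weak ones."""
    return np.sign(sum(a * s.predict(X_new) for a, s in zip(alphas, stumps)))
</syntaxhighlight>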