Difference between revisions of "Página de pruebas"

Revision as of 11:11, 18 January 2021

Decision Trees

10/06: Recorded class - Decision tree

https://drive.google.com/drive/folders/1TW494XF-laGGJiLFApz8bJfMstG4Yqk_

What is a Decision Tree

In general terms, we can say that "A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules". https://en.wikipedia.org/wiki/Decision_tree

In a ML context, we can say that DT is A predictive model or a ML method based on a series of branching Boolean tests, splitting the data into smaller and smaller subsets to identify patterns that can be used for prediction. The model is a series of logical decisions on an attribute (tests for which the answer is true or false) [Noel Cosgrave slides]

How Decision Trees Work

Decision tree learners build models in the form of tree structures.

The model is a series of logical decisions on an attribute (tests for which the answer is true or false)

Decision nodes split the data using these tests

Leaf nodes assign a predicted class based on a combination of the decisions in higher nodes

Data travels through the tree from root to leaf nodes

A decision rule is the path from the root node to the leaf node

Each rule is mutually exclusive
Every row (or observation) in the data is covered by a single rule
Covering a data observation means that the observation satisfies the conditions of the rule

A dataset can have many possible decision trees

In practice, we want small & accurate trees

What is the inductive bias in Decision Tree learning?

Shorter trees are preferred over longer trees: Smaller trees are more general, usually more accurate, and easier to understand by humans (and to communicate!)

Prefer trees that place high information gain attributes close to the root

More succinct hypotheses are preferred. Why?

There are fewer succinct hypotheses than complex ones
If a succinct hypothesis explains the data this is unlikely to be a coincidence
However, not every succinct hypothesis is a reasonable one.

Example 1:

The model for predicting the future success of a movie can be represented as a simple tree:

Movie	Number of celebrities	Budget	Success (True label)	Success (Predicted label)
Movie 1	low	hight	boxoffice flop	boxoffice flop
Movie 2	low	low	boxoffice flop	boxoffice flop
Movie 3	low	hight	boxoffice flop	boxoffice flop
Movie 4	low	low	boxoffice flop	boxoffice flop
Movie 5	low	hight	boxoffice flop	boxoffice flop
Movie 6	low	low	boxoffice flop	boxoffice flop
Movie 7	low	hight	boxoffice flop	boxoffice flop
Movie 8	low	low	boxoffice flop	boxoffice flop
Movie 9	low	hight	boxoffice flop	boxoffice flop
Movie 10	low	low	boxoffice flop	boxoffice flop
Movie 11	hight	hight	mainstream hit	mainstream hit
Movie 12	hight	hight	mainstream hit	mainstream hit
Movie 13	hight	hight	mainstream hit	mainstream hit
Movie 14	low	hight	mainstream hit	boxoffice flop
Movie 15	hight	hight	mainstream hit	mainstream hit
Movie 16	hight	hight	mainstream hit	mainstream hit
Movie 17	hight	hight	mainstream hit	mainstream hit
Movie 18	hight	hight	mainstream hit	mainstream hit
Movie 19	hight	hight	mainstream hit	mainstream hit
Movie 20	hight	hight	mainstream hit	mainstream hit
Movie 21	hight	low	critical success	critical success
Movie 22	hight	low	critical success	critical success
Movie 23	hight	low	critical success	critical success
Movie 24	low	hight	critical success	boxoffice flop
Movie 25	hight	low	critical success	critical success
Movie 26	hight	hight	critical success	mainstream hit
Movie 27	hight	low	critical success	critical success
Movie 28	hight	low	critical success	critical success
Movie 29	hight	low	critical success	critical success
Movie 30	hight	low	critical success	critical success

Example 2:

This decision tree is used to decide whether or not to provide a loan to a customer

$15$ training examples
$Yes$ = approval, $No$ = rejection
The $x/y$ values mean that $x$ out of $y$ training examples that reach this leaf node has the class of the leaf. This is the confidence
The $x$ value is the support count
$x/total$ training examples is the support

The algorithm

This is a basic but nice explanation of the algorithm.

The algorithm described above is something like this:

We first need to determine which will be the Root:

The attribute (Chest pain, Good blood circulation, Blocked arteries) that determines better whether a Patient has Heart Disease or not will be chosen as the Root.

To do so, we need to evaluate the three attributes by calculating what is known as Impurity, which is a measure of how bad the Attribute is able to separate (determine) our label attribute (Heart Disease in our case).

There are many ways to measure Impurity, one of the popular ones is to calculate the Gini impurity.

So, let's calculate the Gini impurity for our «Chest pain» attribute:

We look at chest pain for all 303 patients in our data:

Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk

We calculate the Gini impurity for each branch (True: The patient has chest pain | False: The patient does not have chest pain):

\color {blue}{Gini\ Impurity=1-\sum _{i=1}^{N}P_{i}^{2}}

For the True branch:

Gini\ Impurity_{True}=1-(Probability\ of\ Yes)^{2}+(Probability\ of\ No)^{2}

Gini\ Impurity_{True}=1-{\biggl (}{\frac {105}{105+39}}{\biggl )}^{2}-{\biggl (}{\frac {39}{105+39}}{\biggl )}^{2}=0.395

For the False branch:

Gini\ Impurity_{False}=0.336

$\color {blue}{Total\ Gini\ Impurity=Weighted\ average\ of\ Gini\ impurities\ for\ the\ leaf\ nodes\ (branches)}$

Gini\ Impurity_{Chestpain}={\biggl (}{\frac {144}{144+159}}{\biggl )}0.395\ +\ {\biggl (}{\frac {159}{144+159}}{\biggl )}0.336=0.364

Good Blood Circulation

Gini\ Impurity_{Good\ Blood\ Circulation}=0.360

Blocked Arteries

Gini\ Impurity_{Blocked\ Arteries}=0.381

Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk

Then, we follow the same method to determine the following nodes

A node becomes a leaf when the Gini impurity for a node before is lower than the Gini impurity after using a new attribute

Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk

At the end, our DT looks like this:

Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk

For numeric data:

Ex.: Patient weight

Ranked data and Multiple choice data:

Ranked data: For example, "Rank my jokes on a scale of 1 to 4".

Multiple choice data: For example, "Which color do you like: red, blue or green".

Decision Trees, Part 2 - Feature Selection and Missing Data: https://www.youtube.com/watch?v=wpNl-JwwplA

Regression Trees: https://www.youtube.com/watch?v=g9c66TUylZ4

The ID3 algorithm

ID3 (Quinlan, 1986) is an early algorithm for learning Decision Trees. The learning of the tree is top-down. The algorithm is greedy, it looks at a single attribute and gain in each step. This may fail when a combination of attributes is needed to improve the purity of a node.

At each split, the question is "which attribute should be tested next? Which logical test gives us more information?". This is determined by the measures of entropy and information gain. These are discussed later.

A new decision node is then created for each outcome of the test and examples are partitioned according to this value. The process is repeated for each new node until all the examples are classified correctly or there are no attributes left.

The C5.0 algorithm

C5.0 (Quinlan, 1993) is a refinement of C4.5 which in itself improved upon ID3. It is the industry standard for producing decision trees. It performs well for most problems out-of-the-box. Unlike other machine-learning techniques, it has high explanatory power, it can be understood and explained.

.

Example in RapidMiner

File:Decision tree Example RapidMiner.pdf

Difference between revisions of "Página de pruebas"

Revision as of 11:11, 18 January 2021

Contents

Decision Trees

The algorithm

The ID3 algorithm

The C5.0 algorithm

Example in RapidMiner

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Tools

@@ Line 1: / Line 1: @@
+==Decision Trees==
+* 10/06: Recorded class - Decision tree
+:* https://drive.google.com/drive/folders/1TW494XF-laGGJiLFApz8bJfMstG4Yqk_
+<br />
+* '''What is a Decision Tree'''
+[[File:Decision_Trees_terminology1.png|400px|thumb|right|]]
+[[File:Decision_Trees_terminology2.png|400px|thumb|right|]]
+[[File:Decision_Trees_terminology3.png|400px|thumb|right|]]
+: In general terms, we can say that "A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules". https://en.wikipedia.org/wiki/Decision_tree
+: In a ML context, we can say that DT is A predictive model or a ML method based on a series of branching Boolean tests, splitting the data into smaller and smaller subsets to identify patterns that can be used for prediction. The model is a series of logical decisions on an attribute (tests for which the answer is true or false) [Noel Cosgrave slides]
+<br />
+* '''How Decision Trees Work'''
+:* Decision tree learners build models in the form of tree structures.
+:* The model is a series of logical decisions on an attribute (tests for which the answer is true or false)
+:* Decision nodes split the data using these tests
+:* Leaf nodes assign a predicted class based on a combination of the decisions in higher nodes
+:* Data travels through the tree from root to leaf nodes
+<br />
+* '''A decision rule''' is the path from the root node to the leaf node
+:* Each rule is mutually exclusive
+:* Every row (or observation) in the data is covered by a single rule
+:* Covering a data observation means that the observation satisfies the conditions of the rule
+<br />
+* '''A dataset can have many possible decision trees'''
+: In practice, we want small & accurate trees
+:* What is the inductive bias in Decision Tree learning?
+::* Shorter trees are preferred over longer trees: Smaller trees are more general, usually more accurate, and easier to understand by humans (and to communicate!)
+::* Prefer trees that place high information gain attributes close to the root
+:* More succinct hypotheses are preferred. Why?
+::* There are fewer succinct hypotheses than complex ones
+::* If a succinct hypothesis explains the data this is unlikely to be a coincidence
+::* However, not every succinct hypothesis is a reasonable one.
+<br />
+'''Example 1:'''
+The model for predicting the future success of a movie can be represented as a simple tree:
+{|
+|
+{| class="wikitable"
+!Movie
+!Number of celebrities
+!Budget
+!Success (True label)
+!Success (Predicted label)
+|- style="background-color:#ff794d"
+|Movie 1
+|low
+|hight
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 2
+|low
+|low
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 3
+|low
+|hight
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 4
+|low
+|low
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 5
+|low
+|hight
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 6
+|low
+|low
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 7
+|low
+|hight
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 8
+|low
+|low
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 9
+|low
+|hight
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#ff794d"
+|Movie 10
+|low
+|low
+|boxoffice flop
+|boxoffice flop
+|- style="background-color:#66a3ff"
+|Movie 11
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 12
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 13
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|'''Movie 14'''
+|'''low'''
+|'''hight'''
+|'''mainstream hit'''
+|style="background-color:#ff794d"|'''boxoffice flop'''
+|- style="background-color:#66a3ff"
+|Movie 15
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 16
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 17
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 18
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 19
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#66a3ff"
+|Movie 20
+|hight
+|hight
+|mainstream hit
+|mainstream hit
+|- style="background-color:#00e6b0"
+|Movie 21
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|Movie 22
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|Movie 23
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|'''Movie 24'''
+|'''low'''
+|'''hight'''
+|'''critical success'''
+|style="background-color:#ff794d"|'''boxoffice flop'''
+|- style="background-color:#00e6b0"
+|Movie 25
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|'''Movie 26'''
+|'''hight'''
+|'''hight'''
+|'''critical success'''
+|style="background-color:#66a3ff"|'''mainstream hit'''
+|- style="background-color:#00e6b0"
+|Movie 27
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|Movie 28
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|Movie 29
+|hight
+|low
+|critical success
+|critical success
+|- style="background-color:#00e6b0"
+|Movie 30
+|hight
+|low
+|critical success
+|critical success
+|}
+|
+[[File:DecisionTree-NoelCosgraveSlides_2.png|500px|thumb|center|]]
+[[File:DecisionTree-NoelCosgraveSlides_1.png|500px|thumb|center|]]
+|}
+<br />
+'''Example 2:'''
+This decision tree is used to decide whether or not to provide a loan to a customer
+{|
+|[[File:DecisionTree-NoelCosgraveSlides_3.png|350px|thumb|center|]]
+|
+* <math>15</math> training examples
+* <math>Yes</math> = approval, <math>No</math> = rejection
+* The <math>x/y</math> values mean that <math>x</math> out of <math>y</math> training examples that reach this leaf node has the class of the leaf. This is the confidence
+* The <math>x</math> value is the support count
+* <math>x/total</math> training examples is the support
+|}
+<br />
+===The algorithm===
+This is a basic but nice explanation of the algorithm.
+<center>
+<div style="width: 420pt;">{{#ev:youtube|https://www.youtube.com/watch?v=7VeUPuFGJHk|||||start=110}}
+</div>
+</center>
+The algorithm described above is something like this:
+* '''We first need to determine which will be the Root:'''
+:* The attribute (Chest pain, Good blood circulation, Blocked arteries) that determines better whether a Patient has '''Heart Disease''' or not will be chosen as the Root.
+:* To do so, we need to evaluate the three attributes by calculating what is known as '''Impurity''', which is a measure of how bad the Attribute is able to separate (determine) our label attribute (Heart Disease in our case).
+:*There are many ways to measure Impurity, one of the popular ones is to calculate the '''Gini''' impurity.
+:* '''So, let's calculate the Gini impurity for our «Chest pain» attribute:'''
+::* We look at chest pain for all 303 patients in our data:
+[[File:Decision_trees-StatQuest1.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk]]
+::* We calculate the Gini impurity for each branch (True: The patient has chest pain | False: The patient does not have chest pain):
+:::<div style='font-size:15pt'><math>\color{blue}{
+Gini\ Impurity = 1 - \sum_{i=1}^{N} P_{i}^{2}
+}
+</math>
+</div>
+::* For the '''True''' branch:
+:::<math>
+Gini\ Impurity_{True} = 1 - (Probability\ of\ Yes)^{2} + (Probability\ of\ No)^{2}
+</math>
+:::<math>
+Gini\ Impurity_{True} = 1 - \biggl(\frac{105}{105 + 39}\biggl)^2 - \biggl(\frac{39}{105 + 39}\biggl)^2 = 0.395
+</math>
+::* For the '''False''' branch:
+:::<math>
+Gini\ Impurity_{False} = 0.336
+</math>
+::*<div style='font-size:13pt'><math>\color{blue}{
+Total\ Gini\ Impurity = Weighted\ average\ of\ Gini\ impurities\ for\ the\ leaf\ nodes\ (branches)}
+</math></div>
+:::<math>
+Gini\ Impurity_{Chest pain} = \biggl( \frac{144}{144 + 159} \biggl)0.395\ +\ \biggl( \frac{159}{144 + 159} \biggl)0.336 = 0.364
+</math>
+:* '''Good Blood Circulation'''
+::<math>
+Gini\ Impurity_{Good\ Blood\ Circulation} = 0.360
+</math>
+:* '''Blocked Arteries'''
+::<math>
+Gini\ Impurity_{Blocked\ Arteries} = 0.381
+</math>
+[[File:Decision_trees-StatQuest2.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk]]
+* '''Then, we follow the same method to determine the following nodes'''
+* '''A node becomes a leaf when the Gini impurity for a node before is lower than the Gini impurity after using a new attribute'''
+[[File:Decision_trees-StatQuest3.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk]]
+* '''At the end, our DT looks like this:'''
+[[File:Decision_trees-StatQuest4.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=7VeUPuFGJHk]]
+* '''For numeric data:'''
+: Ex.: Patient weight
+* '''Ranked data and Multiple choice data:'''
+: Ranked data: For example, "Rank my jokes on a scale of 1 to 4".
+: Multiple choice data: For example, "Which color do you like: red, blue or green".
+<br /> <br />
+Decision Trees, Part 2 - Feature Selection and Missing Data: https://www.youtube.com/watch?v=wpNl-JwwplA
+<br />
+Regression Trees: https://www.youtube.com/watch?v=g9c66TUylZ4
+<br />
+====The ID3 algorithm====
+ID3 (Quinlan, 1986) is an early algorithm for learning Decision Trees. The learning of the tree is top-down. The algorithm is greedy, it looks at a single attribute and gain in each step. This may fail when a combination of attributes is needed to improve the purity of a node.
+At each split, the question is "which attribute should be tested next? Which logical test gives us more information?". This is determined by the measures of '''entropy''' and '''information gain'''. These are discussed later.
+A new decision node is then created for each outcome of the test and examples are partitioned according to this value. The process is repeated for each new node until all the examples are classified correctly or there are no attributes left.
+<br />
+====The C5.0 algorithm====
+C5.0 (Quinlan, 1993) is a refinement of C4.5 which in itself improved upon ID3. '''It is the industry standard for producing decision trees'''. It performs well for most problems out-of-the-box. Unlike other machine-learning techniques, it has high explanatory power, it can be understood and explained.
+.
+.
+.
+<br />
+===Example in RapidMiner===
+<div style="text-align: center;">
+{| style="margin: 0 auto;" |
+|[[File:decision_tree_RapidMiner_example-Iris_data3.png|300px|thumb|center|]]
+|[[File:decision_tree_RapidMiner_example-Iris_data2.png|500px|thumb|center|]]
+|-
+|<img style='width:300px' src='https://upload.wikimedia.org/wikipedia/commons/7/78/Petal-sepal.jpg' />
+|[[File:decision_tree_RapidMiner_example-Iris_data1.png|500px|thumb|center|]]
+|}
+[[File:Decision_tree_RapidMiner_example-Iris_data.mp4|800px|thumb|center|]]
+<pdf width="1500" height="600">File:Decision_tree_Example_RapidMiner.pdf</pdf>
+[[File:Decision_tree_Example_RapidMiner.pdf]]
+</div>
+<br />