# Página de pruebas

## Decision Trees

DT is an predictive algorithm that build models in the form of a tree structure, that is composed of a series of branching Boolean tests (tests for which the answer is true or false). The princpel is to use these boolean tests to split the data into smaller and smaller subsets to identify patterns that can be used for prediction. [Noel Cosgrave slides]

In a DT, each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf are called decision rules/classification rules. https://en.wikipedia.org/wiki/Decision_tree

A dataset can have many possible decision trees

In practice, we want small & accurate trees

• What is the inductive bias in Decision Tree learning?
• Shorter trees are preferred over longer trees: Smaller trees are more general, usually more accurate, and easier to understand by humans (and to communicate!)
• Prefer trees that place high information gain attributes close to the root

• More succinct hypotheses are preferred. Why?
• There are fewer succinct hypotheses than complex ones
• If a succinct hypothesis explains the data this is unlikely to be a coincidence
• However, not every succinct hypothesis is a reasonable one.

Example 1:

A decision tree model can be used to decide whether or not to provide a loan to a customer (wheather or not a custoer is likely to pay a loan)

 ${\displaystyle 16}$ training examples The ${\displaystyle x/y}$ values mean that ${\displaystyle x}$ out of ${\displaystyle y}$ training examples that reach this leaf node has the class of the leaf. This is the confidence The ${\displaystyle x}$ value is the support count ${\displaystyle x/total}$ training examples is the support

Example 2:

A model for predicting the future success of a movie:

Movie Number of celebrities Budget Success (True label) Success (Predicted label)
Movie 1 low hight boxoffice flop boxoffice flop
Movie 2 low low boxoffice flop boxoffice flop
Movie 3 low hight boxoffice flop boxoffice flop
Movie 4 low low boxoffice flop boxoffice flop
Movie 5 low hight boxoffice flop boxoffice flop
Movie 6 low low boxoffice flop boxoffice flop
Movie 7 low hight boxoffice flop boxoffice flop
Movie 8 low low boxoffice flop boxoffice flop
Movie 9 low hight boxoffice flop boxoffice flop
Movie 10 low low boxoffice flop boxoffice flop
Movie 11 hight hight mainstream hit mainstream hit
Movie 12 hight hight mainstream hit mainstream hit
Movie 13 hight hight mainstream hit mainstream hit
Movie 14 low hight mainstream hit boxoffice flop
Movie 15 hight hight mainstream hit mainstream hit
Movie 16 hight hight mainstream hit mainstream hit
Movie 17 hight hight mainstream hit mainstream hit
Movie 18 hight hight mainstream hit mainstream hit
Movie 19 hight hight mainstream hit mainstream hit
Movie 20 hight hight mainstream hit mainstream hit
Movie 21 hight low critical success critical success
Movie 22 hight low critical success critical success
Movie 23 hight low critical success critical success
Movie 24 low hight critical success boxoffice flop
Movie 25 hight low critical success critical success
Movie 26 hight hight critical success mainstream hit
Movie 27 hight low critical success critical success
Movie 28 hight low critical success critical success
Movie 29 hight low critical success critical success
Movie 30 hight low critical success critical success

### The algorithm

#### Basic explanation of the algorithm

This is a basic but nice explanation of the algorithm.

The algorithm described above is something like this:

• We first need to determine which will be the Root:
• The attribute (Chest pain, Good blood circulation, Blocked arteries) that determines better whether a Patient has Heart Disease or not will be chosen as the Root.
• To do so, we need to evaluate the three attributes by calculating what is known as Impurity, which is a measure of how bad the Attribute is able to separate (determine) our label attribute (Heart Disease in our case).
• There are many ways to measure Impurity, one of the popular ones is to calculate the Gini impurity.

• So, let's calculate the Gini impurity for our «Chest pain» attribute:
• We look at chest pain for all 303 patients in our data:

• We calculate the Gini impurity for each branch (True: The patient has chest pain | False: The patient does not have chest pain):
${\displaystyle \color {blue}{Gini\ Impurity=1-\sum _{i=1}^{N}P_{i}^{2}}}$

• For the True branch:
${\displaystyle Gini\ Impurity_{True}=1-(Probability\ of\ Yes)^{2}+(Probability\ of\ No)^{2}}$
${\displaystyle Gini\ Impurity_{True}=1-{\biggl (}{\frac {105}{105+39}}{\biggl )}^{2}-{\biggl (}{\frac {39}{105+39}}{\biggl )}^{2}=0.395}$

• For the False branch:
${\displaystyle Gini\ Impurity_{False}=0.336}$

• ${\displaystyle \color {blue}{Total\ Gini\ Impurity=Weighted\ average\ of\ Gini\ impurities\ for\ the\ leaf\ nodes\ (branches)}}$
${\displaystyle Gini\ Impurity_{Chestpain}={\biggl (}{\frac {144}{144+159}}{\biggl )}0.395\ +\ {\biggl (}{\frac {159}{144+159}}{\biggl )}0.336=0.364}$

• Good Blood Circulation
${\displaystyle Gini\ Impurity_{Good\ Blood\ Circulation}=0.360}$

• Blocked Arteries
${\displaystyle Gini\ Impurity_{Blocked\ Arteries}=0.381}$

• Then, we follow the same method to determine the following nodes

• A node becomes a leaf when the Gini impurity for a node before is lower than the Gini impurity after using a new attribute

• At the end, our DT looks like this:

• For numeric data:
Ex.: Patient weight

• Ranked data and Multiple choice data:
Ranked data: For example, "Rank my jokes on a scale of 1 to 4".
Multiple choice data: For example, "Which color do you like: red, blue or green".

Decision Trees, Part 2 - Feature Selection and Missing Data: https://www.youtube.com/watch?v=wpNl-JwwplA

#### Algorithms addressed in Noel s Lecture

##### The ID3 algorithm

ID3 (Quinlan, 1986) is an early algorithm for learning Decision Trees. The learning of the tree is top-down. The algorithm is greedy, it looks at a single attribute and gain in each step. This may fail when a combination of attributes is needed to improve the purity of a node.

At each split, the question is "which attribute should be tested next? Which logical test gives us more information?". This is determined by the measures of entropy and information gain. These are discussed later.

A new decision node is then created for each outcome of the test and examples are partitioned according to this value. The process is repeated for each new node until all the examples are classified correctly or there are no attributes left.

##### The C5.0 algorithm

C5.0 (Quinlan, 1993) is a refinement of C4.5 which in itself improved upon ID3. It is the industry standard for producing decision trees. It performs well for most problems out-of-the-box. Unlike other machine-learning techniques, it has high explanatory power, it can be understood and explained.

.

.

.

### Example in RapidMiner

This is the guide provided by Noel File:Decision tree Example RapidMiner.pdf