A Decision Tree falls under supervised machine learning. As the name suggests, it is a tree-like structure that helps us make decisions based on certain conditions. A decision tree can solve both regression and classification problems.

What is Classification?

Classification is the process of dividing data into different categories or groups by assigning labels. For example, categorizing transaction data based on whether a transaction is fraudulent or genuine. Taking the present epidemic as an example, based on symptoms like fever, cold and cough, we categorize a patient as suffering from COVID or not.

What is Regression?

Regression is the process of predicting a continuous value. For example, predicting a person's weight, or predicting the sales or profit of a company.
A gentle introduction to the Decision tree:
A decision tree is a graphical representation that helps us to make decisions based on certain conditions.

For example, deciding whether to watch a movie or not.

Important terminology in the decision tree:

Node:

A decision tree is made up of several nodes:

1. Root Node: The root node represents the entire data and is the starting point of the tree. In the above example, the first node, where we check whether the movie belongs to Hollywood or not, is the root node from which the entire tree grows.
2. Leaf Node: A leaf node is an end node of the tree, which cannot be split any further. In the above example, 'Watch movie' and 'Don't watch' are leaf nodes.
3. Parent/Child Nodes: A node that splits into further nodes is the parent node of those successor nodes, and the nodes obtained from it are its child nodes.

Branches:

Branches are the arrows connecting nodes; they represent the flow from the root node to the leaf nodes.

How to select an attribute to create the tree or split the node:
We use splitting criteria to select the attribute that best partitions the data.

Here are the most important and useful methods for selecting the attribute on which to split the data:

Information Gain:

We want to split on the attribute that gives the most information about the data, so for each split we select the attribute with the highest information gain. To calculate information gain we use the entropy metric.

Information from attribute = ∑ p(x) · Entropy(x)

Here, x represents a class (value) of the attribute.
Information Gain for an attribute = Total Entropy − Information from the attribute after splitting
Entropy:
Entropy measures the impurity or disorder in the dataset.

Entropy = − ∑ p(y) · log2 p(y)

Here, y represents the class in the target variable
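
To make these formulas concrete, here is a minimal R sketch. The helper names entropy and info_gain are purely illustrative (they are not from any package); they compute the entropy of a vector of class labels and the information gain of splitting a target by an attribute.

## Entropy of a vector of class labels: -sum(p * log2(p))
entropy = function(y) {
  p = prop.table(table(y))
  p = p[p > 0]  ## drop empty classes so 0 * log2(0) does not produce NaN
  -sum(p * log2(p))
}

## Information gain = total entropy - weighted entropy of each subset
info_gain = function(attribute, target) {
  weights = prop.table(table(attribute))
  info = sum(sapply(names(weights), function(v) {
    weights[[v]] * entropy(target[attribute == v])
  }))
  entropy(target) - info
}

Applied to the dataset used below, entropy(data$Play) reproduces the total entropy of roughly 0.94, and info_gain(data$Outlook, data$Play) the Outlook gain of roughly 0.247 that we derive step by step in the following sections.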

Gini Index:

Gini Index is also called Gini Impurity. It measures how often a randomly chosen record would be misclassified if it were labelled at random according to the class distribution in the node: Gini = 1 − ∑ p(y)². A pure node has a Gini index of 0.
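
As a quick sketch (the helper name gini below is ours, not from any package), Gini impurity can be computed from the class proportions just like entropy:

## Gini impurity: 1 - sum(p^2); 0 for a pure node
gini = function(y) {
  p = prop.table(table(y))
  1 - sum(p^2)
}
## For the Play column used below (9 Yes, 5 No):
## 1 - (9/14)^2 - (5/14)^2 ≈ 0.459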

R code

The dataset that we are looking into:

We are going to build a decision tree model that decides whether to play outside or not.
To build a decision tree on the above dataset, we first have to select the attribute that gives the highest information gain among all the attributes.
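
So that the code below can be run end to end, here is one possible reconstruction of the dataset in R. This is an assumption on our part: it is the classic "play tennis" dataset (with "Cool" written as "Cold"), chosen because it matches the summary output and class counts shown below.

## Assumed reconstruction of the dataset (classic "play tennis" data);
## it reproduces the summary() counts shown below.
dataset = data.frame(
  Outlook     = c("Sunny","Sunny","Overcast","Rainy","Rainy","Rainy","Overcast",
                  "Sunny","Sunny","Rainy","Sunny","Overcast","Overcast","Rainy"),
  Temperature = c("Hot","Hot","Hot","Mild","Cold","Cold","Cold",
                  "Mild","Cold","Mild","Mild","Mild","Hot","Mild"),
  Humidity    = c("High","High","High","High","Normal","Normal","Normal",
                  "High","Normal","Normal","Normal","High","Normal","High"),
  wind        = c("Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                  "Weak","Weak","Weak","Strong","Strong","Weak","Strong"),
  Play        = c("No","No","Yes","Yes","Yes","No","Yes",
                  "No","Yes","Yes","Yes","Yes","Yes","No")
)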

Calculating Total Entropy

View(dataset)
## changing the data into factors type 
data = data.frame(lapply(dataset,factor))
summary(data) ## summary of the data 
### Calculating Total Entropy
table(data$Play)
## Entropy = -p(Yes)*log2(p(Yes)) - p(No)*log2(p(No))
TotalEntropy = -(9/14)*log2(9/14)-(5/14)*log2(5/14)
TotalEntropy
Output:

> View(dataset)
> ## changing the data into factors type 
> data = data.frame(lapply(dataset,factor))
> summary(data) ## summary of the data 
     Outlook  Temperature   Humidity     wind    Play  
 Overcast:4   Cold:4      High  :7   Strong:6   No :5  
 Rainy   :5   Hot :4      Normal:7   Weak  :8   Yes:9  
 Sunny   :5   Mild:6                                   
> ### Calculating Total Entropy
> table(data$Play)

 No Yes 
  5   9 
> ## Entropy = -p(Yes)*log2(p(Yes)) - p(No)*log2(p(No))
> TotalEntropy = -(9/14)*log2(9/14)-(5/14)*log2(5/14)
> TotalEntropy
[1] 0.940286
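
As a small aside, the same total entropy can also be computed directly from the class proportions instead of typing the fractions by hand; this is just an equivalent formulation:

p = prop.table(table(data$Play))  ## proportions of No and Yes
-sum(p * log2(p))                 ## 0.940286, the same value as above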

Calculating the entropy for each class in Outlook and the information gain for Outlook

table(data$Play)
## Entropy = -p(Yes)*log2(p(Yes)) - p(No)*log2(p(No))
TotalEntropy = -(9/14)*log2(9/14)-(5/14)*log2(5/14)
TotalEntropy
## filtering Outlook data to calculate entropy
library(dplyr)
## Calculate Entropy for Outlook
Outlook_Rainy = data.frame(filter(select(data,Outlook,Play),Outlook=='Rainy'))

View(Outlook_Rainy)
Entropy_Rainy = -(3/5)*log2(3/5)-(2/5)*log2(2/5)
Entropy_Rainy

Outlook_Overcast = data.frame(filter(select(data,Outlook,Play),Outlook=='Overcast'))
View(Outlook_Overcast)
Entropy_Overcast=-(4/4)*log2(4/4)-0 ## since we don't have any No values
Entropy_Overcast

Outlook_Sunny = data.frame(filter(select(data,Outlook,Play),Outlook=='Sunny'))
View(Outlook_Sunny)
Entropy_Sunny = -(2/5)*log2(2/5)-(3/5)*log2(3/5)
Entropy_Sunny

# calculating Information for outlook
### Info = summation(p(x)*Entropy(x))

Outlook_Info = ((5/14)*Entropy_Rainy)+((4/14)*Entropy_Overcast)+((5/14)*Entropy_Sunny)
Outlook_Info
## Information gain
## Info_gain = Total Entropy- Outlook_info
Info_gain1 = TotalEntropy - Outlook_Info
Info_gain1
Output:

> table(data$Play)

 No Yes 
  5   9 
> ## Entropy = -p(Yes)*log2(p(Yes)) - p(No)*log2(p(No))
> TotalEntropy = -(9/14)*log2(9/14)-(5/14)*log2(5/14)
> TotalEntropy
[1] 0.940286
> ## filtering Outlook data to calculate entropy
> library(dplyr)
> ## Calculate Entropy for Outlook
> Outlook_Rainy = data.frame(filter(select(data,Outlook,Play),Outlook=='Rainy'))
> View(Outlook_Rainy)
> Entropy_Rainy = -(3/5)*log2(3/5)-(2/5)*log2(2/5)
> Entropy_Rainy
[1] 0.9709506
> Outlook_Overcast = data.frame(filter(select(data,Outlook,Play),Outlook=='Overcast'))
> View(Outlook_Overcast)
> Entropy_Overcast=-(4/4)*log2(4/4)-0 ## since we don't have any No values
> Entropy_Overcast
[1] 0
> Outlook_Sunny = data.frame(filter(select(data,Outlook,Play),Outlook=='Sunny'))
> View(Outlook_Sunny)
> Entropy_Sunny = -(2/5)*log2(2/5)-(3/5)*log2(3/5)
> Entropy_Sunny
[1] 0.9709506
> Outlook_Info = ((5/14)*Entropy_Rainy)+((4/14)*Entropy_Overcast)+((5/14)*Entropy_Sunny)
> Outlook_Info
[1] 0.6935361
> ## Information gain
> ## Info_gain = Total Entropy- Outlook_info
> Info_gain1 = TotalEntropy - Outlook_Info
> Info_gain1
[1] 0.2467498

In the same way, we calculate the entropy and information gain for all the remaining columns; a compact way to do this is sketched below.
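
A hedged sketch of that repetition, reusing the illustrative info_gain() helper defined earlier (column names taken from the summary output above):

attributes = c("Outlook", "Temperature", "Humidity", "wind")
## information gain of each attribute with respect to Play
gains = sapply(attributes, function(a) info_gain(data[[a]], data$Play))
gains  ## Outlook comes out highest (about 0.247), matching the manual calculation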

From the above table, Outlook has the highest information gain, so the attribute at the root node is Outlook.

From the above diagram, we can observe that Overcast contains only the Yes class, so there is no need for further splitting. Rainy and Sunny, however, contain both Yes and No, so the same process repeats on those subsets, as sketched below.
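
For example, a sketch of repeating the calculation on the Rainy subset (again using the illustrative info_gain() helper); the analogous filter applies to the Sunny subset:

library(dplyr)
## restrict to the Rainy rows and compute the gain of the remaining attributes
Rainy = filter(data, Outlook == 'Rainy')
sapply(c("Temperature", "Humidity", "wind"),
       function(a) info_gain(Rainy[[a]], Rainy$Play))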

So far, we have seen the manual process.

Here is the R code for building a decision tree model using the C5.0 function and plotting the decision tree.

library(C50)

## Syntax  C5.0(Input_Columns, Target)

model = C5.0(data[,1:4],data$Play)
plot(model)
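
After fitting, the C50 package also lets us inspect and use the model, for example:

summary(model)              ## prints the induced tree/rules and the training error
predict(model, data[,1:4])  ## predicted Play class for each row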
