Introducing Decision Trees

Improving the baseline model

Examples:


A program for prediction using a set of rules with if/else statements

  • How about a rule-based algorithm with several if/else statements?
if class_attendance == 1 and quiz1 == 1:
    quiz2 = "A+"
elif class_attendance == 1 and lab3 == 1 and lab4 == 1:
    quiz2 = "A+"
...
import pandas as pd

classification_df = pd.read_csv("data/quiz2-grade-toy-classification.csv")
classification_df.head(3)
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1   quiz2
0              1                 1    92    93    84    91     92      A+
1              1                 0    94    90    80    83     91  not A+
2              0                 0    78    85    83    80     80  not A+


X = classification_df.drop(columns=["quiz2"])
X.head(3)
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 1    92    93    84    91     92
1              1                 0    94    90    80    83     91
2              0                 0    78    85    83    80     80


y = classification_df["quiz2"]
y.head(3)
0        A+
1    not A+
2    not A+
Name: quiz2, dtype: object
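To make "baseline" concrete, here is a minimal sketch assuming the baseline is a majority-class predictor such as scikit-learn's DummyClassifier (an assumption; the text does not name the baseline). It uses only the three rows displayed above, so the numbers are illustrative rather than a real evaluation.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

# Only the three rows shown by classification_df.head(3) above.
X = pd.DataFrame({
    "ml_experience":    [1, 1, 0],
    "class_attendance": [1, 0, 0],
    "lab1":             [92, 94, 78],
    "lab2":             [93, 90, 85],
    "lab3":             [84, 80, 83],
    "lab4":             [91, 83, 80],
    "quiz1":            [92, 91, 80],
})
y = pd.Series(["A+", "not A+", "not A+"], name="quiz2")

# Assumed baseline: always predict the most frequent training label,
# ignoring every feature.
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X, y)

print(dummy.predict(X))   # the same label for every row
print(dummy.score(X, y))  # fraction of rows that happen to have that label
```

Because the majority label here is "not A+", the baseline is right on two of the three rows and can never predict "A+", which is what a rule-based or tree-based model should improve on.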
X_binary = X.copy()
columns = ["lab1", "lab2", "lab3", "lab4", "quiz1"]
# Binarize each score column: 1 if the score is at least 90, else 0
for col in columns:
    X_binary[col] = X_binary[col].apply(lambda x: 1 if x >= 90 else 0)
X_binary.head()
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 1     1     1     0     1      1
1              1                 0     1     1     0     0      1
2              0                 0     0     0     0     0      0
3              0                 1     1     1     1     1      0
4              0                 1     0     0     1     1      0
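The hand-written if/else rules from earlier can be applied directly to these binarized rows. The sketch below rebuilds the five rows shown by X_binary.head() and applies only the two rules that are spelled out; the function name rule_based_predict and the fallback label "not A+" for rows no rule covers are assumptions, since the original rule list is elided.

```python
import pandas as pd

# The five binarized rows displayed by X_binary.head() above.
X_binary = pd.DataFrame({
    "ml_experience":    [1, 1, 0, 0, 0],
    "class_attendance": [1, 0, 0, 1, 1],
    "lab1":             [1, 1, 0, 1, 0],
    "lab2":             [1, 1, 0, 1, 0],
    "lab3":             [0, 0, 0, 1, 1],
    "lab4":             [1, 0, 0, 1, 1],
    "quiz1":            [1, 1, 0, 0, 0],
})

def rule_based_predict(row):
    """Hypothetical helper: the two explicit rules, plus an assumed default."""
    if row["class_attendance"] == 1 and row["quiz1"] == 1:
        return "A+"
    elif row["class_attendance"] == 1 and row["lab3"] == 1 and row["lab4"] == 1:
        return "A+"
    # The original list continues ("..."); here we simply fall back to "not A+".
    return "not A+"

# Apply the rules row by row.
predictions = X_binary.apply(rule_based_predict, axis=1)
print(predictions.tolist())
```

For the three rows whose labels are shown (A+, not A+, not A+), these rules agree with the data. The catch is that someone had to choose the rules and the 90-point threshold by hand, which is the motivation for learning them instead.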

Decision trees

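A decision tree learns if/else rules like the ones above from the data instead of having them written by hand. Here is a minimal sketch using scikit-learn's DecisionTreeClassifier and export_text on the three labeled rows shown earlier. With so few rows, a single question already separates the classes, and several features tie for that split, so the feature printed may vary.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The three labeled rows shown by classification_df.head(3).
X = pd.DataFrame({
    "ml_experience":    [1, 1, 0],
    "class_attendance": [1, 0, 0],
    "lab1":             [92, 94, 78],
    "lab2":             [93, 90, 85],
    "lab3":             [84, 80, 83],
    "lab4":             [91, 83, 80],
    "quiz1":            [92, 91, 80],
})
y = pd.Series(["A+", "not A+", "not A+"], name="quiz2")

# Let the algorithm choose the questions (features and thresholds) itself.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# export_text prints the learned rules as nested threshold conditions.
print(export_text(tree, feature_names=list(X.columns)))
print("training accuracy:", tree.score(X, y))
```

Unlike the hand-written version, the thresholds here are picked by the algorithm and can take any value between observed scores, not just 90.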

Decision trees terminology

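Assuming the terminology figure uses the usual terms (root, internal node, leaf, depth), the sketch below shows how they map onto a fitted scikit-learn tree. It walks the low-level tree_ arrays: node 0 is the root, a node whose children_left entry is -1 is a leaf, and every other node is an internal (split) node. The data is again only the three rows displayed earlier.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame({
    "ml_experience":    [1, 1, 0],
    "class_attendance": [1, 0, 0],
    "lab1":             [92, 94, 78],
    "lab2":             [93, 90, 85],
    "lab3":             [84, 80, 83],
    "lab4":             [91, 83, 80],
    "quiz1":            [92, 91, 80],
})
y = pd.Series(["A+", "not A+", "not A+"], name="quiz2")

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
t = tree.tree_

kinds = []
for node in range(t.node_count):
    is_leaf = t.children_left[node] == -1  # leaves have no children
    if node == 0:
        kind = "root"                      # node 0 is the root in scikit-learn's layout
    elif is_leaf:
        kind = "leaf"
    else:
        kind = "internal"
    kinds.append(kind)
    if is_leaf:
        # A leaf predicts the class with the largest value stored at that node.
        label = tree.classes_[t.value[node][0].argmax()]
        print(f"node {node}: {kind}, predicts {label}")
    else:
        name = X.columns[t.feature[node]]
        print(f"node {node}: {kind}, split on {name} <= {t.threshold[node]:.1f}")

# Depth is the number of edges on the longest root-to-leaf path.
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```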

Decision Stump

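A decision stump is a decision tree with a single split, i.e. depth 1. A minimal sketch, assuming scikit-learn's max_depth=1 as the way to build one: it fits on the three labeled binarized rows and then predicts rows 3 and 4 of X_binary.head(), whose labels are not shown in the text. Two features (class_attendance and lab4) separate the labeled rows equally well, so either may be chosen, but both send rows 3 and 4 to the same leaf.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Binarized features for the five rows shown by X_binary.head().
X_binary = pd.DataFrame({
    "ml_experience":    [1, 1, 0, 0, 0],
    "class_attendance": [1, 0, 0, 1, 1],
    "lab1":             [1, 1, 0, 1, 0],
    "lab2":             [1, 1, 0, 1, 0],
    "lab3":             [0, 0, 0, 1, 1],
    "lab4":             [1, 0, 0, 1, 1],
    "quiz1":            [1, 1, 0, 0, 0],
})
# Only the first three rows have a displayed quiz2 label.
y_labeled = pd.Series(["A+", "not A+", "not A+"], name="quiz2")

# max_depth=1 restricts the tree to one question: a decision stump.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_binary.iloc[:3], y_labeled)

feature = X_binary.columns[stump.tree_.feature[0]]
print(f"split on {feature} <= {stump.tree_.threshold[0]:.1f}")

# Predict the two rows whose labels were not shown.
print(stump.predict(X_binary.iloc[3:]))
```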

Let’s apply what we learned!