Decision Tree Regressor

import pandas as pd

regression_df = pd.read_csv("data/quiz2-grade-toy-regression.csv")
regression_df
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1  quiz2
0              1                 1    92    93    84    91     92     90
1              1                 0    94    90    80    83     91     84
2              0                 0    78    85    83    80     80     82
             ...               ...   ...   ...   ...   ...    ...    ...
4              0                 1    77    83    90    92     85     90
5              1                 0    70    73    68    74     71     75
6              1                 0    80    88    89    88     91     91

7 rows × 8 columns

X = regression_df.drop(columns=["quiz2"])
X.head()
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 1    92    93    84    91     92
1              1                 0    94    90    80    83     91
2              0                 0    78    85    83    80     80
3              0                 1    91    94    92    91     89
4              0                 1    77    83    90    92     85


y = regression_df["quiz2"]
y.head()
0    90
1    84
2    82
3    92
4    90
Name: quiz2, dtype: int64
from sklearn.tree import DecisionTreeRegressor

depth = 4
reg_model = DecisionTreeRegressor(max_depth=depth)
reg_model.fit(X, y)
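
In the cell above, `max_depth=depth` caps how many levels of splits the tree may use. The following is a minimal sketch of how that cap affects the fit on the training data. It uses made-up data: `X_toy` and `y_toy` are hypothetical and are not the quiz2 dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data (not the quiz2 dataset): 8 samples, 1 feature.
# Every feature value is distinct, so a deep enough tree can isolate each row.
X_toy = np.arange(8, dtype=float).reshape(-1, 1)
y_toy = np.array([3.0, 5.0, 4.0, 9.0, 8.0, 12.0, 20.0, 18.0])

train_scores = {}
for depth in [1, 2, 3, None]:  # None lets the tree grow until leaves are pure
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_toy, y_toy)
    # score() on the training data itself, as in the lecture example
    train_scores[depth] = model.score(X_toy, y_toy)
    print(f"max_depth={depth}: leaves={model.get_n_leaves()}, "
          f"train R^2={train_scores[depth]:.3f}")
```

A tree of depth 4 can have up to 2^4 = 16 leaves. With only 7 rows in `regression_df`, that is more than enough to give each distinguishable row its own leaf.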
X.loc[[0]]
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 1    92    93    84    91     92


reg_model.predict(X.loc[[0]])
array([90.])
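
With the default squared-error criterion, a regression tree predicts the mean of the training targets that landed in the same leaf as the input. The sketch below checks this on a depth-1 tree. The data (`X_toy`, `y_toy`) is hypothetical and chosen only so that the two leaves are easy to see.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data (not the quiz2 dataset): two well-separated groups.
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_toy = np.array([10.0, 12.0, 14.0, 50.0, 54.0])

# A depth-1 tree makes exactly one split, giving two leaves.
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_toy, y_toy)

# apply() returns the index of the leaf each sample ends up in.
leaf_ids = stump.apply(X_toy)
preds = stump.predict(X_toy)

for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print(f"leaf {leaf}: mean of targets = {y_toy[in_leaf].mean():.1f}, "
          f"prediction = {preds[in_leaf][0]:.1f}")
```

In the lecture example, the prediction for row 0 equals its actual `quiz2` value of 90. That is what we would expect if row 0 sits in a leaf alone, or only with rows that share the same target.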
predicted_grades = reg_model.predict(X)
regression_df = regression_df.assign(predicted_quiz2 = predicted_grades)
print("R^2 score on the training data: " + str(round(reg_model.score(X, y), 3)))
R^2 score on the training data: 1.0
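
For a regressor, `score` returns R^2, which is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of `y`. A score of 1.0 therefore means every prediction matches its target exactly. The sketch below computes R^2 by hand and compares it with `score`. It uses hypothetical toy data (`X_toy`, `y_toy`) and a deliberately shallow tree so the score is not trivially 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data (not the quiz2 dataset).
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_toy = np.array([10.0, 12.0, 14.0, 50.0, 54.0])

model = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_toy, y_toy)
y_pred = model.predict(X_toy)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_toy - y_pred) ** 2)        # squared errors of the model
ss_tot = np.sum((y_toy - y_toy.mean()) ** 2)  # squared errors of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

print("manual R^2:", round(r2_manual, 3))
print("model.score:", round(model.score(X_toy, y_toy), 3))
```

Note that this is a score on the training data. It says how well the tree reproduces the rows it was fit on, not how it would do on new students.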


regression_df.head()
   ml_experience  class_attendance  lab1  lab2  ...  lab4  quiz1  quiz2  predicted_quiz2
0              1                 1    92    93  ...    91     92     90             90.0
1              1                 0    94    90  ...    83     91     84             84.0
2              0                 0    78    85  ...    80     80     82             82.0
3              0                 1    91    94  ...    91     89     92             92.0
4              0                 1    77    83  ...    92     85     90             90.0

5 rows × 9 columns

Let’s apply what we learned!