Decision Tree Classifiers

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
new_example
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 0     1     1     0     0      0


model.predict(new_example)
NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Detailed traceback: 
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/sklearn/tree/_classes.py", line 529, in predict
    check_is_fitted(self)
  File "/usr/local/lib/python3.12/site-packages/sklearn/utils/validation.py", line 1754, in check_is_fitted
    raise NotFittedError(msg % {"name": type(estimator).__name__})
X_binary.head()
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 1     1     1     0     1      1
1              1                 0     1     1     0     0      1
2              0                 0     0     0     0     0      0
3              0                 1     1     1     1     1      0
4              0                 1     0     0     1     1      0


y.head()
0        A+
1    not A+
2    not A+
3        A+
4        A+
Name: quiz2, dtype: object


model.fit(X_binary, y);
new_example
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 0     1     1     0     0      0


model.predict(new_example)[0]
'not A+'


model.score(X_binary, y)
0.9047619047619048
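
For a classifier, score reports the mean accuracy: the fraction of examples whose predicted label matches the true label. The sketch below checks this by hand. It uses a small made-up dataset (X_toy, y_toy, and toy_model are hypothetical names, not the course data), so the number it produces is not the 0.90 shown above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (NOT the course dataset). Rows 0, 4 and 5 share the
# same feature values but not the same label, so no tree can reach 100%.
X_toy = pd.DataFrame({
    "lab1":  [1, 1, 0, 0, 1, 1],
    "quiz1": [1, 0, 1, 0, 1, 1],
})
y_toy = pd.Series(["A+", "not A+", "A+", "not A+", "A+", "not A+"], name="quiz2")

toy_model = DecisionTreeClassifier(random_state=0)
toy_model.fit(X_toy, y_toy)

# score() on a classifier returns the mean accuracy on the data it is given.
accuracy_from_score = toy_model.score(X_toy, y_toy)

# The same quantity computed by hand: fraction of predictions that are correct.
predictions = toy_model.predict(X_toy)
accuracy_by_hand = np.mean(predictions == y_toy.to_numpy())

print(accuracy_from_score, accuracy_by_hand)
```

Both numbers are computed on the data the model was trained on, so they describe how well the tree fits that data rather than how it would do on unseen examples.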

How does predict work?

observation
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0              1                 0     1     1     0     1      1
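
One way to see what predict does is to walk an example through the tree by hand: start at the root, answer the question at each node, and stop at a leaf, whose majority class is the prediction. The sketch below is a minimal illustration on made-up data; X_toy, y_toy, toy_model, toy_observation, and walk_tree are hypothetical names, and the tree it learns is not the tree from the course dataset.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data (NOT the course dataset).
X_toy = pd.DataFrame({
    "lab1":  [1, 1, 0, 0, 1, 0],
    "quiz1": [1, 0, 1, 0, 1, 0],
})
y_toy = pd.Series(["A+", "not A+", "A+", "not A+", "A+", "not A+"], name="quiz2")

toy_model = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# Print the learned questions as indented if/else rules.
print(export_text(toy_model, feature_names=list(X_toy.columns)))


def walk_tree(model, row):
    """Follow one example from the root to a leaf and return the leaf's class."""
    tree = model.tree_
    node = 0  # the root is always node 0
    while tree.children_left[node] != -1:  # -1 marks a leaf node
        feature_index = tree.feature[node]
        # scikit-learn sends a sample left when its value is <= the threshold.
        if row[feature_index] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    # At the leaf, pick the class with the largest share of training examples.
    return model.classes_[tree.value[node][0].argmax()]


toy_observation = pd.DataFrame({"lab1": [0], "quiz1": [1]})
manual_prediction = walk_tree(toy_model, toy_observation.iloc[0].to_numpy())
sklearn_prediction = toy_model.predict(toy_observation)[0]
print(manual_prediction, sklearn_prediction)
```

The hand-written walk and model.predict should agree, since both follow the same questions from the root to the same leaf.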

How does fit work?

  • Which features are most useful for classification?
  • Choose the question that minimizes impurity at each node
  • Common criteria for measuring impurity
    • Gini Index
    • Information gain
    • Cross entropy
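
The criteria above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not scikit-learn's implementation: the function names (gini, entropy, impurity_reduction) and the label lists are hypothetical. It compares two candidate questions by how much each one reduces impurity; when entropy is the impurity measure, that reduction is the information gain.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def impurity_reduction(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    )
    return impurity(parent) - weighted_children


# Hypothetical labels reaching a node: three of each class.
parent = ["A+", "A+", "A+", "not A+", "not A+", "not A+"]

# Candidate question A separates the classes perfectly.
split_a = (["A+", "A+", "A+"], ["not A+", "not A+", "not A+"])
# Candidate question B leaves both children mixed.
split_b = (["A+", "A+", "not A+"], ["A+", "not A+", "not A+"])

gain_a = impurity_reduction(parent, *split_a)
gain_b = impurity_reduction(parent, *split_b)
info_gain_a = impurity_reduction(parent, *split_a, impurity=entropy)

print(gain_a, gain_b, info_gain_a)
```

Question A yields the larger reduction under either measure, so a greedy tree builder of this kind would ask it first.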

Let’s apply what we learned!