Precision, Recall and F1 Score

Accuracy is only part of the story…

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe_tree = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(random_state=123)
)


from sklearn.model_selection import cross_validate
pd.DataFrame(cross_validate(pipe_tree, X_train, y_train, return_train_score=True)).mean()
fit_time       9.988710
score_time     0.005139
test_score     0.999119
train_score    1.000000
dtype: float64


y_train.value_counts(normalize=True)
Class
0    0.998302
1    0.001698
Name: proportion, dtype: float64
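
Only about 0.17% of the transactions are fraudulent. As a sanity check (not part of the original notes), a DummyClassifier that always predicts the majority class reaches almost the same accuracy as the decision-tree pipeline above; a minimal sketch, assuming the same X_train and y_train:

from sklearn.dummy import DummyClassifier

# Baseline that ignores the features and always predicts the majority class (non-fraud)
dummy = DummyClassifier(strategy="most_frequent")
pd.DataFrame(cross_validate(dummy, X_train, y_train, return_train_score=True)).mean()

Because roughly 99.8% of the labels are 0, this baseline scores about 0.998, so the tree's 0.999 accuracy says very little about how well fraud is actually detected.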
from sklearn.metrics import confusion_matrix

pipe_tree.fit(X_train, y_train);
predictions = pipe_tree.predict(X_valid)
confusion_matrix(y_valid, predictions)
array([[59674,    34],
       [   26,    76]])


TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()
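
For a binary problem with classes ordered [0, 1], confusion_matrix puts the true labels on the rows and the predicted labels on the columns, so .ravel() unpacks the counts in the order TN, FP, FN, TP.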

Recall

Among all positive examples, how many did you identify?

Recall = TP / (TP + FN)


confusion_matrix(y_valid, predictions)
array([[59674,    34],
       [   26,    76]])


TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()


recall = TP / (TP + FN)
recall.round(4)
np.float64(0.7451)
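
Reading the counts off the confusion matrix: recall = 76 / (76 + 26) ≈ 0.745, so about a quarter of the fraudulent transactions are still being missed.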

Precision

Among the positive examples you identified, how many were actually positive?

Precision = TP / (TP + FP)


confusion_matrix(y_valid, predictions)
array([[59674,    34],
       [   26,    76]])


TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()


precision = TP / (TP + FP)
precision.round(4)
np.float64(0.6909)
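
Here precision = 76 / (76 + 34) ≈ 0.691, so roughly 31% of the transactions flagged as fraud are false alarms.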

F1 score

The f1-score combines precision and recall into a single number: it is the harmonic mean of the two.

f1-score = (2 × precision × recall) / (precision + recall)


precision
np.float64(0.6909090909090909)


recall
np.float64(0.7450980392156863)


f1_score = (2 * precision * recall) / (precision + recall)
f1_score
np.float64(0.7169811320754716)
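
Because the harmonic mean is pulled toward the smaller of its two inputs, a high precision cannot hide a poor recall (or vice versa) in the f1-score.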

Calculating evaluation metrics by hand and with sklearn

# Compute each metric directly from the confusion-matrix counts
data = {}
data["accuracy"] = [(TP + TN) / (TN + FP + FN + TP)]
data["error"] = [(FP + FN) / (TN + FP + FN + TP)]
data["precision"] = [TP / (TP + FP)]
data["recall"] = [TP / (TP + FN)]
data["f1 score"] = [(2 * precision * recall) / (precision + recall)]
measures_df = pd.DataFrame(data, index=['ourselves'])
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Now the same metrics computed with sklearn's functions
pred_cv = pipe_tree.predict(X_valid)

data["accuracy"].append(accuracy_score(y_valid, pred_cv))
data["error"].append(1 - accuracy_score(y_valid, pred_cv))
data["precision"].append(precision_score(y_valid, pred_cv, zero_division=1))
data["recall"].append(recall_score(y_valid, pred_cv))
data["f1 score"].append(f1_score(y_valid, pred_cv))

pd.DataFrame(data, index=['ourselves', 'sklearn'])
           accuracy     error  precision    recall  f1 score
ourselves  0.998997  0.001003   0.690909  0.745098  0.716981
sklearn    0.998997  0.001003   0.690909  0.745098  0.716981

Classification report

from sklearn.metrics import classification_report


pipe_tree.classes_
array([0, 1])


print(classification_report(y_valid, pipe_tree.predict(X_valid),
        target_names=["non-fraud", "fraud"]))
              precision    recall  f1-score   support

   non-fraud       1.00      1.00      1.00     59708
       fraud       0.69      0.75      0.72       102

    accuracy                           1.00     59810
   macro avg       0.85      0.87      0.86     59810
weighted avg       1.00      1.00      1.00     59810
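
The report shows two averages worth distinguishing: "macro avg" is the unweighted mean of the per-class scores, while "weighted avg" weights each class by its support. As an illustration (not part of the original notes), the macro-averaged precision can be reproduced directly, assuming the same y_valid and predictions:

from sklearn.metrics import precision_score

# Unweighted mean of the two per-class precisions, matching the "macro avg" row above
precision_score(y_valid, predictions, average="macro")

With such a severe class imbalance, the weighted average is dominated by the non-fraud class, which is why it looks nearly perfect even though the fraud-class precision is only 0.69.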

Let’s apply what we learned!