Introducing Evaluation Metrics

import pandas as pd
from sklearn.model_selection import train_test_split

cc_df = pd.read_csv('data/creditcard.csv.zip', encoding='latin-1')
train_df, test_df = train_test_split(cc_df, test_size=0.3, random_state=111)


train_df.head()
Time V1 V2 V3 ... V27 V28 Amount Class
64454 51150.0 -3.538816 3.481893 -1.827130 ... -0.023636 -0.454966 1.00 0
37906 39163.0 -0.363913 0.853399 1.648195 ... -0.186814 -0.257103 18.49 0
79378 57994.0 1.193021 -0.136714 0.622612 ... -0.036764 0.015039 23.74 0
245686 152859.0 1.604032 -0.808208 -1.594982 ... 0.005387 -0.057296 156.52 0
60943 49575.0 -2.669614 -2.734385 0.662450 ... 0.388023 0.161782 57.50 0

5 rows × 31 columns


train_df.shape
(199364, 31)
train_df.describe(include="all", percentiles=[])
Time V1 V2 V3 ... V27 V28 Amount Class
count 199364.000000 199364.000000 199364.000000 199364.000000 ... 199364.000000 199364.000000 199364.000000 199364.000000
mean 94888.815669 0.000492 -0.000726 0.000927 ... -0.000366 0.000227 88.164679 0.001700
std 47491.435489 1.959870 1.645519 1.505335 ... 0.401541 0.333139 238.925768 0.041201
min 0.000000 -56.407510 -72.715728 -31.813586 ... -22.565679 -11.710896 0.000000 0.000000
50% 84772.500000 0.018854 0.065463 0.179080 ... 0.001239 0.011234 22.000000 0.000000
max 172792.000000 2.451888 22.057729 9.382558 ... 12.152401 33.847808 11898.090000 1.000000

6 rows × 31 columns

X_train_big, y_train_big = train_df.drop(columns=["Class"]), train_df["Class"]
X_test, y_test = test_df.drop(columns=["Class"]), test_df["Class"]


X_train, X_valid, y_train, y_valid = train_test_split(X_train_big, 
                                                      y_train_big, 
                                                      test_size=0.3, 
                                                      random_state=123)

Baseline

from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate

dummy = DummyClassifier(strategy="most_frequent")
pd.DataFrame(cross_validate(dummy, X_train, y_train, return_train_score=True)).mean()
fit_time       0.010974
score_time     0.000922
test_score     0.998302
train_score    0.998302
dtype: float64
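
The dummy's 0.9983 test score looks impressive, but it is nothing more than the majority-class proportion: predicting "not fraud" for every example is right 99.83% of the time. A minimal, self-contained sketch of why `most_frequent` scores this way, using a made-up label vector with the same flavour of imbalance (not the real fraud data):

```python
from collections import Counter

# Made-up label vector: 997 negatives, 3 positives
y = [0] * 997 + [1] * 3

# DummyClassifier(strategy="most_frequent") always predicts the modal class,
# so its accuracy is exactly the majority-class proportion.
majority_class, _ = Counter(y).most_common(1)[0]
preds = [majority_class] * len(y)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(accuracy)  # 0.997
```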


train_df["Class"].value_counts(normalize=True)
Class
0    0.9983
1    0.0017
Name: proportion, dtype: float64

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(random_state=123),
)


pd.DataFrame(cross_validate(pipe, X_train, y_train, return_train_score=True)).mean()
fit_time       9.942893
score_time     0.005338
test_score     0.999119
train_score    1.000000
dtype: float64
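
Both the dummy and the decision tree sit above 0.998 accuracy, so the default score barely separates them. `cross_validate` also accepts a `scoring` argument with a list of sklearn's built-in scorer names, each of which gets its own column in the results. A self-contained sketch on synthetic imbalanced data (a stand-in for the fraud set, not the real thing):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the fraud set (~1% positives)
X, y = make_classification(n_samples=2000, weights=[0.99], random_state=123)

pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=123))

# Each scorer name becomes its own test_* column in the results
scores = cross_validate(pipe, X, y, scoring=["accuracy", "recall", "precision"])
print(pd.DataFrame(scores).mean())
```

With a rare positive class, recall and precision expose mistakes that accuracy hides.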

What is “positive” and “negative”?

train_df["Class"].value_counts(normalize=True)
Class
0    0.9983
1    0.0017
Name: proportion, dtype: float64

There are two kinds of binary classification problems:

  • Distinguishing between two classes
  • Spotting a class (fraudulent transaction, spam, disease)

In the second kind, the class we want to spot (here, fraud) is conventionally the "positive" class, even though it is the rare one.

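Which class counts as "positive" matters when computing metrics. A small sketch with made-up labels (the 1s playing the role of fraud), using sklearn's `pos_label` parameter to switch which class a metric is computed for:

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = the rare class we want to spot
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# The same predictions score very differently depending on pos_label
print(recall_score(y_true, y_pred, pos_label=1))  # 0.5  (1 of 2 positives caught)
print(recall_score(y_true, y_pred, pos_label=0))  # 0.75 (3 of 4 negatives kept)
```
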
Confusion Matrix

pipe.fit(X_train, y_train);


from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

ConfusionMatrixDisplay.from_estimator(
    pipe, X_valid, y_valid,
    display_labels=["Non fraud", "Fraud"],
    values_format="d",
    cmap="Blues",
);
plt.show()
plt.show()


                  predict negative     predict positive
negative example  True negative (TN)   False positive (FP)
positive example  False negative (FN)  True positive (TP)

from sklearn.metrics import confusion_matrix


predictions = pipe.predict(X_valid)
confusion_matrix(y_valid, predictions)
array([[59674,    34],
       [   26,    76]])
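
Reading the four counts off the array above (rows are true classes, columns are predictions), we can recompute accuracy by hand and see why it stays near 0.999 even though 26 frauds slip through. A sketch using the printed numbers:

```python
# Counts from the confusion matrix above: [[tn, fp], [fn, tp]]
tn, fp = 59674, 34
fn, tp = 26, 76

accuracy = (tn + tp) / (tn + fp + fn + tp)
print(f"{accuracy:.4f}")  # 0.9990

# Of the 102 true frauds, 26 were missed -- accuracy barely registers it.
```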

Let’s apply what we learned!