Baselines: Dummy Regression

Building a baseline regression model

Baseline model:

Average target value: always predicts the mean of the target in the training set.
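To make the idea concrete, here is a minimal sketch of a mean baseline written by hand. The class name `MeanBaseline` and its `mean_` attribute are hypothetical, chosen only for illustration. This is not how scikit-learn implements `DummyRegressor`, but it follows the same rule: ignore the features and return the training mean of y. The targets are the `quiz2` values from the five rows shown by `head()` in the Data step, and the features are zero-filled placeholders.

```python
import numpy as np


class MeanBaseline:
    """Hypothetical hand-rolled baseline: predicts the training mean of y for every example."""

    def fit(self, X, y):
        # Only the targets matter; the feature values in X are never looked at.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per row of X, all equal to the stored training mean.
        return np.full(len(X), self.mean_)


# quiz2 values from the five rows printed by head()
y_train = np.array([90, 84, 82, 92, 90])
X_train = np.zeros((5, 7))  # placeholder features; their values are irrelevant

baseline = MeanBaseline().fit(X_train, y_train)
print(baseline.predict(X_train))  # [87.6 87.6 87.6 87.6 87.6]
```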

Data

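import pandas as pd

# Despite its name, this DataFrame holds a regression dataset: the target quiz2 is a numeric grade.
classification_df = pd.read_csv("data/quiz2-grade-toy-regression.csv")
classification_df.head()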
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1  quiz2
0              1                 1    92    93    84    91     92     90
1              1                 0    94    90    80    83     91     84
2              0                 0    78    85    83    80     80     82
3              0                 1    91    94    92    91     89     92
4              0                 1    77    83    90    92     85     90

1. Create ๐‘‹ and ๐‘ฆ

๐‘‹ โ†’ Feature vectors
๐‘ฆ โ†’ Target

X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]
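If the CSV file is not available locally, the split can still be tried on a small stand-in. This sketch, assuming pandas is installed, rebuilds only the five rows printed by `head()` under the hypothetical names `toy_df`, `X_toy`, and `y_toy`. The full file has more rows, so shapes and means differ from the ones reported later in this section.

```python
import pandas as pd

# Hypothetical stand-in for classification_df: only the five rows shown by head().
toy_df = pd.DataFrame(
    [
        [1, 1, 92, 93, 84, 91, 92, 90],
        [1, 0, 94, 90, 80, 83, 91, 84],
        [0, 0, 78, 85, 83, 80, 80, 82],
        [0, 1, 91, 94, 92, 91, 89, 92],
        [0, 1, 77, 83, 90, 92, 85, 90],
    ],
    columns=["ml_experience", "class_attendance", "lab1", "lab2",
             "lab3", "lab4", "quiz1", "quiz2"],
)

# Same split as above: every column except the target goes into X.
X_toy = toy_df.drop(columns=["quiz2"])
y_toy = toy_df["quiz2"]

print(X_toy.shape)               # (5, 7)
print(y_toy.shape)               # (5,)
print("quiz2" in X_toy.columns)  # False
```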

2. Create a regressor object

  • Import the appropriate regressor, in this case DummyRegressor.
  • Create an object of the regressor.
from sklearn.dummy import DummyRegressor

dummy_reg = DummyRegressor(strategy="mean")
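`strategy="mean"` is one of several options. This sketch, assuming a recent scikit-learn release, compares it with the `"median"` and `"constant"` strategies on the five `quiz2` values from `head()`; the constant 85 is an arbitrary value picked only for illustration. Whatever the strategy, the regressor returns one fixed number for every row.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# quiz2 values from the five rows shown by head(); features are placeholders.
y = np.array([90, 84, 82, 92, 90])
X = np.zeros((len(y), 1))

strategies = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    # 85 is a hypothetical constant chosen for this example.
    "constant": DummyRegressor(strategy="constant", constant=85),
}

for name, reg in strategies.items():
    reg.fit(X, y)
    # Each strategy produces the same value for every row; show the first one.
    print(name, reg.predict(X[:1]))
```

Here the mean is 87.6 and the median is 90, so the choice of strategy changes the baseline a model has to beat.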

3. Fit the regressor

dummy_reg.fit(X, y)

4. Predict the target of given examples

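We get predictions by calling predict on the regressor object. Because this is a mean baseline, every prediction is the mean of y in the training data, regardless of the feature values.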

single_obs = X.loc[[2]]
single_obs
   ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
2              0                 0    78    85    83    80     80


dummy_reg.predict(single_obs)
array([86.28571429])
X
     ml_experience  class_attendance  lab1  lab2  lab3  lab4  quiz1
0                1                 1    92    93    84    91     92
1                1                 0    94    90    80    83     91
2                0                 0    78    85    83    80     80
...            ...               ...   ...   ...   ...   ...    ...
4                0                 1    77    83    90    92     85
5                1                 0    70    73    68    74     71
6                1                 0    80    88    89    88     91

7 rows × 7 columns


dummy_reg.predict(X)
array([86.28571429, 86.28571429, 86.28571429, 86.28571429, 86.28571429, 86.28571429, 86.28571429])
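The predictions above are identical because the features are never consulted. A small sketch, using hypothetical feature rows and the five `quiz2` values from `head()` as training targets, makes this explicit: two very different new students receive the same prediction.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

y = np.array([90, 84, 82, 92, 90])      # quiz2 from the rows shown by head()
X_train = np.arange(35).reshape(5, 7)   # arbitrary placeholder features

dummy_reg = DummyRegressor(strategy="mean").fit(X_train, y)

# Two hypothetical new students with very different feature values.
X_new = np.array([
    [0, 0, 50, 50, 50, 50, 50],
    [1, 1, 100, 100, 100, 100, 100],
])

preds = dummy_reg.predict(X_new)
print(preds)  # [87.6 87.6]
```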

5. Scoring your model

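In the regression setting, .score() returns the R² of the model (the coefficient of determination of the predictions), not accuracy. An R² of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the targets being scored, and negative values mean it does worse than that.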
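For reference, R² compares the model's squared errors with those of always predicting the mean of the targets being scored. When a mean baseline is scored on its own training data, every prediction equals that mean, so the two sums coincide and R² is exactly 0:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\hat{y}_i = \bar{y} \;\; \text{for all } i
\;\Longrightarrow\;
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 0
```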

print("The R^2 score of the model on the training data:", round(dummy_reg.score(X, y), 3))
The R^2 score of the model on the training data: 0.0
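The score of 0.0 can be reproduced by hand. This sketch uses the five `quiz2` values from `head()` rather than the full file, computes R² from the residual and total sums of squares, and compares it with `.score()`. The held-out targets `y_other` are made up to show that the same baseline scores below 0 on data whose mean differs from the training mean.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

y = np.array([90, 84, 82, 92, 90], dtype=float)  # quiz2 from head()
X = np.zeros((len(y), 7))                         # placeholder features

dummy_reg = DummyRegressor(strategy="mean").fit(X, y)
y_pred = dummy_reg.predict(X)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, dummy_reg.score(X, y))  # both 0.0 on the training data

# Hypothetical held-out targets whose mean (75) is far from the training mean (87.6).
y_other = np.array([70.0, 75.0, 80.0])
r2_other = dummy_reg.score(np.zeros((3, 7)), y_other)
print(r2_other)  # approximately -9.53
```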

Let's apply what we learned!