Coefficients and coef_

Intuition behind linear regression

listing number  Number of Bedrooms  Number of Bathrooms  Square Footage  Age  Price
1               5                   6                    3000            2    $6.39 million
2               1                   1                    800             90   $1.67 million
3               3                   2                    1875            66   $3.92 million


Consider the following example listing:

listing number  Number of Bedrooms  Number of Bathrooms  Square Footage  Age
3               3                   2                    1875            66

  


predicted(price) = coefficient_bedrooms x #bedrooms + coefficient_bathrooms x #bathrooms + coefficient_sqfeet x #sqfeet + coefficient_age x age + intercept

predicted(price) = 0.03 x #bedrooms + 0.04 x #bathrooms + 0.002 x #sqfeet + (-0.01) x age + intercept

predicted(price) = (0.03 x 3) + (0.04 x 2) + (0.002 x 1875) + ((-0.01) x 66) + 0

predicted(price) = 3.26
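The arithmetic above can be checked in a few lines of Python. The coefficient values here are the illustrative ones from the worked example, not values learned from data:

```python
# Illustrative coefficients from the worked example above (not learned from data)
coefficients = {"bedrooms": 0.03, "bathrooms": 0.04, "sqfeet": 0.002, "age": -0.01}
listing = {"bedrooms": 3, "bathrooms": 2, "sqfeet": 1875, "age": 66}
intercept = 0

# Prediction = weighted sum of feature values, plus the intercept
predicted_price = sum(coefficients[f] * listing[f] for f in coefficients) + intercept
print(round(predicted_price, 2))  # 3.26 (millions of dollars)
```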

Components of a linear model




predicted(price) = (coefficient_bedrooms x #bedrooms) + (coefficient_bathrooms x #bathrooms) + (coefficient_sqfeet x #sqfeet) + (coefficient_age x age) + intercept

  • Input features
  • Coefficients, one per feature
  • Bias or intercept
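In scikit-learn, a fitted linear model exposes these components directly: `coef_` holds one coefficient per input feature and `intercept_` holds the bias. A minimal sketch using the three toy listings from the table above:

```python
import numpy as np
from sklearn.linear_model import Ridge

# The three listings from the table above: bedrooms, bathrooms, sq. footage, age
X = np.array([[5, 6, 3000, 2],
              [1, 1, 800, 90],
              [3, 2, 1875, 66]], dtype=float)
y = np.array([6.39, 1.67, 3.92])  # prices in millions

lm = Ridge().fit(X, y)
print(lm.coef_)       # one learned coefficient per input feature
print(lm.intercept_)  # the bias / intercept term
```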
import pandas as pd
from sklearn.model_selection import train_test_split

housing_df = pd.read_csv("data/real_estate.csv")
train_df, test_df = train_test_split(housing_df, test_size=0.1, random_state=1)
train_df.head()
     house_age  distance_station  num_stores  latitude  longitude  price
172        6.6          90.45606           9  24.97433  121.54310   58.1
230        4.0        2147.37600           3  24.96299  121.51284   33.4
346       13.2        1712.63200           2  24.96412  121.51670   30.8
244        4.8        1559.82700           3  24.97213  121.51627   21.7
367       15.0        1828.31900           2  24.96464  121.51531   20.9


X_train, y_train = train_df.drop(columns=['price']), train_df['price']
X_test, y_test = test_df.drop(columns=['price']), test_df['price']
from sklearn.linear_model import Ridge

lm = Ridge()
lm.fit(X_train, y_train);
training_score = lm.score(X_train, y_train)
training_score
0.5170145681350129
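For regressors, scikit-learn's `score` method returns the coefficient of determination (R²), so the value above is the model's R² on the training set. This can be verified against `r2_score` — sketched here on synthetic data, since the housing CSV isn't reproduced in these notes:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Small synthetic dataset (illustrative stand-in for the housing data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lm = Ridge().fit(X, y)
# For regressors, .score is the coefficient of determination R^2
assert np.isclose(lm.score(X, y), r2_score(y, lm.predict(X)))
```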


lm.coef_
array([-2.43214368e-01, -5.33723544e-03,  1.25878207e+00,  8.92353624e+00, -1.34523313e+00])
ridge_coeffs = lm.coef_
ridge_coeffs
array([-2.43214368e-01, -5.33723544e-03,  1.25878207e+00,  8.92353624e+00, -1.34523313e+00])


words_coeffs_df = pd.DataFrame(data=ridge_coeffs, index=X_train.columns, columns=['Coefficients'])
words_coeffs_df
Coefficients
house_age -0.243214
distance_station -0.005337
num_stores 1.258782
latitude 8.923536
longitude -1.345233
words_coeffs_df.abs().sort_values(by='Coefficients')
Coefficients
distance_station 0.005337
house_age 0.243214
num_stores 1.258782
longitude 1.345233
latitude 8.923536

Interpreting learned coefficients


In linear models:

  • if the coefficient is positive (+), then ↑ the feature value ↑ the predicted value.
  • if the coefficient is negative (-), then ↑ the feature value ↓ the predicted value.
  • if the coefficient is 0, the feature has no effect on the prediction.
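This behaviour can be seen directly by nudging one feature at a time. The numbers below are made up for illustration, not taken from the housing model:

```python
import numpy as np

# Hypothetical linear model: one positive and one negative coefficient
coef = np.array([2.0, -0.5])
intercept = 1.0

def predict(v):
    return coef @ v + intercept

x = np.array([3.0, 4.0])
base = predict(x)
# Raising the feature with the positive coefficient raises the prediction...
assert predict(x + np.array([1, 0])) > base
# ...while raising the feature with the negative coefficient lowers it
assert predict(x + np.array([0, 1])) < base
```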

Predicting

X_train.iloc[0:1]
     house_age  distance_station  num_stores  latitude  longitude
172        6.6          90.45606           9  24.97433   121.5431


lm.predict(X_train.iloc[0:1])
array([52.35605528])
words_coeffs_df.T
house_age distance_station num_stores latitude longitude
Coefficients -0.243214 -0.005337 1.258782 8.923536 -1.345233


X_train.iloc[0:1]
     house_age  distance_station  num_stores  latitude  longitude
172        6.6          90.45606           9  24.97433   121.5431


intercept = lm.intercept_
intercept
np.float64(-16.24051672028149)
predicted(price) = coefficient_house_age x house_age + coefficient_distance_station x distance_station + coefficient_num_stores x num_stores + coefficient_latitude x latitude + coefficient_longitude x longitude + intercept


(ridge_coeffs * X_train.iloc[0:1]).sum(axis=1) + intercept 
172    52.356055
dtype: float64


lm.predict(X_train.iloc[0:1])
array([52.35605528])
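The manual weighted sum and `predict` agree (up to floating point), and this identity holds for any fitted linear model. A self-contained check on synthetic data, since the housing CSV isn't reproduced in these notes:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in data (the notebook itself uses real_estate.csv)
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)

lm = Ridge().fit(X, y)
# A linear model's prediction is exactly coef . features + intercept
manual = X[0] @ lm.coef_ + lm.intercept_
assert np.isclose(manual, lm.predict(X[:1])[0])
```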

Let’s apply what we learned!