encoding_view

| | language | language_enc |
|---|---|---|
| 0 | English | 0 |
| 1 | Vietnamese | 5 |
| 2 | English | 0 |
| 3 | Mandarin | 3 |
| 4 | English | 0 |
| 5 | English | 0 |
| 6 | Mandarin | 3 |
| 7 | English | 0 |
| 8 | Vietnamese | 5 |
| 9 | Mandarin | 3 |
| 10 | French | 1 |
| 11 | Spanish | 4 |
| 12 | Mandarin | 3 |
| 13 | Hindi | 2 |
[array(['English', 'French', 'Hindi', 'Mandarin', 'Spanish', 'Vietnamese'], dtype=object)]
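The table and the category list above can be reproduced with scikit-learn's `OrdinalEncoder`. The sketch below is one way to do it, not necessarily how the original cells were written: the `language` values are copied from the table, while the name `enc` and the use of `assign` to build `encoding_view` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data copied from the table above (one categorical column).
X_toy = pd.DataFrame({"language": [
    "English", "Vietnamese", "English", "Mandarin", "English", "English",
    "Mandarin", "English", "Vietnamese", "Mandarin", "French", "Spanish",
    "Mandarin", "Hindi",
]})

# `enc` is an assumed name; categories are learned in sorted order.
enc = OrdinalEncoder(dtype=int)
X_toy_ord = enc.fit_transform(X_toy)  # shape (14, 1)

# Put the original labels and their integer codes side by side.
encoding_view = X_toy.assign(language_enc=X_toy_ord.ravel())

print(encoding_view)
print(enc.categories_)
```

Because the categories are sorted alphabetically, the integer codes carry an order (English < French < ... < Vietnamese) that has no real meaning for this feature.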
Ordinal encoding:
One-hot encoding:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse_output=False, dtype='int')
ohe.fit(X_toy);
X_toy_ohe = ohe.transform(X_toy)
X_toy_ohe
array([[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]])
| | language_English | language_French | language_Hindi | language_Mandarin | language_Spanish | language_Vietnamese |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 5 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0 | 0 |
| 7 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 0 | 0 | 0 | 1 | 0 | 0 |
| 10 | 0 | 1 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 | 1 | 0 |
| 12 | 0 | 0 | 0 | 1 | 0 | 0 |
| 13 | 0 | 0 | 1 | 0 | 0 | 0 |
| | longitude | latitude | housing_median_age | households | ... | ocean_proximity | rooms_per_household | bedrooms_per_household | population_per_household |
|---|---|---|---|---|---|---|---|---|---|
| 6051 | -117.75 | 34.04 | 22.0 | 602.0 | ... | INLAND | 4.897010 | 1.056478 | 4.318937 |
| 20113 | -119.57 | 37.94 | 17.0 | 20.0 | ... | INLAND | 17.300000 | 6.500000 | 2.550000 |
| 14289 | -117.13 | 32.74 | 46.0 | 708.0 | ... | NEAR OCEAN | 4.738701 | 1.084746 | 2.057910 |
| 13665 | -117.31 | 34.02 | 18.0 | 285.0 | ... | INLAND | 5.733333 | 0.961404 | 3.154386 |
| 14471 | -117.23 | 32.88 | 18.0 | 1458.0 | ... | NEAR OCEAN | 3.817558 | 1.004801 | 4.323045 |
5 rows × 9 columns
ohe = OneHotEncoder(sparse_output=False, dtype="int")
ohe.fit(X_train[["ocean_proximity"]])
X_imp_ohe_train = ohe.transform(X_train[["ocean_proximity"]])
X_imp_ohe_train
array([[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 1],
...,
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 1, 0, 0, 0]], shape=(18576, 5))
transformed_ohe = pd.DataFrame(
data=X_imp_ohe_train,
columns=ohe.get_feature_names_out(['ocean_proximity']),
index=X_train.index,
)
transformed_ohe.head()

| | ocean_proximity_<1H OCEAN | ocean_proximity_INLAND | ocean_proximity_ISLAND | ocean_proximity_NEAR BAY | ocean_proximity_NEAR OCEAN |
|---|---|---|---|---|---|
| 6051 | 0 | 1 | 0 | 0 | 0 |
| 20113 | 0 | 1 | 0 | 0 | 0 |
| 14289 | 0 | 0 | 0 | 0 | 1 |
| 13665 | 0 | 1 | 0 | 0 | 0 |
| 14471 | 0 | 0 | 0 | 0 | 1 |