2.1. Exercises
One-Hot Encoding Questions
One-Hot Encoding - Output
Refer to the dataframe to answer the following question.
           name   colour  location   seed  shape  sweetness  water_content  weight
0         apple      red    canada   True  round       True             84     100
1        banana   yellow    mexico  False   long       True             75     120
2    cantaloupe   orange     spain   True  round       True             90    1360
3  dragon-fruit  magenta     china   True  round      False             96     600
4    elderberry   purple   austria  False  round       True             80       5
5           fig   purple    turkey  False   oval      False             78      40
6         guava    green    mexico   True   oval       True             83     450
7   huckleberry     blue    canada   True  round       True             73       5
8          kiwi    brown     china   True  round       True             80      76
9         lemon   yellow    mexico  False   oval      False             83      65
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 1],
[1, 0, 1, 1, 1, 0, 0, 1, 1, 0]])
array([[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 1],
[0, 1, 0]])
array([[0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0]])
array([[0],
[5],
[0],
[3],
[0],
[0],
[3],
[0],
[5],
[3],
[1],
[4],
[3],
[2]])
One-Hot Encoding True or False
Encoding - One-Hot Style!
Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.
When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.
Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented these lines out so that they won't execute, which lets you test your code after each step.
Last time we ordinal encoded the country column from our basketball dataset, but now we know that this isn't the best option. This time, let's one-hot encode this feature instead.
Tasks:
- Build a one-hot encoder that uses a dtype of int and sparse_output=False. Name it one_hot_encoder.
- Fit on X_column, transform it, and save the results in an object named country_encoded.