Programming in Python for Data Science – Conditional value replacement and assignment

Building on things we know

import pandas as pd

cereal = pd.read_csv('data/cereal.csv',
                     usecols=['name', 'mfr', 'type', 'calories', 'protein', 'weight', 'rating'])
cereal.head()

	name	mfr	type	calories	protein	weight	rating
0	100% Bran	N	Cold	70	4	1.0	68.402973
1	100% Natural Bran	Q	Cold	120	3	1.0	33.983679
2	All-Bran	K	Cold	70	4	1.0	59.425505
3	All-Bran with Extra Fiber	K	Cold	50	4	1.0	93.704912
4	Almond Delight	R	Cold	110	2	1.0	34.384843

q_cereal = cereal[cereal['mfr'] == 'Q']
q_cereal.assign(mfr='Quaker')

	name	mfr	type	calories	protein	weight	rating
1	100% Natural Bran	Quaker	Cold	120	3	1.0	33.983679
10	Cap'n'Crunch	Quaker	Cold	120	1	1.0	18.042851
35	Honey Graham Ohs	Quaker	Cold	120	1	1.0	21.871292
...	...	...	...	...	...	...	...
55	Puffed Wheat	Quaker	Cold	50	2	0.5	63.005645
56	Quaker Oat Squares	Quaker	Cold	100	4	1.0	49.511874
57	Quaker Oatmeal	Quaker	Hot	100	5	1.0	50.828392

8 rows × 7 columns
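`assign()` does not change the DataFrame it is called on. It returns a new one, and because it was called on the filtered `q_cereal`, that result holds only the 8 Quaker rows. Below is a minimal sketch on a hand-built three-row excerpt of the table above, standing in for `data/cereal.csv`. The excerpt and the name `relabelled` are illustrative only.

```python
import pandas as pd

# Hand-built excerpt: three rows copied from the cereal table above
# (original index labels kept), standing in for data/cereal.csv
cereal = pd.DataFrame(
    {'name': ['100% Bran', '100% Natural Bran', 'Honey Graham Ohs'],
     'mfr': ['N', 'Q', 'Q']},
    index=[0, 1, 35],
)

# Keep only the Quaker rows, then relabel their manufacturer
q_cereal = cereal[cereal['mfr'] == 'Q']
relabelled = q_cereal.assign(mfr='Quaker')  # illustrative name for the result

print(relabelled['mfr'].tolist())  # ['Quaker', 'Quaker']
print(q_cereal['mfr'].tolist())    # ['Q', 'Q']: assign() left q_cereal as it was
print(cereal['mfr'].tolist())      # ['N', 'Q', 'Q']: the full table is unchanged
```

The non-Quaker rows are absent from `relabelled`, so this approach cannot update the full table by itself.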

Building on more things we know

cereal.loc[73]

name            Trix
mfr                G
type            Cold
             ...    
protein            1
weight           1.0
rating     27.753301
Name: 73, Length: 7, dtype: object
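`.loc[73]` with a single row label returns that whole row as a Series. Adding a column label, as in `.loc[73, 'mfr']`, narrows it down to a single value. Here is a small sketch on a hand-built two-row excerpt of the table, which includes only the columns needed. The name `mfr_73` is illustrative.

```python
import pandas as pd

# Hand-built excerpt of two rows from the table above (labels 73 and 74)
cereal = pd.DataFrame(
    {'name': ['Trix', 'Wheat Chex'], 'mfr': ['G', 'R'], 'protein': [1, 3]},
    index=[73, 74],
)

row = cereal.loc[73]            # one row label: the whole row as a Series
mfr_73 = cereal.loc[73, 'mfr']  # row label + column label: a single value

print(type(row).__name__)  # Series
print(row['name'])         # Trix
print(mfr_73)              # G
```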

cereal.loc[cereal['mfr'] == 'Q']

	name	mfr	type	calories	protein	weight	rating
1	100% Natural Bran	Q	Cold	120	3	1.0	33.983679
10	Cap'n'Crunch	Q	Cold	120	1	1.0	18.042851
35	Honey Graham Ohs	Q	Cold	120	1	1.0	21.871292
...	...	...	...	...	...	...	...
55	Puffed Wheat	Q	Cold	50	2	0.5	63.005645
56	Quaker Oat Squares	Q	Cold	100	4	1.0	49.511874
57	Quaker Oatmeal	Q	Hot	100	5	1.0	50.828392

8 rows × 7 columns

cereal.loc[cereal['mfr'] == 'Q', 'mfr']

1     Q
10    Q
35    Q
     ..
55    Q
56    Q
57    Q
Name: mfr, Length: 8, dtype: object
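The condition `cereal['mfr'] == 'Q'` produces a Series of `True`/`False` values with the same index as `cereal`. `.loc` keeps the rows where that Series is `True`, and a second argument picks the column. The sketch below uses a hand-built three-row excerpt, and `is_quaker` and `mfr_only` are illustrative names.

```python
import pandas as pd

# Hand-built excerpt: three rows copied from the table above
cereal = pd.DataFrame(
    {'name': ['100% Bran', '100% Natural Bran', 'Honey Graham Ohs'],
     'mfr': ['N', 'Q', 'Q']},
    index=[0, 1, 35],
)

is_quaker = cereal['mfr'] == 'Q'         # boolean Series, same index as cereal
rows = cereal.loc[is_quaker]             # DataFrame: rows where the mask is True
mfr_only = cereal.loc[is_quaker, 'mfr']  # Series: those rows, 'mfr' column only

print(is_quaker.tolist())       # [False, True, True]
print(rows.shape)               # (2, 2)
print(mfr_only.index.tolist())  # [1, 35]
```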

cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

cereal

	name	mfr	type	calories	protein	weight	rating
0	100% Bran	N	Cold	70	4	1.0	68.402973
1	100% Natural Bran	Quaker	Cold	120	3	1.0	33.983679
2	All-Bran	K	Cold	70	4	1.0	59.425505
...	...	...	...	...	...	...	...
74	Wheat Chex	R	Cold	100	3	1.0	49.787445
75	Wheaties	G	Cold	100	3	1.0	51.592193
76	Wheaties Honey Gold	G	Cold	110	2	1.0	36.187559

77 rows × 7 columns
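Unlike `assign()`, assigning through `.loc[rows, column]` changes `cereal` itself and keeps every row. Only the cells that match the condition are overwritten. This is a sketch on the same kind of hand-built excerpt, not the full dataset.

```python
import pandas as pd

# Hand-built excerpt: three rows copied from the table above
cereal = pd.DataFrame(
    {'name': ['100% Bran', '100% Natural Bran', 'Honey Graham Ohs'],
     'mfr': ['N', 'Q', 'Q']},
    index=[0, 1, 35],
)

# Overwrite 'mfr' only in the rows where it equals 'Q'; cereal is changed in place
cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

print(cereal['mfr'].tolist())  # ['N', 'Quaker', 'Quaker']
print(len(cereal))             # 3: no rows were dropped
```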

cereal['mfr'] == 'Q'

0     False
1     False
2     False
      ...  
74    False
75    False
76    False
Name: mfr, Length: 77, dtype: bool
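Every value in this mask is `False` because the `'Q'` entries were already replaced with `'Quaker'`. A quick way to confirm this is to count the `True` values with `.sum()`. The sketch below uses a hand-built one-column excerpt, and `before` and `after` are illustrative names.

```python
import pandas as pd

# Hand-built excerpt: manufacturer codes copied from the table above
cereal = pd.DataFrame({'mfr': ['N', 'Q', 'Q']}, index=[0, 1, 35])

# Summing a boolean Series counts its True values (True -> 1, False -> 0)
before = (cereal['mfr'] == 'Q').sum()
cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'
after = (cereal['mfr'] == 'Q').sum()

print(before, after)  # 2 0
```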

cereal.loc[cereal['mfr'] == 'Q']

cereal.loc[cereal['mfr'] == 'Q', 'mfr']

cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

cereal[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

TypeError: unhashable type: 'Series'

Detailed traceback: 
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 4311, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 4527, in _set_item
    key in self.columns
  File "/usr/local/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 5358, in __contains__
    hash(key)
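The traceback shows why plain square brackets fail. Without `.loc`, `cereal[mask, 'mfr']` treats the pair as a single tuple key, meaning one column name. pandas checks whether that name is already in `cereal.columns`, which requires hashing the tuple, and the boolean Series inside it cannot be hashed. The sketch below reproduces this on a hand-built excerpt and then applies the `.loc` fix. The flag `raised` is illustrative, and the exact error message may differ between pandas versions.

```python
import pandas as pd

# Hand-built excerpt: manufacturer codes copied from the table above
cereal = pd.DataFrame({'mfr': ['N', 'Q', 'Q']}, index=[0, 1, 35])

raised = False
try:
    # Without .loc, (mask, 'mfr') is read as ONE column label: a tuple.
    # Checking it against cereal.columns hashes the tuple, and the boolean
    # Series inside it cannot be hashed, so pandas raises TypeError.
    cereal[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'
except TypeError as err:
    raised = True
    print('TypeError:', err)  # wording may vary between pandas versions

# With .loc the pair is read as (rows, column), so the same assignment works
cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'
print(cereal['mfr'].tolist())  # ['N', 'Quaker', 'Quaker']
```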

Replacing with inequalities

cereal.loc[cereal['protein'] >= 3, 'protein_level'] = 'high'

cereal.loc[cereal['protein'] < 3, 'protein_level'] = 'low'

cereal

	name	mfr	type	calories	protein	weight	rating	protein_level
0	100% Bran	N	Cold	70	4	1.0	68.402973	high
1	100% Natural Bran	Quaker	Cold	120	3	1.0	33.983679	high
2	All-Bran	K	Cold	70	4	1.0	59.425505	high
...	...	...	...	...	...	...	...	...
74	Wheat Chex	R	Cold	100	3	1.0	49.787445	high
75	Wheaties	G	Cold	100	3	1.0	51.592193	high
76	Wheaties Honey Gold	G	Cold	110	2	1.0	36.187559	low

77 rows × 8 columns
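The first `.loc` assignment creates the new `protein_level` column. Rows that do not match that condition are filled with `NaN` until the second assignment covers them. Because `>= 3` and `< 3` are complementary, the two steps together label every row. The sketch below uses a hand-built excerpt with protein values copied from the table, and `n_missing` is an illustrative name.

```python
import pandas as pd

# Hand-built excerpt: protein values copied from the table above
cereal = pd.DataFrame(
    {'name': ['100% Bran', 'Almond Delight', 'Wheaties Honey Gold'],
     'protein': [4, 2, 2]},
    index=[0, 4, 76],
)

# First assignment creates 'protein_level'; rows that don't match get NaN
cereal.loc[cereal['protein'] >= 3, 'protein_level'] = 'high'
n_missing = cereal['protein_level'].isna().sum()

# Second assignment fills in the remaining rows
cereal.loc[cereal['protein'] < 3, 'protein_level'] = 'low'

print(n_missing)                         # 2
print(cereal['protein_level'].tolist())  # ['high', 'low', 'low']
```

If the two conditions did not cover every row, the leftover rows would keep `NaN` in the new column.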

Creating new columns

oz_to_g = 28.3495
cereal['weight_g'] = cereal['weight'] * oz_to_g
cereal

	name	mfr	type	calories	...	weight	rating	protein_level	weight_g
0	100% Bran	N	Cold	70	...	1.0	68.402973	high	28.3495
1	100% Natural Bran	Quaker	Cold	120	...	1.0	33.983679	high	28.3495
2	All-Bran	K	Cold	70	...	1.0	59.425505	high	28.3495
...	...	...	...	...	...	...	...	...	...
74	Wheat Chex	R	Cold	100	...	1.0	49.787445	high	28.3495
75	Wheaties	G	Cold	100	...	1.0	51.592193	high	28.3495
76	Wheaties Honey Gold	G	Cold	110	...	1.0	36.187559	low	28.3495

77 rows × 9 columns
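Arithmetic on a column is applied element-wise, so every row's weight is converted in a single step. The same column could also be built with `assign()`, which returns a new DataFrame instead of adding the column to `cereal`. This is a sketch on a hand-built two-row excerpt with weights copied from the table, and `cereal_g` is an illustrative name.

```python
import pandas as pd

oz_to_g = 28.3495  # grams per ounce, as defined above

# Hand-built excerpt: weights copied from the table above
cereal = pd.DataFrame(
    {'name': ['100% Bran', 'Puffed Wheat'], 'weight': [1.0, 0.5]},
    index=[0, 55],
)

# assign() returns a NEW DataFrame with the extra column; cereal is not changed
cereal_g = cereal.assign(weight_g=cereal['weight'] * oz_to_g)
print(list(cereal.columns))  # ['name', 'weight']

# Bracket assignment adds the column to cereal itself
cereal['weight_g'] = cereal['weight'] * oz_to_g
print(list(cereal.columns))  # ['name', 'weight', 'weight_g']
```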

Let’s apply what we learned!