Conditional value replacement and assignment

Building on things we know

import pandas as pd

cereal = pd.read_csv('data/cereal.csv',
                     usecols=['name', 'mfr', 'type', 'calories', 'protein', 'weight', 'rating'])
cereal.head()
                        name  mfr  type  calories  protein  weight     rating
0                  100% Bran    N  Cold        70        4     1.0  68.402973
1          100% Natural Bran    Q  Cold       120        3     1.0  33.983679
2                   All-Bran    K  Cold        70        4     1.0  59.425505
3  All-Bran with Extra Fiber    K  Cold        50        4     1.0  93.704912
4             Almond Delight    R  Cold       110        2     1.0  34.384843
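To see what usecols does without the course's data file, here is a minimal sketch that reads a two-row CSV from a string with io.StringIO. The CSV text, including its extra sugars column, is made up for illustration; only the columns listed in usecols end up in the DataFrame.

```python
import io

import pandas as pd

# Hypothetical CSV text: two rows shaped like data/cereal.csv,
# plus an extra 'sugars' column (made-up values) that we do not ask for.
csv_text = """name,mfr,type,calories,protein,sugars,weight,rating
100% Bran,N,Cold,70,4,6,1.0,68.402973
100% Natural Bran,Q,Cold,120,3,8,1.0,33.983679
"""

# usecols keeps only the listed columns, so 'sugars' is skipped while reading.
cereal_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['name', 'mfr', 'type', 'calories', 'protein', 'weight', 'rating'],
)
print(cereal_small.columns.tolist())
```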
q_cereal = cereal[cereal['mfr'] == 'Q']
q_cereal.assign(mfr='Quaker')
                  name     mfr  type  calories  protein  weight     rating
 1   100% Natural Bran  Quaker  Cold       120        3     1.0  33.983679
10        Cap'n'Crunch  Quaker  Cold       120        1     1.0  18.042851
35    Honey Graham Ohs  Quaker  Cold       120        1     1.0  21.871292
..                 ...     ...   ...       ...      ...     ...        ...
55        Puffed Wheat  Quaker  Cold        50        2     0.5  63.005645
56  Quaker Oat Squares  Quaker  Cold       100        4     1.0  49.511874
57      Quaker Oatmeal  Quaker   Hot       100        5     1.0  50.828392

8 rows × 7 columns
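A sketch of why assign() alone is not enough here: it returns a new DataFrame and leaves the one it was called on untouched, so the full cereal DataFrame still says 'Q'. The three-row DataFrame below is a hand-built excerpt (values copied from the rows above), assumed only for illustration.

```python
import pandas as pd

# Hand-built excerpt of the cereal data (not the full CSV).
cereal = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', "Cap'n'Crunch"],
    'mfr': ['N', 'Q', 'Q'],
    'calories': [70, 120, 120],
})

q_cereal = cereal[cereal['mfr'] == 'Q']
renamed = q_cereal.assign(mfr='Quaker')   # a new DataFrame with only the Q rows

print(renamed['mfr'].tolist())   # ['Quaker', 'Quaker']
print(q_cereal['mfr'].tolist())  # ['Q', 'Q']: assign() did not modify q_cereal
print(cereal['mfr'].tolist())    # ['N', 'Q', 'Q']: nor the full DataFrame
```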

Building on more things we know

cereal.loc[73] 
name            Trix
mfr                G
type            Cold
             ...    
protein            1
weight           1.0
rating     27.753301
Name: 73, Length: 7, dtype: object
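.loc with a single row label returns that row as a Series whose index is the column names; adding a column label after a comma narrows the result to one value. A minimal sketch on a hand-built two-row excerpt that keeps the original labels 73 and 74:

```python
import pandas as pd

# Hand-built excerpt; index labels match the full data set.
cereal = pd.DataFrame(
    {'name': ['Trix', 'Wheat Chex'], 'mfr': ['G', 'R'], 'protein': [1, 3]},
    index=[73, 74],
)

row = cereal.loc[73]           # one row label -> a Series indexed by column name
print(type(row).__name__)      # Series
print(row['name'])             # Trix
print(cereal.loc[73, 'mfr'])   # G (row label + column label -> a single value)
```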


cereal.loc[cereal['mfr'] == 'Q']
                  name  mfr  type  calories  protein  weight     rating
 1   100% Natural Bran    Q  Cold       120        3     1.0  33.983679
10        Cap'n'Crunch    Q  Cold       120        1     1.0  18.042851
35    Honey Graham Ohs    Q  Cold       120        1     1.0  21.871292
..                 ...  ...   ...       ...      ...     ...        ...
55        Puffed Wheat    Q  Cold        50        2     0.5  63.005645
56  Quaker Oat Squares    Q  Cold       100        4     1.0  49.511874
57      Quaker Oatmeal    Q   Hot       100        5     1.0  50.828392

8 rows × 7 columns

cereal.loc[cereal['mfr'] == 'Q', 'mfr']
1     Q
10    Q
35    Q
     ..
55    Q
56    Q
57    Q
Name: mfr, Length: 8, dtype: object
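The difference between the two .loc calls above is what comes back: with only a boolean mask you get a DataFrame of the matching rows, and with a mask plus a column label you get a Series. A small sketch, assuming a four-row excerpt of the data with its original index labels:

```python
import pandas as pd

# Hand-built excerpt with the original row labels.
cereal = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', 'All-Bran', "Cap'n'Crunch"],
    'mfr': ['N', 'Q', 'K', 'Q'],
}, index=[0, 1, 2, 10])

mask = cereal['mfr'] == 'Q'
rows = cereal.loc[mask]             # DataFrame: every column, only the Q rows
mfr_only = cereal.loc[mask, 'mfr']  # Series: just the 'mfr' column of those rows

print(type(rows).__name__, rows.shape)                   # DataFrame (2, 2)
print(type(mfr_only).__name__, mfr_only.index.tolist())  # Series [1, 10]
```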


cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'
cereal
                   name     mfr  type  calories  protein  weight     rating
 0            100% Bran       N  Cold        70        4     1.0  68.402973
 1    100% Natural Bran  Quaker  Cold       120        3     1.0  33.983679
 2             All-Bran       K  Cold        70        4     1.0  59.425505
..                  ...     ...   ...       ...      ...     ...        ...
74           Wheat Chex       R  Cold       100        3     1.0  49.787445
75             Wheaties       G  Cold       100        3     1.0  51.592193
76  Wheaties Honey Gold       G  Cold       110        2     1.0  36.187559

77 rows × 7 columns
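Unlike assign(), this .loc assignment changes cereal itself, and only in the rows where the mask is True; every other row keeps its original value. A minimal sketch on a hand-built excerpt (rows copied from the outputs above):

```python
import pandas as pd

# Hand-built excerpt of the cereal data.
cereal = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', "Cap'n'Crunch", 'Puffed Wheat'],
    'mfr': ['N', 'Q', 'Q', 'Q'],
})

# Rows chosen by the mask, column chosen by label, value assigned in place.
cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

print(cereal['mfr'].tolist())  # ['N', 'Quaker', 'Quaker', 'Quaker']
```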

cereal['mfr'] == 'Q'
0     False
1     False
2     False
      ...  
74    False
75    False
76    False
Name: mfr, Length: 77, dtype: bool
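Since the 'Q' values have already been replaced at this point, every entry in the mask above is False. On data that still contains 'Q', the same comparison marks the matching rows True, and because True counts as 1, .sum() gives the number of matches. A sketch on a hypothetical four-value Series:

```python
import pandas as pd

# Hypothetical manufacturer codes, before any replacement.
mfr = pd.Series(['N', 'Q', 'K', 'Q'], name='mfr')

mask = mfr == 'Q'
print(mask.tolist())       # [False, True, False, True]
print(mask.sum())          # 2 (True counts as 1)

mfr.loc[mask] = 'Quaker'
print((mfr == 'Q').sum())  # 0: after the replacement, the mask is all False
```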
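Putting the steps together:

cereal.loc[cereal['mfr'] == 'Q']
cereal.loc[cereal['mfr'] == 'Q', 'mfr']
cereal.loc[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'

Leaving out .loc does not work. Without it, the square brackets pass the pair (cereal['mfr'] == 'Q', 'mfr') to pandas as a single key, which pandas tries to look up as a column label. A Series cannot be hashed, so the assignment fails:

cereal[cereal['mfr'] == 'Q', 'mfr'] = 'Quaker'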
TypeError: unhashable type: 'Series'

Detailed traceback: 
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 4311, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 4527, in _set_item
    key in self.columns
  File "/usr/local/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 5358, in __contains__
    hash(key)
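The failure can be reproduced on a tiny DataFrame, next to the .loc version that works. This is a sketch; the exact wording of the TypeError message may differ between pandas versions, so the code only relies on the exception type.

```python
import pandas as pd

cereal = pd.DataFrame({'name': ['100% Bran', '100% Natural Bran'], 'mfr': ['N', 'Q']})
mask = cereal['mfr'] == 'Q'

raised = False
try:
    # Without .loc, Python hands the tuple (mask, 'mfr') to pandas as one key,
    # and pandas tries to treat that tuple as a column label.
    cereal[mask, 'mfr'] = 'Quaker'
except TypeError as err:
    raised = True
    print('TypeError:', err)

# With .loc, the first element selects rows and the second selects the column.
cereal.loc[mask, 'mfr'] = 'Quaker'
print(cereal['mfr'].tolist())  # ['N', 'Quaker']
```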

Replacing with inequalities

cereal.loc[cereal['protein'] >= 3, 'protein_level'] = 'high'

cereal.loc[cereal['protein'] < 3, 'protein_level'] = 'low'
cereal
                   name     mfr  type  calories  protein  weight     rating  protein_level
 0            100% Bran       N  Cold        70        4     1.0  68.402973           high
 1    100% Natural Bran  Quaker  Cold       120        3     1.0  33.983679           high
 2             All-Bran       K  Cold        70        4     1.0  59.425505           high
..                  ...     ...   ...       ...      ...     ...        ...            ...
74           Wheat Chex       R  Cold       100        3     1.0  49.787445           high
75             Wheaties       G  Cold       100        3     1.0  51.592193           high
76  Wheaties Honey Gold       G  Cold       110        2     1.0  36.187559            low

77 rows × 8 columns
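Because protein_level does not exist before the first assignment runs, that .loc assignment creates the column, and rows that do not meet the condition hold NaN until the second assignment fills them in. A sketch on a hand-built three-row excerpt:

```python
import pandas as pd

# Hand-built excerpt (protein values copied from the data above).
cereal = pd.DataFrame({'name': ['100% Bran', "Cap'n'Crunch", 'Wheaties Honey Gold'],
                       'protein': [4, 1, 2]})

# 'protein_level' does not exist yet, so this creates it;
# rows where the condition is False are NaN for now.
cereal.loc[cereal['protein'] >= 3, 'protein_level'] = 'high'
n_missing = cereal['protein_level'].isna().sum()
print(n_missing)  # 2

# The complementary condition fills in the remaining rows.
cereal.loc[cereal['protein'] < 3, 'protein_level'] = 'low'
print(cereal['protein_level'].tolist())  # ['high', 'low', 'low']
```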

Creating new columns

oz_to_g = 28.3495
cereal['weight_g'] = cereal['weight'] * oz_to_g
cereal
                   name     mfr  type  calories  ...  weight     rating  protein_level  weight_g
 0            100% Bran       N  Cold        70  ...     1.0  68.402973           high   28.3495
 1    100% Natural Bran  Quaker  Cold       120  ...     1.0  33.983679           high   28.3495
 2             All-Bran       K  Cold        70  ...     1.0  59.425505           high   28.3495
..                  ...     ...   ...       ...  ...     ...        ...            ...       ...
74           Wheat Chex       R  Cold       100  ...     1.0  49.787445           high   28.3495
75             Wheaties       G  Cold       100  ...     1.0  51.592193           high   28.3495
76  Wheaties Honey Gold       G  Cold       110  ...     1.0  36.187559            low   28.3495

77 rows × 9 columns
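Multiplying a column by a number works element by element, so weight_g is every weight (in ounces) scaled by the grams-per-ounce factor. A sketch on a hand-built two-row excerpt (Puffed Wheat is the 0.5 oz serving shown earlier):

```python
import pandas as pd

# Hand-built excerpt; weights are in ounces.
cereal = pd.DataFrame({'name': ['100% Bran', 'Puffed Wheat'], 'weight': [1.0, 0.5]})

oz_to_g = 28.3495  # grams per ounce, as in the example above

# The multiplication is applied to each value, and the resulting Series
# is stored under a new column name.
cereal['weight_g'] = cereal['weight'] * oz_to_g
print(cereal['weight_g'].tolist())
```

The same column could also be added with cereal.assign(weight_g=cereal['weight'] * oz_to_g), which, as before, returns a new DataFrame instead of modifying cereal.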

Let’s apply what we learned!