Column Arithmetic and Creation

import pandas as pd

cereal = pd.read_csv('data/cereal.csv')
cereal.head()
                        name  mfr  type  calories  ...  shelf  weight  cups     rating
0                  100% Bran    N  Cold        70  ...      3     1.0  0.33  68.402973
1          100% Natural Bran    Q  Cold       120  ...      3     1.0  1.00  33.983679
2                   All-Bran    K  Cold        70  ...      3     1.0  0.33  59.425505
3  All-Bran with Extra Fiber    K  Cold        50  ...      3     1.0  0.50  93.704912
4             Almond Delight    R  Cold       110  ...      3     1.0  0.75  34.384843

5 rows × 16 columns

Attribution:
“80 Cereals” (c) by Chris Crawford is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported

cereal = cereal.iloc[:5]
cereal
                        name  mfr  type  calories  ...  shelf  weight  cups     rating
0                  100% Bran    N  Cold        70  ...      3     1.0  0.33  68.402973
1          100% Natural Bran    Q  Cold       120  ...      3     1.0  1.00  33.983679
2                   All-Bran    K  Cold        70  ...      3     1.0  0.33  59.425505
3  All-Bran with Extra Fiber    K  Cold        50  ...      3     1.0  0.50  93.704912
4             Almond Delight    R  Cold       110  ...      3     1.0  0.75  34.384843

5 rows × 16 columns

cereal['fat']
0    1
1    5
2    1
3    0
4    2
Name: fat, dtype: int64


Multiplying each value by 1000 transforms the column to this:

cereal['fat'] * 1000
0    1000
1    5000
2    1000
3       0
4    2000
Name: fat, dtype: int64
cereal['rating'] 
0    68.402973
1    33.983679
2    59.425505
3    93.704912
4    34.384843
Name: rating, dtype: float64


cereal['rating'] / 10
0    6.840297
1    3.398368
2    5.942551
3    9.370491
4    3.438484
Name: rating, dtype: float64


cereal['sugars'] / cereal['cups']
0    18.181818
1     8.000000
2    15.151515
3     0.000000
4    10.666667
dtype: float64
cereal[['sugars']] / cereal[['cups']]
   cups  sugars
0   NaN     NaN
1   NaN     NaN
2   NaN     NaN
3   NaN     NaN
4   NaN     NaN
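
The all-NaN result above comes from label alignment: dividing two DataFrames matches values by column name as well as by row index, and `sugars` and `cups` share no column label. Dividing two Series matches on the row index only. A minimal sketch of the difference, assuming a small hand-built DataFrame in place of `data/cereal.csv` (the name `demo` is illustrative, and its values are copied from the five rows shown above):

```python
import pandas as pd

# Hypothetical stand-in for the first five cereal rows (values from the output above).
demo = pd.DataFrame({
    'sugars': [6, 8, 5, 0, 8],
    'cups': [0.33, 1.00, 0.33, 0.50, 0.75],
})

# Single brackets return Series; division pairs values up by row index.
series_result = demo['sugars'] / demo['cups']

# Double brackets return one-column DataFrames; division aligns on column
# labels too, and 'sugars' never lines up with 'cups', so every cell is NaN.
frame_result = demo[['sugars']] / demo[['cups']]

print(series_result)
print(frame_result)
```

Using single brackets for column arithmetic avoids the problem.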
cereal = pd.read_csv('data/cereal.csv', usecols=['name', 'mfr', 'type', 'fat', 'sugars', 'weight', 'cups', 'rating'])
cereal
                   name  mfr  type  fat  sugars  weight  cups     rating
0             100% Bran    N  Cold    1       6     1.0  0.33  68.402973
1     100% Natural Bran    Q  Cold    5       8     1.0  1.00  33.983679
2              All-Bran    K  Cold    1       5     1.0  0.33  59.425505
..                  ...  ...   ...  ...     ...     ...   ...        ...
74           Wheat Chex    R  Cold    1       3     1.0  0.67  49.787445
75             Wheaties    G  Cold    1       3     1.0  1.00  51.592193
76  Wheaties Honey Gold    G  Cold    1       8     1.0  0.75  36.187559

77 rows × 8 columns

Column Creation

oz_to_g = 28.3495
cereal['weight'] * oz_to_g
0     28.3495
1     28.3495
2     28.3495
       ...   
74    28.3495
75    28.3495
76    28.3495
Name: weight, Length: 77, dtype: float64


cereal = cereal.assign(weight_g=cereal['weight'] * oz_to_g)
cereal.head()
                        name  mfr  type  fat  ...  weight  cups     rating  weight_g
0                  100% Bran    N  Cold    1  ...     1.0  0.33  68.402973   28.3495
1          100% Natural Bran    Q  Cold    5  ...     1.0  1.00  33.983679   28.3495
2                   All-Bran    K  Cold    1  ...     1.0  0.33  59.425505   28.3495
3  All-Bran with Extra Fiber    K  Cold    0  ...     1.0  0.50  93.704912   28.3495
4             Almond Delight    R  Cold    2  ...     1.0  0.75  34.384843   28.3495

5 rows × 9 columns
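
`assign` returns a new DataFrame rather than changing the one it is called on, which is why the result is stored back into `cereal` above. A short sketch of that behaviour, assuming a hand-built three-row DataFrame in place of the real file (`demo` and `converted` are illustrative names):

```python
import pandas as pd

oz_to_g = 28.3495

# Hypothetical miniature of the cereal data; weights are in ounces.
demo = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', 'All-Bran'],
    'weight': [1.0, 1.0, 1.0],
})

# assign() builds and returns a copy that carries the extra column.
converted = demo.assign(weight_g=demo['weight'] * oz_to_g)

print(list(demo.columns))       # the original has no weight_g column
print(list(converted.columns))  # the returned copy does
```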

cereal['sugars'] / cereal['cups']
0     18.181818
1      8.000000
2     15.151515
        ...    
74     4.477612
75     3.000000
76    10.666667
Length: 77, dtype: float64


cereal = cereal.assign(sugar_per_cup=cereal['sugars'] / cereal['cups'])
cereal.head()
                        name  mfr  type  fat  ...  cups     rating  weight_g  sugar_per_cup
0                  100% Bran    N  Cold    1  ...  0.33  68.402973   28.3495      18.181818
1          100% Natural Bran    Q  Cold    5  ...  1.00  33.983679   28.3495       8.000000
2                   All-Bran    K  Cold    1  ...  0.33  59.425505   28.3495      15.151515
3  All-Bran with Extra Fiber    K  Cold    0  ...  0.50  93.704912   28.3495       0.000000
4             Almond Delight    R  Cold    2  ...  0.75  34.384843   28.3495      10.666667

5 rows × 10 columns
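
Both new columns can also be created in one step, since `assign` accepts several keyword arguments in a single call. A sketch, again assuming a small hand-built DataFrame (`demo`) standing in for `data/cereal.csv`, with values taken from the first three rows above:

```python
import pandas as pd

oz_to_g = 28.3495

# Hypothetical stand-in for the first three cereal rows.
demo = pd.DataFrame({
    'sugars': [6, 8, 5],
    'weight': [1.0, 1.0, 1.0],
    'cups': [0.33, 1.00, 0.33],
})

# Each keyword argument becomes a column name; each value becomes its contents.
demo = demo.assign(
    weight_g=demo['weight'] * oz_to_g,
    sugar_per_cup=demo['sugars'] / demo['cups'],
)

print(demo)
```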

Let’s apply what we learned!