Splitting a Column

String Split

cereal_amended
                    name  mfr_type  calories  protein  ...  fat  carbo     rating    hot
0              100% Bran    N-Cold        70        4  ...    1    5.0  68.402973  False
1      100% Natural Bran    Q-Cold       120        3  ...    5    8.0  33.983679  False
2               All-Bran    K-Cold        70        4  ...    1    7.0  59.425505  False
...                  ...       ...       ...      ...  ...  ...    ...        ...    ...
74            Wheat Chex    R-Cold       100        3  ...    1   17.0  49.787445  False
75              Wheaties    G-Cold       100        3  ...    1   17.0  51.592193  False
76   Wheaties Honey Gold    G-Cold       110        2  ...    1   16.0  36.187559  False

77 rows × 9 columns

cereal_amended.head(5)
                        name  mfr_type  calories  protein  ...  fat  carbo     rating    hot
0                  100% Bran    N-Cold        70        4  ...    1    5.0  68.402973  False
1          100% Natural Bran    Q-Cold       120        3  ...    5    8.0  33.983679  False
2                   All-Bran    K-Cold        70        4  ...    1    7.0  59.425505  False
3  All-Bran with Extra Fiber    K-Cold        50        4  ...    0    8.0  93.704912  False
4             Almond Delight    R-Cold       110        2  ...    2   14.0  34.384843  False

5 rows × 9 columns


new = cereal_amended['mfr_type'].str.split('-', expand=True)
new 
       0     1
0      N  Cold
1      Q  Cold
2      K  Cold
...  ...   ...
74     R  Cold
75     G  Cold
76     G  Cold

77 rows × 2 columns

new.head()
   0     1
0  N  Cold
1  Q  Cold
2  K  Cold
3  K  Cold
4  R  Cold
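The split above yields exactly two columns because every `mfr_type` value shown contains a single hyphen. As a rough sketch of what happens otherwise, the example below uses a made-up Series (the value `'K-Extra-Cold'` is hypothetical and does not come from the cereal data). It suggests that `expand=True` pads shorter rows with a missing value, and that the `n` argument of `str.split` caps the number of splits.

```python
import pandas as pd

# Hypothetical labels: the second value is invented so that it contains two hyphens.
labels = pd.Series(['N-Cold', 'K-Extra-Cold'])

# Splitting on every hyphen gives three columns; the shorter row is padded with a missing value.
full = labels.str.split('-', expand=True)

# n=1 splits only at the first hyphen, so each row produces at most two pieces.
limited = labels.str.split('-', n=1, expand=True)

print(full)
print(limited)
```

With `n=1`, everything after the first hyphen stays together in the second column, which keeps the column count predictable when labels are not perfectly uniform.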


new = new.rename(columns={0: 'mfr', 1: 'type'})
new.head()
   mfr  type
0    N  Cold
1    Q  Cold
2    K  Cold
3    K  Cold
4    R  Cold

cereal = cereal_amended.assign(mfr=new['mfr'],
                               type=new['type'])
cereal
                    name  mfr_type  calories  protein  ...     rating    hot  mfr  type
0              100% Bran    N-Cold        70        4  ...  68.402973  False    N  Cold
1      100% Natural Bran    Q-Cold       120        3  ...  33.983679  False    Q  Cold
2               All-Bran    K-Cold        70        4  ...  59.425505  False    K  Cold
...                  ...       ...       ...      ...  ...        ...    ...  ...   ...
74            Wheat Chex    R-Cold       100        3  ...  49.787445  False    R  Cold
75              Wheaties    G-Cold       100        3  ...  51.592193  False    G  Cold
76   Wheaties Honey Gold    G-Cold       110        2  ...  36.187559  False    G  Cold

77 rows × 11 columns
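The split, rename, and assign steps above can also be chained. The sketch below is one possible variant, not the only way to do it: `toy` is a hypothetical three-row stand-in for `cereal_amended` built from rows shown earlier, and `join` plus `drop` are swapped in for `assign` so that the combined `mfr_type` column is removed once it has been split.

```python
import pandas as pd

# Hypothetical three-row stand-in for cereal_amended, using values shown above.
toy = pd.DataFrame({
    'name': ['100% Bran', '100% Natural Bran', 'All-Bran'],
    'mfr_type': ['N-Cold', 'Q-Cold', 'K-Cold'],
})

# Split on the hyphen, then name the two new columns in one chained expression.
split_cols = (
    toy['mfr_type']
    .str.split('-', expand=True)
    .rename(columns={0: 'mfr', 1: 'type'})
)

# join() lines the pieces up on the shared row index;
# drop() removes the original combined column from the result.
toy_split = toy.join(split_cols).drop(columns='mfr_type')
print(toy_split)
```

Both `join` and `drop` return new dataframes, so `toy` itself still holds the original `mfr_type` column afterwards.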

new = cereal_amended['mfr_type'].str.split('-', expand=False)
new 
0     [N, Cold]
1     [Q, Cold]
2     [K, Cold]
        ...    
74    [R, Cold]
75    [G, Cold]
76    [G, Cold]
Name: mfr_type, Length: 77, dtype: object


type(new)
pandas.core.series.Series
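With `expand=False` the result is a single Series whose values are lists, so the pieces are not yet separate columns. A minimal sketch of pulling them out afterwards, assuming a small hypothetical Series (`toy`) in place of the real `mfr_type` column:

```python
import pandas as pd

# Hypothetical stand-in for cereal_amended['mfr_type'].
toy = pd.Series(['N-Cold', 'Q-Cold', 'K-Cold'], name='mfr_type')

# expand=False keeps one Series; each element is a list of the split pieces.
pieces = toy.str.split('-', expand=False)

# The .str accessor can index into each list by position.
mfr = pieces.str[0]              # first piece of every row
cereal_type = pieces.str.get(1)  # second piece of every row

print(mfr)
print(cereal_type)
```

Here `.str[0]` and `.str.get(0)` are interchangeable; either one gives back an ordinary Series that can be assigned to a new column.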

Let’s apply what we learned!