Programming in Python for Data Science

String Split

cereal_amended

	name	mfr_type	calories	protein	...	fat	carbo	rating	hot
0	100% Bran	N-Cold	70	4	...	1	5.0	68.402973	False
1	100% Natural Bran	Q-Cold	120	3	...	5	8.0	33.983679	False
2	All-Bran	K-Cold	70	4	...	1	7.0	59.425505	False
...	...	...	...	...	...	...	...	...	...
74	Wheat Chex	R-Cold	100	3	...	1	17.0	49.787445	False
75	Wheaties	G-Cold	100	3	...	1	17.0	51.592193	False
76	Wheaties Honey Gold	G-Cold	110	2	...	1	16.0	36.187559	False

77 rows × 9 columns

cereal_amended.head(5)

	name	mfr_type	calories	protein	...	fat	carbo	rating	hot
0	100% Bran	N-Cold	70	4	...	1	5.0	68.402973	False
1	100% Natural Bran	Q-Cold	120	3	...	5	8.0	33.983679	False
2	All-Bran	K-Cold	70	4	...	1	7.0	59.425505	False
3	All-Bran with Extra Fiber	K-Cold	50	4	...	0	8.0	93.704912	False
4	Almond Delight	R-Cold	110	2	...	2	14.0	34.384843	False

5 rows × 9 columns

# Split each 'mfr_type' value at the hyphen; expand=True puts the pieces in separate columns
new = cereal_amended['mfr_type'].str.split('-', expand=True)
new

	0	1
0	N	Cold
1	Q	Cold
2	K	Cold
...	...	...
74	R	Cold
75	G	Cold
76	G	Cold

77 rows × 2 columns
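To try the split without the cereal file, here is a minimal sketch. The DataFrame `df` and its three rows are hypothetical stand-ins for `cereal_amended`, with the `mfr_type` values copied from the output above. It also shows that the number of columns follows the number of pieces, and that the `n` parameter limits how many splits are made. The string `'N-Cold-Extra'` is made up purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for cereal_amended: three rows, 'mfr_type' values copied from the output above
df = pd.DataFrame({'name': ['100% Bran', 'All-Bran', 'Wheaties'],
                   'mfr_type': ['N-Cold', 'K-Cold', 'G-Cold']})

# expand=True returns a DataFrame; each piece of the split string gets its own column (labelled 0, 1, ...)
parts = df['mfr_type'].str.split('-', expand=True)
print(parts)

# With n=1, pandas splits at most once, so everything after the first hyphen
# stays together in column 1 (the value 'N-Cold-Extra' is hypothetical)
limited = pd.Series(['N-Cold-Extra']).str.split('-', n=1, expand=True)
print(limited)
```

Because every `mfr_type` value contains exactly one hyphen, the split always yields two columns.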

new.head()

	0	1
0	N	Cold
1	Q	Cold
2	K	Cold
3	K	Cold
4	R	Cold

# Replace the integer column labels 0 and 1 with meaningful names
new = new.rename(columns={0: 'mfr', 1: 'type'})
new.head()

	mfr	type
0	N	Cold
1	Q	Cold
2	K	Cold
3	K	Cold
4	R	Cold
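`rename()` with a dictionary is one way to relabel the split columns. The sketch below compares it with overwriting `.columns` directly. The small DataFrame `parts` is a hypothetical stand-in that has the same integer labels (0 and 1) as the split output above.

```python
import pandas as pd

# Hypothetical stand-in for the split output: integer column labels 0 and 1
parts = pd.DataFrame([['N', 'Cold'], ['K', 'Cold']])

# rename() maps old labels to new ones and returns a new DataFrame
renamed = parts.rename(columns={0: 'mfr', 1: 'type'})

# Alternative: overwrite every label at once (the list length must match the number of columns)
relabelled = parts.copy()
relabelled.columns = ['mfr', 'type']
```

`rename()` leaves `parts` unchanged and only touches the labels listed in the dictionary. Assigning to `.columns` replaces all labels in place, so it suits cases where every column needs a new name.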

# Add the two pieces to the original DataFrame as new columns
cereal = cereal_amended.assign(mfr=new['mfr'],
                               type=new['type'])
cereal

	name	mfr_type	calories	protein	...	rating	hot	mfr	type
0	100% Bran	N-Cold	70	4	...	68.402973	False	N	Cold
1	100% Natural Bran	Q-Cold	120	3	...	33.983679	False	Q	Cold
2	All-Bran	K-Cold	70	4	...	59.425505	False	K	Cold
...	...	...	...	...	...	...	...	...	...
74	Wheat Chex	R-Cold	100	3	...	49.787445	False	R	Cold
75	Wheaties	G-Cold	100	3	...	51.592193	False	G	Cold
76	Wheaties Honey Gold	G-Cold	110	2	...	36.187559	False	G	Cold

77 rows × 11 columns
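The split, rename, and assign steps can also be collapsed into a single assignment. The sketch below uses the same hypothetical three-row stand-in `df` rather than the real `cereal_amended`. When a DataFrame is assigned to a list of column names, pandas matches the right-hand columns to the new names by position.

```python
import pandas as pd

# Hypothetical stand-in for cereal_amended
df = pd.DataFrame({'name': ['100% Bran', 'All-Bran', 'Wheaties'],
                   'mfr_type': ['N-Cold', 'K-Cold', 'G-Cold']})

# Columns 0 and 1 of the split result become 'mfr' and 'type', matched by position
df[['mfr', 'type']] = df['mfr_type'].str.split('-', expand=True)
print(df)
```

Unlike `assign()`, this modifies `df` in place. The original `assign()` approach returns a new DataFrame and leaves `cereal_amended` untouched.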

# With expand=False (the default), each row instead holds a Python list of the pieces
new = cereal_amended['mfr_type'].str.split('-', expand=False)
new

0     [N, Cold]
1     [Q, Cold]
2     [K, Cold]
        ...    
74    [R, Cold]
75    [G, Cold]
76    [G, Cold]
Name: mfr_type, Length: 77, dtype: object

type(new)

pandas.core.series.Series
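Even with `expand=False`, the individual pieces remain easy to reach, because the `.str` accessor can also index into list elements. The Series `mfr_type` in this sketch is a hypothetical three-row stand-in for the real column. The variable is named `kind` rather than `type` so that it does not shadow Python's built-in `type()`.

```python
import pandas as pd

# Hypothetical stand-in for cereal_amended['mfr_type']
mfr_type = pd.Series(['N-Cold', 'K-Cold', 'G-Cold'], name='mfr_type')

# expand=False keeps one list per row in a single Series
lists = mfr_type.str.split('-', expand=False)

# .str[0] and .str.get(1) pick elements out of each list
mfr = lists.str[0]        # first piece of each list
kind = lists.str.get(1)   # second piece; .str.get gives NaN where the position is missing
```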

Splitting a Column


Let’s apply what we learned!