Reshaping with Pivot Table

Pivot Table


404 image



404 image
cereal_long
name mfr nutrition value
0 Special K K calories 110
1 Special K K protein 6
2 Apple Jacks K calories 110
... ... ... ... ...
7 Cheerios G protein 6
8 Wheaties G calories 100
9 Wheaties G protein 3

10 rows × 4 columns


cereal_wider = cereal_long.pivot_table(index=['name', 'mfr'], columns='nutrition', values='value')
cereal_wider
nutrition calories protein
name mfr
Apple Jacks K 110.0 2.0
Cheerios G 110.0 6.0
Raisin Bran K 120.0 3.0
Special K K 110.0 6.0
Wheaties G 100.0 3.0
cereal_wider = cereal_long.pivot_table(index=['name','mfr'], columns='nutrition', values='value')
cereal_wider
nutrition calories protein
name mfr
Apple Jacks K 110.0 2.0
Cheerios G 110.0 6.0
Raisin Bran K 120.0 3.0
Special K K 110.0 6.0
Wheaties G 100.0 3.0


cereal_wider.reset_index().rename_axis('', axis='columns')
name mfr calories protein
0 Apple Jacks K 110.0 2.0
1 Cheerios G 110.0 6.0
2 Raisin Bran K 120.0 3.0
3 Special K K 110.0 6.0
4 Wheaties G 100.0 3.0



404 image

Why use pivot at all then?


404 image
cereal_problem
name mfr nutrition value
0 Special K K calories 100
1 Special K K calories 130
2 Special K K protein 6
3 Apple Jacks K calories 110
4 Apple Jacks K protein 2
cereal_problem.pivot(index='name', columns='nutrition', values='value')
ValueError: Index contains duplicate entries, cannot reshape

Detailed traceback: 
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 5923, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/reshape/pivot.py", line 450, in pivot
    return indexed.unstack(columns)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/series.py", line 3550, in unstack
    return unstack(self, level, fill_value)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/reshape/reshape.py", line 179, in _make_selectors
    raise ValueError("Index contains duplicate entries, cannot reshape")



404 image


Attribution: Nikolay Grozev, Reshaping in Pandas - Pivot, Pivot-Table, Stack, and Unstack explained with Pictures

cereal_problem.pivot_table(index=['name', 'mfr'], columns='nutrition', values='value')
nutrition calories protein
name mfr
Apple Jacks K 110.0 2.0
Special K K 115.0 6.0



404 image
cereal_problem.duplicated(subset=['name', 'nutrition'], keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool


cereal_problem.duplicated(subset=['name', 'nutrition'])
0    False
1     True
2    False
3    False
4    False
dtype: bool
cereal_problem.duplicated(subset=['name', 'nutrition'], keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool


duplicate_info =cereal_problem.duplicated(subset=['name', 'nutrition'], keep=False)
cereal_problem[duplicate_info]
name mfr nutrition value
0 Special K K calories 100
1 Special K K calories 130
cereal_problem
name mfr nutrition value
0 Special K K calories 100
1 Special K K calories 130
2 Special K K protein 6
3 Apple Jacks K calories 110
4 Apple Jacks K protein 2


cereal_no_problem = cereal_problem.drop(axis=0, index=1)
cereal_no_problem
name mfr nutrition value
0 Special K K calories 100
2 Special K K protein 6
3 Apple Jacks K calories 110
4 Apple Jacks K protein 2

Let’s apply what we learned!