Concatenation

There are 2 different verbs we will use for joining dataframes together:

Concatenation

404 image
candy = pd.read_csv('data/candybars.csv').loc[:4, 'name': 'peanuts']
candy
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
3 Smarties 45 1 0
4 Twix 58 1 0

Horizontal Concatenation

candy
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
3 Smarties 45 1 0
4 Twix 58 1 0


candy2 = pd.read_csv('data/candy_bars2.csv')
candy2
name calories fat sugar
0 Coffee Crisp 260 13 25
1 Butterfinger 798 30 72
2 Skor 209 12 23
3 Smarties 210 6 33
4 Twix 250 12 25
candy_nutrition = pd.concat([candy, candy2], axis=1)
candy_nutrition
name weight chocolate peanuts name calories fat sugar
0 Coffee Crisp 50 1 0 Coffee Crisp 260 13 25
1 Butterfinger 184 1 1 Butterfinger 798 30 72
2 Skor 39 1 0 Skor 209 12 23
3 Smarties 45 1 0 Smarties 210 6 33
4 Twix 58 1 0 Twix 250 12 25


This results in the same 4 rows but now we have 8 columns.

candy_nutrition
name weight chocolate peanuts name calories fat sugar
0 Coffee Crisp 50 1 0 Coffee Crisp 260 13 25
1 Butterfinger 184 1 1 Butterfinger 798 30 72
2 Skor 39 1 0 Skor 209 12 23
3 Smarties 45 1 0 Smarties 210 6 33
4 Twix 58 1 0 Twix 250 12 25


candy_nutrition.loc[:,~candy_nutrition.columns.duplicated()]
name weight chocolate peanuts calories fat sugar
0 Coffee Crisp 50 1 0 260 13 25
1 Butterfinger 184 1 1 798 30 72
2 Skor 39 1 0 209 12 23
3 Smarties 45 1 0 210 6 33
4 Twix 58 1 0 250 12 25
candy_nutrition.columns.duplicated()
array([False, False, False, False,  True, False, False, False])


~candy_nutrition.columns.duplicated()
array([ True,  True,  True,  True, False,  True,  True,  True])
candy_nutrition_cleaned = candy_nutrition.loc[:,~candy_nutrition.columns.duplicated()]
candy_nutrition_cleaned
name weight chocolate peanuts calories fat sugar
0 Coffee Crisp 50 1 0 260 13 25
1 Butterfinger 184 1 1 798 30 72
2 Skor 39 1 0 209 12 23
3 Smarties 45 1 0 210 6 33
4 Twix 58 1 0 250 12 25

Vertical Concatenation

candy
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
3 Smarties 45 1 0
4 Twix 58 1 0


candy_more = pd.read_csv('data/candybars_more.csv', usecols=range(4))
candy_more
name weight chocolate peanuts
0 Kinder Bueno 43 1 0
1 5th Avenue 56 1 1
2 Crunch 44 1 0
large_candybars = pd.concat([candy, candy_more], axis=0)
large_candybars
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
... ... ... ... ...
0 Kinder Bueno 43 1 0
1 5th Avenue 56 1 1
2 Crunch 44 1 0

8 rows × 4 columns

large_candybars
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
... ... ... ... ...
0 Kinder Bueno 43 1 0
1 5th Avenue 56 1 1
2 Crunch 44 1 0

8 rows × 4 columns


large_candybars_cleaned = large_candybars.reset_index(drop=True)
large_candybars_cleaned
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
... ... ... ... ...
5 Kinder Bueno 43 1 0
6 5th Avenue 56 1 1
7 Crunch 44 1 0

8 rows × 4 columns

Be careful of order!

candy
name weight chocolate peanuts
0 Coffee Crisp 50 1 0
1 Butterfinger 184 1 1
2 Skor 39 1 0
3 Smarties 45 1 0
4 Twix 58 1 0


snacksize_candybars = pd.read_csv('data/snacksize_candybars.csv')
snacksize_candybars
name calories fat sugar
0 Butterfinger 798 30 72
1 Skor 209 12 23
2 Twix 250 12 25
3 Coffee Crisp 260 13 25
4 Smarties 210 6 33
pd.concat([candy, snacksize_candybars], axis=1)
name weight chocolate peanuts name calories fat sugar
0 Coffee Crisp 50 1 0 Butterfinger 798 30 72
1 Butterfinger 184 1 1 Skor 209 12 23
2 Skor 39 1 0 Twix 250 12 25
3 Smarties 45 1 0 Coffee Crisp 260 13 25
4 Twix 58 1 0 Smarties 210 6 33

Remember…

404 image

Let’s apply what we learned!