Introduction to Working with Strings

Recap

Let’s first review some of the string methods we’ve already learned, such as:

  • .upper()
  • .lower()
  • .count()
  • .split()
instrument = 'Violin'
instrument
'Violin'


instrument.upper()
'VIOLIN'
instrument.lower()
'violin'


instrument.count('i')
2


instrument.split('i')
['V', 'ol', 'n']
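Two details of these methods come up again once we work with whole columns: .count() is case-sensitive, and .split() called with no argument splits on any run of whitespace. The short sketch below checks both on plain Python strings. The sample sentence is made up for illustration.

```python
instrument = 'Violin'

# .count() is case-sensitive: the capital 'V' does not match 'v'
lower_v = instrument.count('v')
# Lowercasing first makes the count case-insensitive
any_case_v = instrument.lower().count('v')

# With no argument, .split() splits on any run of whitespace (note the double space)
words = 'Wet road  but nice'.split()
# With an argument, it splits on that exact substring
parts = instrument.split('i')

print(lower_v, any_case_v)  # 0 1
print(words)                # ['Wet', 'road', 'but', 'nice']
print(parts)                # ['V', 'ol', 'n']
```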

Processing String Columns

cycling
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
... ... ... ... ... ... ...
30 Oct-10-2019 18:10 Afternoon Ride Ride 1841 12.59 Feeling good after a holiday break!
31 Oct-11-2019 07:47 Morning Ride Ride 2463 12.79 Stopped for photo of sunrise
32 Oct-11-2019 18:16 Afternoon Ride Ride 1843 11.79 Bike feeling tight, needs an oil and pump

33 rows × 6 columns
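The file that cycling is loaded from isn't shown here. To follow along without it, a small stand-in can be built by hand. The rows below are copied from the first three rows displayed above; storing Date as a plain string, Time as an integer and Distance as a float are assumptions about the original column types, and the name cycling_sample is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of `cycling`
cycling_sample = pd.DataFrame({
    'Date': ['Sep-10-2019 17:13', 'Sep-11-2019 06:52', 'Sep-11-2019 17:23'],
    'Name': ['Afternoon Ride', 'Morning Ride', 'Afternoon Ride'],
    'Type': ['Ride', 'Ride', 'Ride'],
    'Time': [2084, 2531, 1863],
    'Distance': [12.62, 13.03, 12.52],
    'Comments': ['Rain', 'rain', 'Wet road but nice whether'],
})

print(cycling_sample.shape)  # (3, 6)
print(cycling_sample.dtypes)
```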

upper_cycle = cycling.assign(Comments = cycling['Comments'].str.upper())
upper_cycle.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 RAIN
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 RAIN
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 WET ROAD BUT NICE WHETHER
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 STOPPED FOR PHOTO OF SUNRISE
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 TIRED BY THE END OF THE WEEK.
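Notice that .upper() is reached through the .str accessor. A pandas Series has no string methods of its own, so calling .upper() directly on the column raises an AttributeError; .str applies the string method to each element instead. A minimal sketch with a hand-built Series whose values are copied from the table above:

```python
import pandas as pd

comments = pd.Series(['Rain', 'rain', 'Wet road but nice whether'])

# A Series has no .upper() method of its own
try:
    comments.upper()
except AttributeError as err:
    print('AttributeError:', err)

# The .str accessor applies the string method to every element
upper_comments = comments.str.upper()
print(upper_comments.tolist())  # ['RAIN', 'RAIN', 'WET ROAD BUT NICE WHETHER']
```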


rain_cycle = upper_cycle.assign(Rain = upper_cycle['Comments'].str.count('RAIN'))
rain_cycle.head()
Date Name Type Time Distance Comments Rain
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 RAIN 1
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 RAIN 1
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 WET ROAD BUT NICE WHETHER 0
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 STOPPED FOR PHOTO OF SUNRISE 0
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 TIRED BY THE END OF THE WEEK. 0
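Upper-casing before counting matters because .str.count() is case-sensitive, just like the plain string method: counting 'RAIN' in the original column would miss both 'Rain' and 'rain'. The sketch below compares the two approaches on a few values copied from the table. It also shows an alternative: .str.count() treats its pattern as a regular expression and accepts a flags argument, so re.IGNORECASE can stand in for the upper-casing step. (Because the pattern is a regex, characters such as '.' or '?' would need escaping.)

```python
import re
import pandas as pd

comments = pd.Series(['Rain', 'rain', 'Wet road but nice whether'])

# Counting 'RAIN' in the original text finds nothing: the match is case-sensitive
raw_counts = comments.str.count('RAIN')

# Upper-casing first (as in the lesson) catches both 'Rain' and 'rain'
upper_counts = comments.str.upper().str.count('RAIN')

# Alternative: keep the text as-is and make the regex case-insensitive
flag_counts = comments.str.count('rain', flags=re.IGNORECASE)

print(raw_counts.tolist())    # [0, 0, 0]
print(upper_counts.tolist())  # [1, 1, 0]
print(flag_counts.tolist())   # [1, 1, 0]
```

Note that the Rain column holds counts, not True/False values: a comment that mentioned rain twice would get a 2.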
upper_cycle['Comments'].str.split(expand=True)
0 1 2 3 4 5 6 7
0 RAIN None None None None None None None
1 RAIN None None None None None None None
2 WET ROAD BUT NICE WHETHER None None None
... ... ... ... ... ... ... ... ...
30 FEELING GOOD AFTER A HOLIDAY BREAK! None None
31 STOPPED FOR PHOTO OF SUNRISE None None None
32 BIKE FEELING TIGHT, NEEDS AN OIL AND PUMP

33 rows × 8 columns
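With expand=True every word gets its own column, and rows with fewer words are padded with missing values, which is why the short comments above show None. The sketch below contrasts this with the default (one list per row), shows the n parameter that limits how many splits are made, and uses .str[0] to pull the first word out of each list. The two sample values are taken from the output above.

```python
import pandas as pd

comments = pd.Series(['RAIN', 'WET ROAD BUT NICE WHETHER'])

# Without expand, each element becomes a Python list of words
word_lists = comments.str.split()

# expand=True spreads the words into columns, padding short rows with missing values
word_cols = comments.str.split(expand=True)

# n=1 makes at most one split: the first word, then the rest of the text
first_and_rest = comments.str.split(n=1, expand=True)

# .str[0] picks the first element of each list
first_words = word_lists.str[0]

print(word_lists.tolist())   # [['RAIN'], ['WET', 'ROAD', 'BUT', 'NICE', 'WHETHER']]
print(word_cols.shape)       # (2, 5)
print(first_and_rest)
print(first_words.tolist())  # ['RAIN', 'WET']
```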

"My favorite colour" + "is Blue"
'My favorite colouris Blue'
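The + operator joins strings exactly as they are and never inserts a space, which is why "colour" and "is" run together above. The separator has to be added explicitly, either as its own string or with str.join:

```python
first = "My favorite colour"
second = "is Blue"

# + concatenates the strings unchanged; no space is added
glued = first + second

# Add the separator yourself...
spaced = first + " " + second
# ...or let str.join place it between the pieces
joined = " ".join([first, second])

print(glued)   # My favorite colouris Blue
print(spaced)  # My favorite colour is Blue
print(joined)  # My favorite colour is Blue
```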


combined_cycle = cycling.assign(Distance_str = cycling['Distance'].astype('str') + ' km')
combined_cycle.head()
Date Name Type Time Distance Comments Distance_str
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain 12.62 km
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain 13.03 km
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether 12.52 km
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise 12.84 km
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week. 12.48 km
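The .astype('str') step is needed because Distance is numeric: adding ' km' to a float column raises a TypeError, while adding it to a string column concatenates. One side effect is that the default conversion drops trailing zeros. The sketch below uses a hypothetical value of 12.50 (not in the data) to show this, and a format string via .map() as one way to keep two decimals.

```python
import pandas as pd

# 12.62 and 13.03 come from the table; 12.50 is a hypothetical extra value
distance = pd.Series([12.62, 13.03, 12.50])

# Adding text to a numeric column fails: float + str is not defined
try:
    distance + ' km'
except TypeError as err:
    print('TypeError:', err)

# Converting to strings first makes + mean concatenation
with_units = distance.astype('str') + ' km'

# A format string controls the number of decimals (12.5 -> '12.50 km')
formatted = distance.map('{:.2f} km'.format)

print(with_units.tolist())  # ['12.62 km', '13.03 km', '12.5 km']
print(formatted.tolist())   # ['12.62 km', '13.03 km', '12.50 km']
```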
upper_cycle.head(3)
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 RAIN
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 RAIN
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 WET ROAD BUT NICE WHETHER


cap_cycle = upper_cycle.assign(Comments = upper_cycle['Comments'].str.capitalize())
cap_cycle.head(3)
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 Rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether


cap_cycle = upper_cycle.assign(Comments = upper_cycle['Comments'].str.title())
cap_cycle.head(3)
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 Rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet Road But Nice Whether
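The difference between the two: .str.capitalize() upper-cases only the first character and lower-cases everything else, while .str.title() upper-cases the first letter of every word. One quirk of title() is that it treats an apostrophe as a word boundary, so the letter after it is capitalized too. The second sample string below is hypothetical, chosen only to show that quirk.

```python
import pandas as pd

# The first value is from the table; "IT'S RAINING" is a made-up example
comments = pd.Series(['WET ROAD BUT NICE WHETHER', "IT'S RAINING"])

# capitalize: first character upper-case, the rest lower-case
capped = comments.str.capitalize()

# title: first letter of every word upper-case (apostrophes count as word breaks)
titled = comments.str.title()

print(capped.tolist())  # ['Wet road but nice whether', "It's raining"]
print(titled.tolist())  # ['Wet Road But Nice Whether', "It'S Raining"]
```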

Strip

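.strip() removes any leading and trailing whitespace (spaces, tabs and newlines) from a string. This matters because two strings that look identical can still compare as unequal when one of them has extra spaces at either end.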

"Sunshine" == " Sunshine "
False


string1 = " Sunshine " 
new_string1 = string1.strip()
new_string1
'Sunshine'


"Sunshine" == new_string1
True
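.strip() trims both ends and handles tabs and newlines as well as spaces, but it never touches whitespace inside the string. When only one side should be trimmed, .lstrip() and .rstrip() do the left or right end alone. The padded sample strings below are made up for illustration.

```python
padded = "\t Sunshine \n"

# strip() removes spaces, tabs and newlines from both ends
both = padded.strip()
# Inner spaces are left alone
inner = "  Sun shine  ".strip()
# lstrip() and rstrip() only trim one side
left = " Sunshine ".lstrip()
right = " Sunshine ".rstrip()

print(repr(both))   # 'Sunshine'
print(repr(inner))  # 'Sun shine'
print(repr(left))   # 'Sunshine '
print(repr(right))  # ' Sunshine'
```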
cycling.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


cycling[cycling['Comments'] == 'Rain']
Date Name Type Time Distance Comments
stripped_cycling = cycling.assign(Comments = cycling['Comments'].str.strip())
stripped_cycling.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


stripped_cycling[stripped_cycling['Comments'] == 'Rain']
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
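The filter on the raw column returned no rows even though Rain appears in the display, because the stored value carries whitespace that the table rendering does not show. The exact padding in cycling isn't visible, so the sketch below assumes values such as ' Rain ' and 'rain '. It also shows that chaining .str.lower() after .str.strip() would catch the lower-case 'rain' entry, which the stripped comparison above still misses.

```python
import pandas as pd

# Hypothetical padded values; the real padding in `cycling` is not visible
comments = pd.Series([' Rain ', 'rain ', 'Wet road but nice whether'])

exact = comments == 'Rain'                              # padding breaks the match
stripped = comments.str.strip() == 'Rain'               # matches the first row only
any_case = comments.str.strip().str.lower() == 'rain'   # matches both rain rows

print(exact.tolist())     # [False, False, False]
print(stripped.tolist())  # [True, False, False]
print(any_case.tolist())  # [True, True, False]
```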
stripped_cycling.tail(5)
Date Name Type Time Distance Comments
28 Oct-04-2019 18:08 Afternoon Ride Ride 1870 12.63 Very tired, riding into the wind
29 Oct-10-2019 07:55 Morning Ride Ride 2149 12.70 Really cold! But feeling good
30 Oct-10-2019 18:10 Afternoon Ride Ride 1841 12.59 Feeling good after a holiday break!
31 Oct-11-2019 07:47 Morning Ride Ride 2463 12.79 Stopped for photo of sunrise
32 Oct-11-2019 18:16 Afternoon Ride Ride 1843 11.79 Bike feeling tight, needs an oil and pump


stripped_cycling['Comments'].str.strip("!").tail()
28             Very tired, riding into the wind
29                Really cold! But feeling good
30           Feeling good after a holiday break
31                 Stopped for photo of sunrise
32    Bike feeling tight, needs an oil and pump
Name: Comments, dtype: object
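Passing "!" removed the exclamation mark at the end of row 30 but left the one in the middle of row 29, because strip() only works on the ends of a string. The argument is also treated as a set of characters rather than a substring: any mix of those characters is removed from both ends, which can eat more than intended. Python's str.removesuffix (available from Python 3.9) removes an exact ending instead. The sample strings below are made up, apart from the row 29 comment.

```python
# strip("!") removes '!' from both ends, however many there are
cheers = "!!Great ride!!".strip("!")

# A '!' in the middle is not at an end, so nothing changes
middle = "Really cold! But feeling good".strip("!")

# The argument is a set of characters: the trailing 'y' is removed too
too_much = "Rainy day!".strip("!y")

# removesuffix removes one exact ending and nothing else (Python 3.9+)
exact_end = "Rainy day!".removesuffix("!")

print(cheers)     # Great ride
print(middle)     # Really cold! But feeling good
print(too_much)   # Rainy da
print(exact_end)  # Rainy day
```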

Let’s apply what we learned!