More Advanced String Processing

Replace

import pandas as pd

cycling = pd.read_csv('data/cycling_data.csv')
cycling.head(10)
     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     Rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
2    Sep-11-2019 17:23  Afternoon Ride  Ride  1863  12.52     Wet road but nice whether
...  ...                ...             ...   ...   ...       ...
7    Sep-18-2019 06:43  Morning Ride    Ride  2285  12.60     Raining
8    Sep-19-2019 06:49  Morning Ride    Ride  2903  14.57     Thankfully not raining today!
9    Sep-18-2019 17:15  Afternoon Ride  Ride  2101  12.48     Pumped up tires

10 rows × 6 columns

cycling_lower = cycling.assign(Comments = cycling['Comments'].str.lower())
cycling_lower.head(9)
     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
2    Sep-11-2019 17:23  Afternoon Ride  Ride  1863  12.52     wet road but nice whether
...  ...                ...             ...   ...   ...       ...
6    Sep-17-2019 17:15  Afternoon Ride  Ride  1973  12.45     legs feeling strong!
7    Sep-18-2019 06:43  Morning Ride    Ride  2285  12.60     raining
8    Sep-19-2019 06:49  Morning Ride    Ride  2903  14.57     thankfully not raining today!

9 rows × 6 columns
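To see what .str.lower() does in isolation, here is a minimal sketch on a small hand-built Series. The values are copied from the first rows of the table above rather than read from data/cycling_data.csv, so the variable names are illustrative only. The method converts each string separately and returns a new Series, which is why the result is stored with assign instead of changing cycling directly.

```python
import pandas as pd

# Hypothetical sample: a few comments copied from the table, not loaded from the CSV
comments = pd.Series(['Rain', 'rain', 'Wet road but nice whether', 'Raining'])

# .str.lower() converts each string separately and returns a new Series
lowered = comments.str.lower()

print(lowered.tolist())
# ['rain', 'rain', 'wet road but nice whether', 'raining']
print(comments.tolist())
# ['Rain', 'rain', 'Wet road but nice whether', 'Raining']  (original is unchanged)
```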

The first five rows of cycling_lower still contain the misspelling “whether” in row 2:

     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
2    Sep-11-2019 17:23  Afternoon Ride  Ride  1863  12.52     wet road but nice whether
3    Sep-12-2019 07:06  Morning Ride    Ride  2192  12.84     stopped for photo of sunrise
4    Sep-12-2019 17:28  Afternoon Ride  Ride  1891  12.48     tired by the end of the week.


cycling_rain = cycling_lower.assign(Comments = cycling_lower['Comments'].str.replace('whether', 'weather'))
cycling_rain.head()
     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
2    Sep-11-2019 17:23  Afternoon Ride  Ride  1863  12.52     wet road but nice weather
3    Sep-12-2019 07:06  Morning Ride    Ride  2192  12.84     stopped for photo of sunrise
4    Sep-12-2019 17:28  Afternoon Ride  Ride  1891  12.48     tired by the end of the week.
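The replacement works on substrings inside each comment, not on whole cell values. This sketch uses a hand-built Series (values copied from the table) to contrast Series.str.replace with Series.replace, which without the .str accessor only matches cells whose entire value equals the pattern. It passes regex=False explicitly, since older pandas versions treated the pattern as a regular expression by default.

```python
import pandas as pd

# Hypothetical sample of lowercased comments copied from the table above
comments = pd.Series(['rain', 'wet road but nice whether', 'stopped for photo of sunrise'])

# .str.replace swaps every occurrence of the substring inside each value
fixed = comments.str.replace('whether', 'weather', regex=False)

# Series.replace (no .str) compares whole values, so no cell matches 'whether' exactly
unchanged = comments.replace('whether', 'weather')

print(fixed.tolist())
# ['rain', 'wet road but nice weather', 'stopped for photo of sunrise']
print(unchanged.tolist())
# ['rain', 'wet road but nice whether', 'stopped for photo of sunrise']
```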

Contains

cycling_lower['Comments'].str.contains('rain')
0      True
1      True
2     False
      ...  
30    False
31    False
32    False
Name: Comments, Length: 33, dtype: bool
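Matching with str.contains is case-sensitive by default, which is one reason the comments were lowercased first. As a sketch on original-case values copied from the table (a hand-built Series, not the full dataset), passing case=False gives the same matches without a separate lowercasing step:

```python
import pandas as pd

# Hypothetical sample: original-case comments copied from the first rows of the table
comments = pd.Series(['Rain', 'rain', 'Wet road but nice whether', 'Raining'])

# Default matching is case-sensitive, so 'Rain' and 'Raining' are missed
case_sensitive = comments.str.contains('rain')

# case=False ignores capitalization
case_insensitive = comments.str.contains('rain', case=False)

print(case_sensitive.tolist())
# [False, True, False, False]
print(case_insensitive.tolist())
# [True, True, False, True]
```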


cycling_lower[cycling_lower['Comments'].str.contains('rain')]
     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
7    Sep-18-2019 06:43  Morning Ride    Ride  2285  12.60     raining
8    Sep-19-2019 06:49  Morning Ride    Ride  2903  14.57     thankfully not raining today!
18   Sep-26-2019 17:13  Afternoon Ride  Ride  1860  12.52     raining
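Because the result of str.contains is an ordinary boolean Series, it can be used as a row filter and also summed to count the matching rows (True counts as 1). This sketch uses a small hand-built DataFrame whose rows are a subset of the table above. It also shows na=False, which treats missing comments as non-matches so the mask stays purely boolean; the missing value in safe_mask's input is invented for illustration.

```python
import pandas as pd

# Hypothetical frame: a subset of rows copied from the table above
rides = pd.DataFrame({
    'Distance': [12.62, 13.03, 12.52, 12.60],
    'Comments': ['rain', 'rain', 'wet road but nice whether', 'raining'],
})

# Boolean mask: True where the comment contains the substring 'rain'
is_rainy = rides['Comments'].str.contains('rain')
rainy_rides = rides[is_rainy]

print(is_rainy.sum())               # 3
print(rainy_rides.index.tolist())   # [0, 1, 3]

# With a missing comment, na=False makes the missing entry count as a non-match
safe_mask = pd.Series(['rain', None, 'raining']).str.contains('rain', na=False)
print(safe_mask.tolist())           # [True, False, True]
```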


cycling_lower.loc[cycling_lower['Comments'].str.contains('rain'), 'Comments'] = 'rained'
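The .loc assignment selects rows with the boolean mask and the column by name in a single step, so the new value is written into the DataFrame itself. Chained indexing such as df[mask]['Comments'] = ... can end up writing to a temporary copy instead. Here is a minimal sketch on a hand-built frame (values copied from the table), working on an explicit .copy() so the source frame stays intact for comparison:

```python
import pandas as pd

# Hypothetical frame with comments copied from the table above
rides = pd.DataFrame({
    'Time': [2084, 2531, 1863, 2285],
    'Comments': ['rain', 'rain', 'wet road but nice whether', 'raining'],
})

# Work on a copy so the original frame is left unchanged
updated = rides.copy()

# .loc[row_mask, column] = value overwrites only the matching cells
mask = updated['Comments'].str.contains('rain')
updated.loc[mask, 'Comments'] = 'rained'

print(updated['Comments'].tolist())
# ['rained', 'rained', 'wet road but nice whether', 'rained']
print(rides['Comments'].tolist())
# ['rain', 'rain', 'wet road but nice whether', 'raining']
```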

These are the rows whose comments originally contained “rain”:

     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rain
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rain
7    Sep-18-2019 06:43  Morning Ride    Ride  2285  12.60     raining
8    Sep-19-2019 06:49  Morning Ride    Ride  2903  14.57     thankfully not raining today!
18   Sep-26-2019 17:13  Afternoon Ride  Ride  1860  12.52     raining


After the assignment, those same rows have “rained” in the Comments column:

cycling_lower[cycling_lower['Comments'] == 'rained']
     Date               Name            Type  Time  Distance  Comments
0    Sep-10-2019 17:13  Afternoon Ride  Ride  2084  12.62     rained
1    Sep-11-2019 06:52  Morning Ride    Ride  2531  13.03     rained
7    Sep-18-2019 06:43  Morning Ride    Ride  2285  12.60     rained
8    Sep-19-2019 06:49  Morning Ride    Ride  2903  14.57     rained
18   Sep-26-2019 17:13  Afternoon Ride  Ride  1860  12.52     rained
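The .loc assignment changes cycling_lower in place. If you prefer the assign style used earlier in this section, one alternative (not the approach taken above) is Series.mask, which replaces values where a condition is True and returns a new Series. This is a sketch on hand-built values copied from the table:

```python
import pandas as pd

# Hypothetical frame with comments copied from the table above
rides = pd.DataFrame({
    'Comments': ['rain', 'wet road but nice whether', 'thankfully not raining today!'],
})

# Series.mask swaps in 'rained' wherever the condition is True and keeps the rest
contains_rain = rides['Comments'].str.contains('rain')
relabelled = rides.assign(Comments=rides['Comments'].mask(contains_rain, 'rained'))

print(relabelled['Comments'].tolist())
# ['rained', 'wet road but nice whether', 'rained']
```

As in the output above, plain substring matching also flags “thankfully not raining today!”, so the relabelled column treats that ride as rainy too.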

Additional String Documentation

Let’s apply what we learned!