Working with Dates and Time

404 image
cycling.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


cycling.dtypes
Date         object
Name         object
Type         object
Time          int64
Distance    float64
Comments     object
dtype: object
cycling.sort_values('Date').head(15)
Date Name Type Time Distance Comments
21 Oct-01-2019 06:53 Morning Ride Ride 2118 12.71 Rested after the weekend!
22 Oct-01-2019 17:15 Afternoon Ride Ride 1732 NaN Legs feeling strong!
23 Oct-02-2019 06:45 Morning Ride Ride 2222 12.82 Beautiful morning! Feeling fit
... ... ... ... ... ... ...
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether

15 rows × 6 columns

cycling.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


dates = (cycling['Date'].str.split(' ', expand=True)
                           .rename(columns = {0:'Date',
                                              1:'Time'}))
dates.head()
Date Time
0 Sep-10-2019 17:13
1 Sep-11-2019 06:52
2 Sep-11-2019 17:23
3 Sep-12-2019 07:06
4 Sep-12-2019 17:28
dates = (dates['Date'].str.split('-', expand=True).rename(columns = {0:'Month',
                                                                     1:'Day',
                                                                     2:'Year'}))
dates.head()
Month Day Year
0 Sep 10 2019
1 Sep 11 2019
2 Sep 11 2019
3 Sep 12 2019
4 Sep 12 2019
dates.iloc[0,1]
'10'


type(dates.iloc[0,1])
str
cycling_dates = (cycling.assign(Year = dates['Year'].astype(int),
                                Month =  dates['Month'],
                                Day = dates['Day'].astype(int))
                                )
cycling_dates.head(3)
Date Name Type Time ... Comments Year Month Day
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 ... Rain 2019 Sep 10
1 Sep-11-2019 06:52 Morning Ride Ride 2531 ... rain 2019 Sep 11
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 ... Wet road but nice whether 2019 Sep 11

3 rows × 9 columns


cycling_dates = cycling_dates.loc[:, ['Year', 'Month', 'Day', 'Name',
                                      'Type', 'Time', 'Distance', 'Comments']]
cycling_dates.head(3)
Year Month Day Name Type Time Distance Comments
0 2019 Sep 10 Afternoon Ride Ride 2084 12.62 Rain
1 2019 Sep 11 Morning Ride Ride 2531 13.03 rain
2 2019 Sep 11 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
cycling_dates.sort_values(['Year', 'Month', 'Day'])
Year Month Day Name Type Time Distance Comments
21 2019 Oct 1 Morning Ride Ride 2118 12.71 Rested after the weekend!
22 2019 Oct 1 Afternoon Ride Ride 1732 NaN Legs feeling strong!
23 2019 Oct 2 Morning Ride Ride 2222 12.82 Beautiful morning! Feeling fit
... ... ... ... ... ... ... ... ...
18 2019 Sep 26 Afternoon Ride Ride 1860 12.52 raining
19 2019 Sep 27 Morning Ride Ride 2350 12.91 Detour around trucks at Jericho
20 2019 Sep 27 Afternoon Ride Ride 1712 12.47 Tired by the end of the week

33 rows × 8 columns

Pandas parse_dates

cycling = pd.read_csv('data/cycling_data.csv')
cycling.head(3)
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether


cycling.dtypes
Date         object
Name         object
Type         object
Time          int64
Distance    float64
Comments     object
dtype: object
cycling_dates = pd.read_csv('data/cycling_data.csv', parse_dates = ['Date'])
cycling_dates.head()
Date Name Type Time Distance Comments
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 2019-09-12 07:06:00 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 2019-09-12 17:28:00 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


cycling_dates.dtypes
Date        datetime64[ns]
Name                object
Type                object
Time                 int64
Distance           float64
Comments            object
dtype: object
cycling_dates.sort_values('Date')
Date Name Type Time Distance Comments
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
... ... ... ... ... ... ...
30 2019-10-10 18:10:00 Afternoon Ride Ride 1841 12.59 Feeling good after a holiday break!
31 2019-10-11 07:47:00 Morning Ride Ride 2463 12.79 Stopped for photo of sunrise
32 2019-10-11 18:16:00 Afternoon Ride Ride 1843 11.79 Bike feeling tight, needs an oil and pump

33 rows × 6 columns

pd.read_csv('data/cycling_data_split_time.csv').head()
Year Month Day Clock ... Type Time Distance Comments
0 2019 Sep 10 17:13:04 ... Ride 2084 12.62 Rain
1 2019 Sep 11 06:52:18 ... Ride 2531 13.03 rain
2 2019 Sep 11 17:23:50 ... Ride 1863 12.52 Wet road but nice weather
3 2019 Sep 12 07:06:19 ... Ride 2192 12.84 Stopped for photo of sunrise
4 2019 Sep 12 17:28:05 ... Ride 1891 12.48 Tired by the end of the week

5 rows × 9 columns


(pd.read_csv('data/cycling_data_split_time.csv',
              parse_dates={'Date': ['Year', 'Month', 'Day', 'Clock']})
              .head())
Date Name Type Time Distance Comments
0 2019-09-10 17:13:04 Afternoon Ride Ride 2084 12.62 Rain
1 2019-09-11 06:52:18 Morning Ride Ride 2531 13.03 rain
2 2019-09-11 17:23:50 Afternoon Ride Ride 1863 12.52 Wet road but nice weather
3 2019-09-12 07:06:19 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 2019-09-12 17:28:05 Afternoon Ride Ride 1891 12.48 Tired by the end of the week
cycling = pd.read_csv('data/cycling_data.csv')
cycling.head()
Date Name Type Time Distance Comments
0 Sep-10-2019 17:13 Afternoon Ride Ride 2084 12.62 Rain
1 Sep-11-2019 06:52 Morning Ride Ride 2531 13.03 rain
2 Sep-11-2019 17:23 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 Sep-12-2019 07:06 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 Sep-12-2019 17:28 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


cycling.dtypes
Date         object
Name         object
Type         object
Time          int64
Distance    float64
Comments     object
dtype: object
new_cycling = cycling.assign(Date = pd.to_datetime(cycling['Date']))
new_cycling.head()
Date Name Type Time Distance Comments
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 2019-09-12 07:06:00 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 2019-09-12 17:28:00 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


new_cycling.dtypes
Date        datetime64[ns]
Name                object
Type                object
Time                 int64
Distance           float64
Comments            object
dtype: object

Pandas datetime tools

  • .dt.day_name() for the day of the week:
new_cycling['Date'].dt.day_name().head(3)
0      Tuesday
1    Wednesday
2    Wednesday
Name: Date, dtype: object


new_cycling.assign(weekday = new_cycling['Date'].dt.day_name()).head(3)
Date Name Type Time Distance Comments weekday
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain Tuesday
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain Wednesday
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether Wednesday
new_cycling['Date'].dt.day.head()
0    10
1    11
2    11
3    12
4    12
Name: Date, dtype: int32


new_cycling.assign(day = new_cycling['Date'].dt.day).head()
Date Name Type Time Distance Comments day
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain 10
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain 11
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether 11
3 2019-09-12 07:06:00 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise 12
4 2019-09-12 17:28:00 Afternoon Ride Ride 1891 12.48 Tired by the end of the week. 12

Here are some of the most common useful datetime tools:

  • .dt.year
  • .dt.month
  • .dt.month_name()
  • .dt.day
  • .dt.day_name()
  • .dt.hour
  • .dt.minute

For a full list, refer to the attributes and methods section of the Timestamp documentation.

new_cycling.head()
Date Name Type Time Distance Comments
0 2019-09-10 17:13:00 Afternoon Ride Ride 2084 12.62 Rain
1 2019-09-11 06:52:00 Morning Ride Ride 2531 13.03 rain
2 2019-09-11 17:23:00 Afternoon Ride Ride 1863 12.52 Wet road but nice whether
3 2019-09-12 07:06:00 Morning Ride Ride 2192 12.84 Stopped for photo of sunrise
4 2019-09-12 17:28:00 Afternoon Ride Ride 1891 12.48 Tired by the end of the week.


If I select the first example in row 1 of our new_cycling dataset, you’ll notice that it outputs something called a Timestamp.

timestamp_ex = new_cycling.loc[1,'Date']
timestamp_ex
Timestamp('2019-09-11 06:52:00')
timestamp_ex
Timestamp('2019-09-11 06:52:00')


timestamp_ex.month_name()
'September'


timestamp_ex.day
11


timestamp_ex.hour
6

.diff()

cycling_intervals = new_cycling['Date'].sort_values().diff()
cycling_intervals
0                NaT
1    0 days 13:39:00
2    0 days 10:31:00
           ...      
30   0 days 10:15:00
31   0 days 13:37:00
32   0 days 10:29:00
Name: Date, Length: 33, dtype: timedelta64[ns]

timedelta

cycling_intervals[1]
Timedelta('0 days 13:39:00')


cycling_intervals[1].seconds
49140


sec_per_hour = 60 * 60
cycling_intervals[1].seconds / sec_per_hour
13.65
cycling_intervals.max()
Timedelta('5 days 13:47:00')


cycling_intervals.min()
Timedelta('0 days 10:15:00')


interval_range = cycling_intervals.max() - cycling_intervals.min()
interval_range
Timedelta('5 days 03:32:00')

Let’s apply what we learned!