So much useful data has a date or time component. Often, data has a timestamp to represent when the data was acquired, or when an event will take place, or as an identifying attribute like an expiration date. For this reason, understanding how to work with dates and times effectively can be a very useful skill. One common need is to select dates (and times) using rules based on their offset from known times. This article will focus on some handy ways to use pandas DateOffset
s for working with dates specifically.
Since my experience is in the areas of finance and trading, I’ll use some practical examples I’ve encountered over the years. But even if you don’t work in finance, the techniques should work for any data that has dates.
What is a DateOffset
?
A DateOffset
is just a special object that represents a way to shift a date to a new date. This turns out to be really useful.
The DateOffset
class and a number of useful offset aliases are in the pd.offsets
package (an alias to pandas.tseries.offsets
).
Quick overview
Before we look at some ideas of how to use these DateOffset
s, let’s just review how they work. This is all just a high level of what you’ll find in the documentation, so head there for more detail.
First, let’s just look at the DateOffset
class itself, you can do quite a bit with it alone!
The DateOffset
constructor takes a number of keyword arguments. Plural arguments will shift the date. Singular arguments replace the resulting date’s values. Use normalize
to set the time to midnight. Note that DateOffset
will respect timezones, unlike Timedelta
, so if you cross a daylight savings boundary, it will make sure you aren’t off by an hour.
import pandas as pd now = pd.Timestamp.now() print("Add a day:", now + pd.offsets.DateOffset(days=1)) print("Add a week:", now + pd.offsets.DateOffset(weeks=1)) print("Add a month:", now + pd.offsets.DateOffset(months=1)) print("Add an hour:", now + pd.offsets.DateOffset(hours=1)) print("Add a day, replace the hour:", now + pd.offsets.DateOffset(days=1, hour=13)) print("Add a month, normalize:", now + pd.offsets.DateOffset(month=1, normalize=True)) print("Add 2 days across DST change:", pd.Timestamp("2022-11-05 00:00:00", tz="America/Chicago") + pd.offsets.DateOffset(days=2)) print("Add 2 days across DST change (with Timedelta, no adjustment):", pd.Timestamp("2022-11-05 00:00:00", tz="America/Chicago") + pd.Timedelta(days=2))
Add a day: 2022-09-26 14:20:30.243984 Add a week: 2022-10-02 14:20:30.243984 Add a month: 2022-10-25 14:20:30.243984 Add an hour: 2022-09-25 15:20:30.243984 Add a day, replace the hour: 2022-09-26 13:20:30.243984 Add a month, normalize: 2022-01-25 00:00:00 Add 2 days across DST change: 2022-11-07 00:00:00-06:00 Add 2 days across DST change (with Timedelta, no adjustment): 2022-11-06 23:00:00-06:00
Offset aliases
However, you don’t need to use the DateOffset
class directly. Pandas has a ton of named offset aliases that do what you want for a number of common scenarios. You’ll find these to be extremely useful.
print("Next business day (or weekday):", now + pd.offsets.BDay(normalize=True)) print("Three business days (or weekday):", now + pd.offsets.BDay(3, normalize=True)) print("Next Easter:", now + pd.offsets.Easter(normalize=True))
Next business day (or weekday): 2022-09-26 00:00:00 Three business days (or weekday): 2022-09-28 00:00:00 Next Easter: 2023-04-09 00:00:00
You can also subtract offsets.
print("Beginning of month:", now - pd.offsets.MonthBegin(normalize=True)) print("Beginning of quarter:", now - pd.offsets.QuarterBegin(normalize=True)) print("Beginning of year:", now - pd.offsets.YearBegin(normalize=True))
Beginning of month: 2022-09-01 00:00:00 Beginning of quarter: 2022-09-01 00:00:00 Beginning of year: 2022-01-01 00:00:00
Full offset alias list
Pandas has a plethora of configured offset aliases. You can create them by constructing them as an object as shown above, or you can pass their code (listed in parentheses below) to other pandas methods that take offsets as a parameter, as you’ll see below. Here’s a list taken right from the documentation.
DateOffset
Generic offset class, defaults to absolute 24 hoursBDay
orBusinessDay
, (B
). business day or weekdayCDay
orCustomBusinessDay
, (C
). custom business dayWeek
(W
) one week, optionally anchored on a day of the weekWeekOfMonth
(WOM
) the x-th day of the y-th week of each monthLastWeekOfMonth
(LWOM
) the x-th day of the last week of each monthMonthEnd
(M
) calendar month endMonthBegin
(MS
) calendar month beginBMonthEnd
orBusinessMonthEnd
(BM
) business month endBMonthBegin
orBusinessMonthBegin
(BMS
) business month beginCBMonthEnd
orCustomBusinessMonthEnd
(CBM
) custom business month endCBMonthBegin
orCustomBusinessMonthBegin
(CBMS
) custom business month beginSemiMonthEnd
(SM
) 15th (or other day_of_month) and calendar month endSemiMonthBegin
(SMS
) 15th (or other day_of_month) and calendar month beginQuarterEnd
(Q
) calendar quarter endQuarterBegin
(QS
) calendar quarter beginBQuarterEnd
(BQ
) business quarter endBQuarterBegin
(BQS
) business quarter beginFY5253Quarter
(REQ
) retail (aka 52-53 week) quarterYearEnd
(A
) calendar year endYearBegin
(AS
) or (BYS
) calendar year beginBYearEnd
(BA
) business year endBYearBegin
(BAS
) business year beginFY5253
(RE
) retail (aka 52-53 week) yearEaster
Easter holidayBusinessHour
(BH
) business hourCustomBusinessHour
(CBH
) custom business hourDay
(D
) one absolute dayHour
(H
) one hourMinute
(T
) or (min
) one minuteSecond
(S
) one secondMilli
(L
) or (ms
) one millisecondMicro
(U
) or (us
) one microsecondNano
(N
) one nanosecond
A useful place to use the offset aliases is in pd.date_range
. The code can be passed in as the freq
argument along with numbers. Here’s a few examples.
print("Beginning of the quarter\n", pd.date_range(start='2022-01-01', freq='QS', periods=4)) print("Beginning of the month\n", pd.date_range(start='2022-01-01', freq='MS', periods=4)) print("Beginning of every 3rd month\n", pd.date_range(start='2022-01-01', freq='3MS', periods=4))
Beginning of the quarter DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='QS-JAN') Beginning of the month DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'], dtype='datetime64[ns]', freq='MS') Beginning of every 3rd month DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='3MS')
What is the alternative to pandas DateOffset
s?
You’ve probably seen a lot of code that tries to do complex date logic using basic Python datetime times. This might make sense for trival cases, but you will quickly run into situations that cause that code to turn ugly. For example, if given a date you want to find the next Monday, you could write something like this:
import datetime today = datetime.date.today() while today.weekday() != 0: # Monday today += datetime.timedelta(days=1) today
datetime.date(2022, 9, 26)
Compare the above to
(pd.Timestamp.today() + pd.offsets.Week(1, weekday=0)).date()
datetime.date(2022, 9, 26)
Plus, it’s easy to use these offsets on pandas Series
and DataFrame
s.
s = pd.Series(pd.date_range('2022-01-01', periods=5)) s + pd.offsets.Week(1, weekday=0)
0 2022-01-03 1 2022-01-03 2 2022-01-10 3 2022-01-10 4 2022-01-10 dtype: datetime64[ns]
More complicated scenarios
One way I’ve found offsets to be useful is to select data for certain events. For example, a very important report for the US financial markets is made available every month from the U.S. Bureau of Labor Statistics. It’s called the Employment Situation, with the “Non-Farm payrolls” number in that report being one of the most closely watched pieces of data by traders. Their schedule is [listed on their website]. It generally follows the schedule of the first Friday of the month. We can generate this pretty easily using pandas. There are a couple of ways we could do this, but here’s one technique. We can make a date index using date_range
, and pass in the MonthBegin
as the freq
, using the code from the list above.
dates = pd.date_range('2022-01-01', '2022-12-31', freq='MS') dates
DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01', '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'], dtype='datetime64[ns]', freq='MS')
Now, given the first day of the month, can we get the first Monday of the month? One way to do this is to back up 1 day (in case the first day is a Monday itself), then move forward one week, but setting the weekday to Friday.
dates - pd.offsets.Day(1) + pd.offsets.Week(1, weekday=4)
DatetimeIndex(['2022-01-07', '2022-02-04', '2022-03-04', '2022-04-01', '2022-05-06', '2022-06-03', '2022-07-01', '2022-08-05', '2022-09-02', '2022-10-07', '2022-11-04', '2022-12-02'], dtype='datetime64[ns]', freq=None)
But, it turns out you can pass in a 0
as the week move, and in that case it will not shift if the start date is the same as the anchor point. (I hadn’t realized that until I was writing this up so used to do it the first way. The pandas docs are full of great information, you should read them!).
dates + pd.offsets.Week(0, weekday=4)
DatetimeIndex(['2022-01-07', '2022-02-04', '2022-03-04', '2022-04-01', '2022-05-06', '2022-06-03', '2022-07-01', '2022-08-05', '2022-09-02', '2022-10-07', '2022-11-04', '2022-12-02'], dtype='datetime64[ns]', freq=None)
Now if I compare the values above with this year’s dates listed at the BLS site, I see that the July data was released on July 8th, not July 1st. This is related to the U.S. Independence Day holiday the following Monday. This is a good reminder to never completely trust your understanding of the data! The BLS can choose to move things around if they want to, so having a reliable reference source for events is probably required if you are depending on this data.
Holidays
What about dealing with holidays? If we look at the 2021 schedule, we can see that the first Friday in January falls on New Year’s Day. It turns out that adding holidays is not that hard with pandas. If all you want to do is select the next business day, you can just use the calendar with a CustomBusinessDay
offset, with a value of 0
that means we should only move forward if the date is a holiday.
dates_2021 = pd.date_range('2021-01-01', '2021-12-31', freq='MS') dates_2021
DatetimeIndex(['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'], dtype='datetime64[ns]', freq='MS')
from pandas.tseries.holiday import USFederalHolidayCalendar bday_us = pd.offsets.CustomBusinessDay(0, calendar=USFederalHolidayCalendar()) dates_2021 + bday_us
/Users/mcw/.pyenv/versions/pandas/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:760: PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex. warnings.warn(
DatetimeIndex(['2021-01-04', '2021-02-01', '2021-03-01', '2021-04-01', '2021-05-03', '2021-06-01', '2021-07-01', '2021-08-02', '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'], dtype='datetime64[ns]', freq=None)
Note that we get a warning that the offset is non-vectorized. This means that if you want to use this technique on an extremely large dataset, this will be quite slow (as of the time of writing with pandas 1.4.3). For this reason, for larger data sets you may want to create this index once and use it multiple times with your data.
Now, note that above we used a holiday calendar from pandas. But the holidays on the web site were slightly different – the BLS listed Inauguration Day as a holiday as well. We can make a custom holiday calendar ourselves.
bls_holidays = [ "2021-01-01", "2021-01-18", "2021-01-20", "2021-02-15", "2021-05-31", "2021-07-05", "2021-09-06", "2021-10-11", "2021-11-11", "2021-11-25", "2021-12-24", "2021-12-31", ] bday_bls = pd.offsets.CustomBusinessDay(0, holidays=bls_holidays) dates_2021_bls = dates_2021 + bday_bls dates_2021_bls
/Users/mcw/.pyenv/versions/pandas/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:760: PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex. warnings.warn(
DatetimeIndex(['2021-01-04', '2021-02-01', '2021-03-01', '2021-04-01', '2021-05-03', '2021-06-01', '2021-07-01', '2021-08-02', '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'], dtype='datetime64[ns]', freq=None)
Now, if you had a Series
or DataFrame
of data, say returns for a financial instrument for every day of the year, you could use this index to pick out the ones from the dates in question using pandas indexing. If you want to know more about indexing time series data in pandas, you can check out this article. Here’s an example:
# make some fake data, one value per day of the year df = pd.DataFrame(np.random.rand(365), index=pd.date_range('2021-01-01', '2021-12-31')) df.loc[dates_2021_bls]
0 2021-01-04 0.151260 2021-02-01 0.201709 2021-03-01 0.921957 2021-04-01 0.072389 2021-05-03 0.821674 2021-06-01 0.561620 2021-07-01 0.926453 2021-08-02 0.055801 2021-09-01 0.768521 2021-10-01 0.294276 2021-11-01 0.651574 2021-12-01 0.099297
In summary, you can use pandas DateOffset
s to shift dates easily. This can be a huge timesaver when you need to select data using complex (and not so complex) criteria. How will you use them in your next data investigation?