All you need to know about data/time in R and SAS
datetimevalue (when it’s represented as double) does not store timezone!
- SAS datetime values are measured in seconds, beginning with 01Jan1960 00:00:00 local time. For example, the SAS datetime value for 01Jan1960 00:00:00 in England is 0. In Japan, the SAS datetime value for 01Jan1960 00:00:00 is also 0. There is a nine-hour difference between England and Japan. At the same point in time, the datetime in England and the datetime in Japan cannot be 0. In order to work with absolute time and datetime values, SAS supports the UTC date, time, and datetime values in time zones. SAS Help Center: Specifying Time Zones in SAS
- In SAS, time zone is handled by the
options timezoneoptions. After setting the system-level time zone, then a
datetimeformat will display time zone like
- For the most compatibility with other software/programming languages, don’t use
- When you want to represent datetime and its time zone, first, convert the datetime to UTC. Then store the date part and time part separately, e.g.,
- This is how WRDS handles time zone in SAS.
- Number of days since Jan 1, 1960.
- The number of seconds since midnight (12:00 am)
- The number of seconds since the midnight of Jan 1, 1960
- The time zone is handled by the
In base R, there’s no dedicated format for “time.”
POSIXctare the only available choices.
- The number of days since 1970-01-01.
- No time zone info is stored!
- The difference of two
Dateis number of days.
- The number of seconds since 1970-01-01
- It’s the only format that stores time zone in R.
- The difference of two
POSIXcts is number of seconds.
- When created without specifying a time zone, the local time zone is used. However, even in this case, no
tzattribute is created. R implicitly treats a no-time-zone POSIXct as if it has the local time zone.
- When a POSIXct has no time zone, R will use local time zone as default (“MST” in the following case)
- When created with time zone specified, the time zone is explicitly used/displayed.
With the help of the
hms package, we can easily handle time now.
hms, time is stored as the number of seconds since 00:00:00.
as_hms is used to convert a
- WRDS converts datetime to UTC and then stores the date and time part separately, e.g.,
- The advantage is that one doesn’t need to figure out how to store time zone.
As discussed above, WRDS likes to separately store a UTC datetime into a date and a time. To combine them, use the following snippet:
The rational behind is that POSIXct stores data as the number of elapsed seconds. Since our time is already in
double, we just need to add it.
You may be tempted to use the following code:
This is problematic because of leap seconds. Multiplying the number of elapsed days with 86400 not always equals to the elapsed seconds in UTC.
If the original downloaded data is in SAS format (sas7bdat):
- When we convert the downloaded
sas7bdatinto feather, pandas/haven will sometimes convert a
dateobject to a
datetimeobject, and hence involves unwanted time zone.
- The solution is simple: during the conversion, force
datatime64[ns]to be a
dateand then save, e.g.,
- Sometimes the original column is of
datatime. In this case, we need to explicitly add a time zone:
difftimeis the result of arithmetic operation on
- By default,
difftimeis in seconds (
Date) or days (
- A very useful trick is to convert
numeric. For example, if we want to convert a
difftimeinto hours, we can do
You may be tempted to do:
Surprise. We expected “2005-10-24” but we get “2005-10-25.” The reason is that
as.Date requires you to specify the time zone. When no time zone is available, it defaults to UTC.
One (not the best) solution is:
But a better solution is to use
The base R has no dedicated class for time (recall it only has
POSIXct). We use the
hms package to format and manage time. Particularly, it represent a time in pretty format (e.g., 12:21:12) but internally still stores it as the number of elapsed seconds.