All you need to know about date/time in R and SAS
1 How SAS stores datetime
- SAS
datetime
values (when represented as doubles) do not store time zones.- SAS datetime values are measured in seconds, beginning with 01Jan1960 00:00:00 local time. For example, the SAS datetime value for 01Jan1960 00:00:00 in England is 0. In Japan, the SAS datetime value for 01Jan1960 00:00:00 is also 0. There is a nine-hour difference between England and Japan. At the same point in time, the datetime in England and the datetime in Japan cannot both be 0. To work with absolute time, SAS supports UTC date, time, and datetime values in time zones. SAS Help Center: Specifying Time Zones in SAS
- In SAS, time zone is handled by the
options timezone
option. After setting the system-level time zone, adatetime
format will display time zone like2013-03-15T04:15:00-05:00
.
- For maximum compatibility with other software/languages, avoid using the
datetime
format directly. - When you want to represent datetime and its time zone, first convert the datetime to UTC. Then store the date and time separately, e.g.,
date_utc
andtime_utc
. - This is how WRDS handles time zones in SAS.
date
- Number of days since Jan 1, 1960.
- e.g.,
0
for 1960-01-01
time
- The number of seconds since midnight (12:00 am)
datetime
- The number of seconds since the midnight of Jan 1, 1960
- The time zone is handled by the
options timezone
option
2 How R stores datetime
In base R, there’s no dedicated format for “time.”
Date
andPOSIXct
are the only available choices.
Date
- The number of days since 1970-01-01.
- No time zone info is stored.
- The difference of two
Date
values is in days.
POSIXct
- The number of seconds since 1970-01-01
- It’s the only format that stores time zone information in R.
- The difference of two
POSIXct
s is in seconds.
- When created without specifying a time zone, the local time zone is used. However, even in this case, no
tz
attribute is created. R implicitly treats a no-time-zone POSIXct as if it has the local time zone. - When a POSIXct has no time zone, R will use the local time zone as default (“MST” in the following case)
# create a POSIXct without specifying timezone
# you'll see local time zone is implicitly used
d1 = as.POSIXct('1970-01-01 08:00:00')
d1
#> [1] "1970-01-01 08:00:00 MST"
# but you'll see no `tz` attribute
attributes(d1)
#> $class
#> 'POSIXct''POSIXt'
#> $tzone
#> ''
# the following case again shows that local time zone is implicitly used
# note the time is converted to 10:00:00
d1 %>% with_tz('America/New_York')
#> [1] "1970-01-01 10:00:00 EST"
- When created with a time zone specified, the time zone is explicitly used/displayed.
# create a POSIXct with timezone specified
# you'll see the time zone is explicitly displayed (EST)
d1 = as.POSIXct('1970-01-01 08:00:00', tz = 'America/New_York')
d1
#> [1] "1970-01-01 08:00:00 EST"
# now you see the `tz` attribute
attributes(d1)
#> $class
#> 'POSIXct''POSIXt'
#> $tzone
#> 'America/New_York'
# the following case again shows that local time zone is implicitly used
d1 %>% with_tz('Asia/Shanghai')
#> [1] "1970-01-01 21:00:00 CST"
With the help of the hms
package, we can easily handle time now.
In hms
, time is stored as the number of seconds since 00:00:00.
as_hms
converts a POSIXct
to hms
.
# Say we want to search a vector of `POSIXct` for any observations that are after 3pm
datetime = ymd_hms('2000-01-01 17:00:00')
datetime[as_hms(datetime)<as_hms('15:00:00')]
#> TRUE
3 WRDS Guide
- WRDS converts datetime to UTC and then stores the date and time parts separately, e.g.,
date_utc
andtime_utc
. - The advantage is that one does not need to figure out how to store time zones.
As discussed above, WRDS likes to store a UTC datetime as separate date and time columns. To combine them, use the following snippet:
# Don't use `as.POSIXct`. You must use `as_datetime`!
data[, ':='(datetime=lubridate::as_datetime(date_utc, tz='UTC') + time_utc)][]
Results:

The rationale is that POSIXct stores data as the number of elapsed seconds. Since our time is already a double
, we just need to add it.
You may be tempted to use the following code:
data[, ':='(datetime_gold=as.POSIXct(as.numeric(date)*86400+time, origin='1970-01-01', tz='UTC'))]
This is problematic because of leap seconds. Multiplying the number of elapsed days by 86,400 does not always equal the elapsed seconds in UTC.
If the original downloaded data is in SAS format (sas7bdat
):
- When we convert the downloaded
sas7bdat
into Feather, pandas/haven will sometimes convert adate
object to adatetime
object, and hence involve unwanted time zones. - The solution is simple: during conversion, force
datetime64[ns]
to be adate
and then save, e.g.,
# force datetime64[ns] to date
df['mostimportantdateutc'] = df['mostimportantdateutc'].dt.date
- Sometimes the original column is of type
datetime
. In this case, we need to explicitly add a time zone:
# add a time zone
df['mostimportantdateutc'] = df['mostimportantdateutc'].dt.tz_localize('UTC')
4 Other tips
4.1 Time interval to numeric: difftime
difftime
is the result of arithmetic operations onPOSIXct
orDate
.- By default,
difftime
is in seconds (Date
) or days (POSIXct
). - A useful trick is to convert
difftime
tonumeric
. For example, to convert adifftime
into hours:as.difftime(x, units='hours')
.
4.2 Convert POSIXct
to Date
You may be tempted to do:
datetime = ymd_hms('2005-10-24 20:00:00') # a POSIXct
as.Date(datetime)
#> 2005-10-25
Surprise. We expected “2005-10-24” but we get “2005-10-25.” The reason is that as.Date
requires you to specify the time zone. When no time zone is available, it defaults to UTC.
One (not the best) solution is:
as.Date(datetime, 'America/New_York')
But a better solution is to use lubridate::as_date
:
as_date(datetime) # No time zone needed!
4.3 Manage time
Base R has no dedicated class for time (recall it only has Date
and POSIXct
). We use the hms
package to format and manage time. In particular, it represents a time in pretty format (e.g., 12:21:12) but internally still stores it as the number of elapsed seconds.
# 1) Create a time in double
x = data.table(time1=46768)
#> time1
#> 1: 46768
# 2) Use `as_hms` to convert time
x[, time2 := as_hms(time1)]
#> time1 time2
#> 1: 46768 12:59:28
# 3) Internally, they're identical
x$time1 == x$time2
#> TRUE