All you need to know about date/time in R and SAS

How time zone works in SAS
  • SAS datetime values (when represented as doubles) do not store time zones.
    • SAS datetime values are measured in seconds, beginning with 01Jan1960 00:00:00 local time. For example, the SAS datetime value for 01Jan1960 00:00:00 in England is 0. In Japan, the SAS datetime value for 01Jan1960 00:00:00 is also 0. There is a nine-hour difference between England and Japan. At the same point in time, the datetime in England and the datetime in Japan cannot both be 0. To work with absolute time, SAS supports UTC date, time, and datetime values in time zones. SAS Help Center: Specifying Time Zones in SAS
  • In SAS, time zone is handled by the options timezone option. After setting the system-level time zone, a datetime format will display time zone like 2013-03-15T04:15:00-05:00.
[Rule of thumb] How to handle time zones in SAS
  • For maximum compatibility with other software/languages, avoid using the datetime format directly.
  • When you want to represent datetime and its time zone, first convert the datetime to UTC. Then store the date and time separately, e.g., date_utc and time_utc.
  • This is how WRDS handles time zones in SAS.

date

  • Number of days since Jan 1, 1960.
  • e.g., 0 for 1960-01-01

time

  • The number of seconds since midnight (12:00 am)

datetime

  • The number of seconds since the midnight of Jan 1, 1960
  • The time zone is handled by the options timezone option

In base R, there’s no dedicated format for “time.” Date and POSIXct are the only available choices.

Date

  • The number of days since 1970-01-01.
  • No time zone info is stored.
  • The difference of two Date values is in days.

POSIXct

  • The number of seconds since 1970-01-01
  • It’s the only format that stores time zone information in R.
  • The difference of two POSIXcts is in seconds.
How time zone works in POSIXct?
  • When created without specifying a time zone, the local time zone is used. However, even in this case, no tz attribute is created. R implicitly treats a no-time-zone POSIXct as if it has the local time zone.
  • When a POSIXct has no time zone, R will use the local time zone as default (“MST” in the following case)

r

# create a POSIXct without specifying timezone
# you'll see local time zone is implicitly used
d1 = as.POSIXct('1970-01-01 08:00:00')
d1
#> [1] "1970-01-01 08:00:00 MST"

# but you'll see no `tz` attribute
attributes(d1)
#> $class
#>   'POSIXct''POSIXt'
#> $tzone
#>   ''

# the following case again shows that local time zone is implicitly used
# note the time is converted to 10:00:00
d1 %>% with_tz('America/New_York')
#> [1] "1970-01-01 10:00:00 EST"
  • When created with a time zone specified, the time zone is explicitly used/displayed.

R

# create a POSIXct with timezone specified
# you'll see the time zone is explicitly displayed (EST)
d1 = as.POSIXct('1970-01-01 08:00:00', tz = 'America/New_York')
d1
#> [1] "1970-01-01 08:00:00 EST"

# now you see the `tz` attribute
attributes(d1)
#> $class
#>   'POSIXct''POSIXt'
#> $tzone
#>   'America/New_York'

# the following case again shows that local time zone is implicitly used
d1 %>% with_tz('Asia/Shanghai')
#> [1] "1970-01-01 21:00:00 CST"
How to process time with "hms"?

With the help of the hms package, we can easily handle time now.

In hms, time is stored as the number of seconds since 00:00:00.

as_hms converts a POSIXct to hms.

R

# Say we want to search a vector of `POSIXct` for any observations that are after 3pm

datetime = ymd_hms('2000-01-01 17:00:00')
datetime[as_hms(datetime)<as_hms('15:00:00')]
#> TRUE
How datetime is stored in WRDS
  • WRDS converts datetime to UTC and then stores the date and time parts separately, e.g., date_utc and time_utc.
  • The advantage is that one does not need to figure out how to store time zones.
How to combine a `Date` and `double` time into a POSIXct?

As discussed above, WRDS likes to store a UTC datetime as separate date and time columns. To combine them, use the following snippet:

R

# Don't use `as.POSIXct`. You must use `as_datetime`!
data[, ':='(datetime=lubridate::as_datetime(date_utc, tz='UTC') + time_utc)][]

Results:

The rationale is that POSIXct stores data as the number of elapsed seconds. Since our time is already a double, we just need to add it.

You may be tempted to use the following code:

R

data[, ':='(datetime_gold=as.POSIXct(as.numeric(date)*86400+time, origin='1970-01-01', tz='UTC'))]

This is problematic because of leap seconds. Multiplying the number of elapsed days by 86,400 does not always equal the elapsed seconds in UTC.

How to avoid unwanted time zone conversion when reading WRDS datetime?

If the original downloaded data is in SAS format (sas7bdat):

  • When we convert the downloaded sas7bdat into Feather, pandas/haven will sometimes convert a date object to a datetime object, and hence involve unwanted time zones.
  • The solution is simple: during conversion, force datetime64[ns] to be a date and then save, e.g.,

python

# force datetime64[ns] to date
df['mostimportantdateutc'] = df['mostimportantdateutc'].dt.date
  • Sometimes the original column is of type datetime. In this case, we need to explicitly add a time zone:

python

# add a time zone
df['mostimportantdateutc'] = df['mostimportantdateutc'].dt.tz_localize('UTC')
  • difftime is the result of arithmetic operations on POSIXct or Date.
  • By default, difftime is in seconds (Date) or days (POSIXct).
  • A useful trick is to convert difftime to numeric. For example, to convert a difftime into hours: as.difftime(x, units='hours').

You may be tempted to do:

R

datetime = ymd_hms('2005-10-24 20:00:00')  # a POSIXct
as.Date(datetime)

#> 2005-10-25

Surprise. We expected “2005-10-24” but we get “2005-10-25.” The reason is that as.Date requires you to specify the time zone. When no time zone is available, it defaults to UTC.

One (not the best) solution is:

R

as.Date(datetime, 'America/New_York')

But a better solution is to use lubridate::as_date:

R

as_date(datetime)  # No time zone needed!

Base R has no dedicated class for time (recall it only has Date and POSIXct). We use the hms package to format and manage time. In particular, it represents a time in pretty format (e.g., 12:21:12) but internally still stores it as the number of elapsed seconds.

R

# 1) Create a time in double
x = data.table(time1=46768)

#>    time1 
#> 1: 46768

# 2) Use `as_hms` to convert time
x[, time2 := as_hms(time1)]

#>    time1     time2
#> 1: 46768  12:59:28

# 3) Internally, they're identical
x$time1 == x$time2

#> TRUE
Nickname
Email
Website
0/500
  • OωO
  • |´・ω・)ノ
  • ヾ(≧∇≦*)ゝ
  • (☆ω☆)
  • (╯‵□′)╯︵┴─┴
  •  ̄﹃ ̄
  • (/ω\)
  • ∠( ᐛ 」∠)_
  • (๑•̀ㅁ•́ฅ)
  • →_→
  • ୧(๑•̀⌄•́๑)૭
  • ٩(ˊᗜˋ*)و
  • (ノ°ο°)ノ
  • (´இ皿இ`)
  • ⌇●﹏●⌇
  • (ฅ´ω`ฅ)
  • (╯°A°)╯︵○○○
  • φ( ̄∇ ̄o)
  • ヾ(´・ ・`。)ノ"
  • ( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
  • (ó﹏ò。)
  • Σ(っ °Д °;)っ
  • ( ,,´・ω・)ノ"(´っω・`。)
  • ╮(╯▽╰)╭
  • o(*////▽////*)q
  • >﹏<
  • ( ๑´•ω•) "(ㆆᴗㆆ)
  • 😂
  • 😀
  • 😅
  • 😊
  • 🙂
  • 🙃
  • 😌
  • 😍
  • 😘
  • 😜
  • 😝
  • 😏
  • 😒
  • 🙄
  • 😳
  • 😡
  • 😔
  • 😫
  • 😱
  • 😭
  • 💩
  • 👻
  • 🙌
  • 🖕
  • 👍
  • 👫
  • 👬
  • 👭
  • 🌚
  • 🌝
  • 🙈
  • 💊
  • 😶
  • 🙏
  • 🍦
  • 🍉
  • 😣
  • 颜文字
  • Emoji
  • Bilibili
0 comments
No comment