(R) Dealing with CCM Consecutive Date Ranges
1 The Problem
CCM (CRSP/Compustat Merged) from WRDS allows one to link observations between CRSP and Compustat. Of all the tables provided by CCM, the ccmxpf_lnkhist
(link history) table is probably the most important, as it’s the officially recommended table to use.
However, WRDS also points out that the link history table suffers from a “consecutive data ranges” problem. To illustrate, let’s look at the following two lines from the ccmxpf_lnkhist
table:
As you can see, there’s essentially only one match, from gvkey
(001010) to permno
(10006), as the end date of the second line is followed by the start date of the next. However, it’s separated into two lines. The reason is that the first line has a linkprim
(primary link type) “C” while the second is “P.”1 But, come on, in 99% of the cases it’ll be safe to just go merging them. Actually, WRDS even provides a SAS macro for this task. But honestly, people are shifting to R/Python/Stata, rendering this SAS macro of lesser use.
In this article, I’ll provide an R equivalent to WRDS’s SAS macro to merge consecutive date ranges in CCM.
2 The Code
First, download the ccmxpf_lnkhist
table.
library(DBI)
library(data.table)
pg <- dbConnect(RPostgres::Postgres(),
bigint = "integer",
check_interrupts = TRUE)
ccm <- tbl(pg, Id(schema = "crsp", table = "ccmxpf_lnkhist"))
ccm <- as.data.table(ccm)
Then, collapse the consecutive ranges:
# create previous linkenddt and linkdt
ccm = ccm[, ':='(last_linkdt=shift(linkdt), last_linkenddt=shift(linkenddt)),
keyby=.(gvkey, lpermno)]
# if linkdt is on the same day or 1 day after last_linkenddt,
# rewrite linkdt to last_linkdt
ccm = ccm[linkdt==last_linkenddt+1 | linkdt==last_linkenddt,
':='(linkdt=last_linkdt)]
# merge consecutive observations
# only keep the last observation of (gvkey, lpermno, linkdt) pair
ccm = ccm[order(gvkey, lpermno, linkdt, linkenddt)
][, .SD[.N], keyby=.(gvkey, lpermno, linkdt)
][, c('last_linkdt', 'last_linkenddt') := NULL]