(R) Dealing with CCM Consecutive Date Ranges

CCM (CRSP/Compustat Merged) from WRDS allows one to link observations between CRSP and Compustat. Of all the tables provided by CCM, the ccmxpf_lnkhist (link history) table is probably the most important, as it’s the officially recommended table to use.

However, WRDS also points out that the link history table suffers from a “consecutive data ranges” problem. To illustrate, let look at the following two lines from the ccmxpf_lnkhist table:

/ccm-consecutive-obs/fig-1.png
An example of consecutive date ranges

As you can see, there’s essentially only one match, from gvkey (001010) to permno (10006), as the end date of the second line is followed by the start date of the next. However, it’s separated into two lines. The reason is that the first line has a linkprim (primary link type) “C” while the second is “P.”1 But, come on, in 99% of the cases it’ll be safe to just go merging them. Actually, WRDS even provides a SAS macro for this task. But honestly, people are shifting to R/Python/Stata, rendering this SAS macro of lesser use.

In this article, I’ll provide an R equivalent to WRDS’s SAS macro to merge consecutive date ranges in CCM.

First, download the ccmxpf_lnkhist table.

1
2
3
4
5
6
7
8
9
library(DBI)
library(data.table)

pg <- dbConnect(RPostgres::Postgres(), 
                bigint = "integer", 
                check_interrupts = TRUE)

ccm <- tbl(pg, Id(schema = "crsp", table = "ccmxpf_lnkhist"))
ccm <- as.data.table(ccm)

Then, collapse the consecutive ranges:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# create previous linkenddt and linkdt
ccm = ccm[, ':='(last_linkdt=shift(linkdt), last_linkenddt=shift(linkenddt)),
	  keyby=.(gvkey, lpermno)]

# if linkdt is on the same day or 1 day after last_linkenddt, 
# rewrite linkdt to last_linkdt
ccm = ccm[linkdt==last_linkenddt+1 | linkdt==last_linkenddt,  
    ':='(linkdt=last_linkdt)]

# merge consecutive observations
# only keep the last observation of (gvkey, lpermno, linkdt) pair
ccm = ccm[order(gvkey, lpermno, linkdt, linkenddt)
    ][, .SD[.N], keyby=.(gvkey, lpermno, linkdt)
    ][, c('last_linkdt', 'last_linkenddt') := NULL]

  1. If you don’t know what’s the difference between “P” and “C,” refer to this great article. But long story short, they can be viewed as the same in most cases. ↩︎