(R) Dealing with CCM Consecutive Date Ranges
1 The Problem
CCM (CRSP/Compustat Merged) from WRDS allows one to link observations between CRSP and Compustat. Of all the tables provided by CCM, the ccmxpf_lnkhist (link history) table is probably the most important, as it's the one WRDS officially recommends for linking.
However, WRDS also points out that the link history table suffers from a "consecutive date ranges" problem. To illustrate, let's look at the following two lines from the ccmxpf_lnkhist table:
As you can see, there's essentially only one match, from gvkey 001010 to permno 10006, since the start date of the second line immediately follows the end date of the first. However, it's split into two lines. The reason is that the first line has a linkprim (primary link marker) of "C" while the second has "P".¹ But, come on, in the vast majority of cases it's safe to simply merge them. Actually, WRDS even provides a SAS macro for this task. But honestly, people are shifting to R/Python/Stata, which makes this SAS macro less useful.
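To make the "consecutive" condition concrete, here is a minimal R sketch with a hypothetical two-row link history. The dates are made up for illustration (they are not the actual WRDS values), and is_consecutive() is a hypothetical helper. Two rows count as consecutive when the second one starts exactly one day after the first one ends.

```r
# Hypothetical link history for one gvkey-permno pair (dates are made up)
lnk <- data.frame(
  gvkey     = c("001010", "001010"),
  lpermno   = c(10006L, 10006L),
  linkprim  = c("C", "P"),
  linkdt    = as.Date(c("1990-01-01", "1995-01-01")),
  linkenddt = as.Date(c("1994-12-31", "1999-12-31")),
  stringsAsFactors = FALSE
)

# Two ranges are consecutive when the next one starts the day after the
# previous one ends (adding 1 to an R Date adds one day)
is_consecutive <- function(prev_end, next_start) {
  next_start == prev_end + 1
}

is_consecutive(lnk$linkenddt[1], lnk$linkdt[2])  # TRUE
```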
In this article, I’ll provide an R equivalent to WRDS’s SAS macro to merge consecutive date ranges in CCM.
2 The Code
First, download the ccmxpf_lnkhist table.
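One way to do this from R is through the WRDS PostgreSQL server with the DBI and RPostgres packages. The sketch below rests on a few assumptions: the connection settings (host wrds-pgdata.wharton.upenn.edu, port 9737, database wrds, SSL required) are the ones WRDS documents for PostgreSQL access, so check them against your account. The password is expected in ~/.pgpass or the PGPASSWORD environment variable. The linktype/linkprim filters are a common choice you may want to drop. Both helper names are hypothetical.

```r
# Build the SQL for the link history table.
# Assumption: keep only LU/LC links with P/C primary markers (a common choice).
build_lnkhist_query <- function(linktypes = c("LU", "LC"),
                                linkprims = c("P", "C")) {
  # Quote each code and join them into a SQL IN (...) list
  in_list <- function(x) paste0("'", x, "'", collapse = ", ")
  paste0(
    "SELECT gvkey, lpermno, linkprim, linktype, linkdt, linkenddt ",
    "FROM crsp.ccmxpf_lnkhist ",
    "WHERE linktype IN (", in_list(linktypes), ") ",
    "AND linkprim IN (", in_list(linkprims), ")"
  )
}

# Download the table; requires DBI and RPostgres to be installed.
# libpq reads the password from ~/.pgpass or PGPASSWORD.
download_lnkhist <- function(user) {
  con <- DBI::dbConnect(
    RPostgres::Postgres(),
    host    = "wrds-pgdata.wharton.upenn.edu",
    port    = 9737,
    dbname  = "wrds",
    user    = user,
    sslmode = "require"
  )
  on.exit(DBI::dbDisconnect(con))
  lnk <- DBI::dbGetQuery(con, build_lnkhist_query())
  # Make sure link dates are Date objects; NA linkenddt means still active
  lnk$linkdt    <- as.Date(lnk$linkdt)
  lnk$linkenddt <- as.Date(lnk$linkenddt)
  lnk
}
```

A call such as download_lnkhist("your_wrds_username") then returns a data frame with one row per link record.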
Then, collapse the consecutive ranges:
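A minimal base-R sketch of this step follows. It is not a line-by-line port of WRDS's SAS macro. It sorts by gvkey, lpermno and linkdt. It starts a new block whenever a row does not begin exactly one day after the previous row of the same pair ends. It then keeps the first start date and the last end date of each block. It assumes Date columns and NA for links that are still active. collapse_ccm_links() and the toy data at the end are hypothetical.

```r
# Merge consecutive date ranges of the same gvkey-lpermno pair.
# Assumes columns gvkey, lpermno, linkdt, linkenddt (Date), NA end = active.
collapse_ccm_links <- function(lnk, open_end = as.Date("9999-12-31")) {
  lnk <- lnk[, c("gvkey", "lpermno", "linkdt", "linkenddt")]
  # Temporarily replace NA end dates so date arithmetic works
  lnk$linkenddt[is.na(lnk$linkenddt)] <- open_end
  # Put ranges of the same pair next to each other, in time order
  lnk <- lnk[order(lnk$gvkey, lnk$lpermno, lnk$linkdt), ]
  n <- nrow(lnk)
  if (n == 0) return(lnk)
  # A row continues the previous block if it has the same pair and
  # starts exactly one day after the previous row ends
  same_pair <- c(FALSE, lnk$gvkey[-1] == lnk$gvkey[-n] &
                        lnk$lpermno[-1] == lnk$lpermno[-n])
  follows <- c(FALSE, lnk$linkdt[-1] == lnk$linkenddt[-n] + 1)
  continues <- same_pair & follows
  continues[is.na(continues)] <- FALSE
  # Each block runs from its first row's start to its last row's end
  first_idx <- which(!continues)
  last_idx <- c(first_idx[-1] - 1L, n)
  out <- data.frame(
    gvkey     = lnk$gvkey[first_idx],
    lpermno   = lnk$lpermno[first_idx],
    linkdt    = lnk$linkdt[first_idx],
    linkenddt = lnk$linkenddt[last_idx],
    stringsAsFactors = FALSE
  )
  # Restore NA for links that are still active
  out$linkenddt[out$linkenddt == open_end] <- NA
  out
}

# Hypothetical toy data (made-up dates): rows 1-2 are consecutive,
# row 3 starts after a gap and stays separate
toy <- data.frame(
  gvkey     = c("001010", "001010", "001010"),
  lpermno   = c(10006L, 10006L, 10006L),
  linkdt    = as.Date(c("1990-01-01", "1995-01-01", "2001-01-01")),
  linkenddt = as.Date(c("1994-12-31", "1999-12-31", NA)),
  stringsAsFactors = FALSE
)
res <- collapse_ccm_links(toy)
```

Here res has two rows: 1990-01-01 to 1999-12-31, and 2001-01-01 onward (NA end date). linkprim and linktype are dropped from the output because a merged range can mix "C" and "P" rows. Base R is used only to avoid package dependencies, and the same logic carries over to dplyr or data.table grouping.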