(R) Super Efficient Event Study Code

Event study is not new, and there exists plenty of third-party packages to do it, let alone we have WRDS’s official SAS or Python versions. What I’m not satisfied about is that they’re all slow. Here I present a pure R implementation of event study that has the following advantages:
It’s fast. I tested it on a large dataset (3+ millions rows, 3000+ unique stocks, 10k+ events) and it only took 11.4 seconds to compuate the CARs of all these events. If you use WRDS’s SAS version, it may take minutes.
It’s elegant and short. The event study is wrapped up in a single function
do_car
. The core part ofdo_car
is less than 40 lines.It replicates WRDS’s version. I compared the result of my code with WRDS’s SAS version, and I found my code fully replicates WRDS.
It’s purely in R.
The full code is in Putting them together.
If you want to add more bells and whistles, see The full-fledged CRSP version. This version includes sanity checks and fast-fail conditions, among many others. You may or may not need them.
The R code I’ll show heavily relies on some advanced techniques of the
data.table
package. Withoutdata.table
(for example, usingdplyr
only), I can’t achieve the speedup you’ll see later.IMHO,
data.table
is much faster thandplyr
orpandas
in many use cases while requiring 30% or even 50% less code. It encourages you to think selecting, column modification, and grouping operations as a single elegant, coherent operation, instead of breaking them down. You don’t need to take my word, checkout the official introduction and performance benchmark yourself and you’ll realize it.You don’t need to be a
data.table
master to understand this article, but some working knowledge is definitely helpful.
1 The dataset
In this article, we’ll build a highly efficient event study program in R. We’ll use all the earnings call events in 2021 in the CRSP stock universe as an example.
events
is the only dataset we’ll use in this example. Each row of it records the daily return and an event flag:
date
: all the available trading dates for the stock in a ascending order.is_evt
: True ifdate
is an event date (in our case, earnings call day) and False if it’s an non-event trading day.ret
: daily return of the stockrf
,mktrf
,smb
,hml
,umd
: Fama-French factors plus momentum. You can add more as you like.
What’s important about the
events
table is that it contains both the daily return (which will be used in regression) and an event flag (used to determine the event window).Usually, we want to include enough pre-event and post-event daily returns. In this sample dataset, I extend each event by 2 years’ of data. That is, if the event date is 2021-01-26, I’ll include all the daily returns (and factor returns) from 2019-01-26 to 2023-01-26. That’s probably more than enough, since most event studies has much smaller windows.
Given an event date, how to extract the daily returns in its surrounding days is a problem. I included the code at the end but leave it to the reader to study it, since our main focus is the event study part.
2 For a single event
Let’s say we want to calculate the CAR to measure the abnormal return during the event period. To start, we focus on this question:
If there’s only one event and one stock, how to write a function to compute the CAR?
I assure you, after understanding the code for one event, extending it to many events is very straightforward.
2.1 Specify the windows
First, let’s specify the windows of our event study, we’ll use the setting throughout this article:

m1
andm2
specify the start and end day of the model period (“m” stands for model). Model period is used to estimate the expected return model. In this example, we use (-14, -5). This is shorter than a typical event study, but it makes our later illustration convenient.minobs
specifies the minimal trading days in the model period. In this example, we require a stock must have no less than 5 trading days before the event.e1
ande2
specify the event period, i.e., the window during which CAR (cumulative abnormal return) is calculated (“e” stands for event). In this example, we compute CAR from the previous 3 trading day to the following 3 trading days, i.e., CAR(-3,3).- Event day took place on day 0.
Conventionally, the event time is counted as “trading day” instead of “calendar day”
2.2 do_car
: the core function
do_car
is core of the program. For every event, it 1) estimate the expected return model, and 2) compute the abnormal return for each event day. The output looks like:

permno
,evtdate
,date
,ret
: these columns are already explained above.evttime
: event time indicator, generated bydo_car
. It equals 0 for the event day. You can see that because theevtdate
is 2021-01-26, onlydate
with the same value (2021-01-06) has anevttime
of 0.coef_[alpha/mktrf/smb/hml/umd]
: coefficients of the intercept term and factor returns. For the same event (i.e., for a singleevtdate
), the coefficients are the same, because we only need to estimate one model.ar
: daily abnormal return.
Now let’s a look at the do_car
function:
|
|
Most of the inputs are no strangers to us, for they’re simply columns from the events
table as we explained before. We’ll use the dataset we introduced earlier to explain what do_car
does. Let’s assume the following table contains the full sample, i.e., there’re only 18 rows (days), and the event took place on 2021-01-26 (the 15th row):
For do_car
, the inputs are:
evt_rowidx
(event row index). It’s the row index of the event in the table. In our case, it’s 15 because event date is the 15th row.permno
,date
,ret
,rf
,mktrf
,smb
,hml
,umd
are simply columns from the table. For example,permno=[10026, 10026, ...]
,date=[2021-01-05, ..., 2021-01-29]
.m1
,m2
,e1
,e2
andminobs
specify the model period and event period. See Specify the windows
Now let’s look at what do_car
exactly does. The general idea is simple: we first extract the time series that will be used for the model period (i.e., m1 to m2) and the event period (i.e., e1 to e2), respectively. We then estimate the model on the model period. Finally, we apply the model to the window period and calculate the abnormal return.
Specifically (if you lose track, read this figure again):
- We create four “indicators”:
i1
toi4
. They help us to extract the time series needed. In our example,i1=evt_rowidx+m1=15+(-14)=1
,i2=evt_rowidx+m2=15+(-5)=10
, therefore, the model period is from the 1st row to the 10th row. To get the daily return of the model period, we simply runret_model=ret[i1:i2]
. Similarly, we can get the time series of the factor returns bysmb_model=smb[i1:i2]
,hml_model=hml[i1:i2]
, etc. - Similarly, we can calculate the time indicators for the “event period”:
i3
andi4
. We use them to get the time series for the event period:ret_evt=ret[i3:i4]
,hml_evt=evt[i3:i4]
, etc. - Now, we can estimate the expected return model:
model = lm(I(ret_model-rf_model) ~ mktrf_model + smb_model + hml_model + umd_model)
- We save its coefficients with
coef = coef(model)
- We then apply the model to the event period and calculate the abnormal returns:
ars = ret_evt - predict(model, list(ret_model=ret_evt, rf_model=rf_evt, mktrf_model=mktrf_evt, smb_model=smb_evt, hml_model=hml_evt, umd_model=umd_evt))
- Finally, we assemble the results (the coefficients and the abnormal returns) into a table:
output = list(evtdate=date[evt_rowidx], date=date_evt, evttime=evttime, ret=ret_evt, ar = ars)
output = c(output, coef)
3 Scale to many events
As I said before, after you know how to compute for one event, extending to thousands of events is straightforward. All you need is:
|
|
Some of lines you’re already familiar with:
do_car
is exactly the same function we developed in "do_car
the core function"events
is the input dataset we introduced in The dataset. It contains all the earnings calls in 2021.
What’s new is:
- The
keyby=
argument at the end. If you ignore the strange code in the middle, the structure of the above code is:
|
|
- As you can see,
keyby
“groups” the dataset by thepermno
column. If you come fromdplyr
, it equals togroup_by
; if you come from pandas, it’sdf.groupby
- The
keyby
argument is executed before the middle code{...}
wrapped by curly braces. Therefore, the middle{...}
only needs to deal with one stock now (though it can have many events).
Now let’s look at what’s in the middle curly braces {...}
. We excerpt this part as follows:
|
|
evt_rowidxs = which(is_evt)
returns all the event row indexes of for the givenpermno
(remember we’re now working with only one stock before we alreadykeyby=
). For example, if it returnsc(101, 450)
, then it means there’re two events on row 101 and 450.- In
do_car
, we only dealt with one event, but nowevt_rowidxs
gives us many events. We uselapply(evt_rowidx, do_car)
to applydo_car
to each event inevt_rowidx
. - But
do_car
has many input arguments (e.g.,evt_rowidx
,hml
,smb
, etc.), andevt_rowidx
can only provide one:evt_rowidx
. To successfullylapply
, we need to first fill up the other arguments before sendingdo_car
tolapply
. That’s whatpartial
does. It “pre-fill” arguments of a function.
partial
is from thepryr
package. If you’ve read Hadley Wickham’s Advanced R, you won’t be strange topryr
. Because he created this package and used it a lot in the book.
- As the “l” in its name indicates,
lapply
returns a “list” of results, with each element for one event. To convert the list of results into a 2D table, we usedrbindlist
(row-bind-list).
Congrats! Now you’ve grasped a super efficient event study program!
3.1 Putting them together
The full code is provided below. It assumes you already have the input data as detailed in the beginning and the window specifications detailed in this figure. The output is like this figure.
|
|
I test the above code on the dataset we introduced in the beginning–all earnings call events of the whole CRSP universe in 2021. It has 3.16 million row, 3179 unique stocks, 10229 unique events. By no means it’s small.
It only took 11.4 seconds to compute!
4 The full-fledged CRSP version
The code I showed above definitely does what it’s supposed to do. However, if you want to add some bells and whistles, refer to the full-fledge version below. With the knowledge you already know, they’re not so hard to understand.
The full-fledged version:
- Include code to generate the input dataset
events
, i.e., it contains a functionget_retevt
to extract daily returns data for each event. - Fully based on CRSP.
- Include many sanity checks for fast-fail and debugging.
|
|