
Regression: How to use it for causal inference?

Series - Causal Inference
Outline

A key question in regression theory is: given $Y=X'\beta + \varepsilon$ and the estimated coefficients $\widehat{\beta} =\operatorname{argmin}_{\beta} E(Y-X'\beta)^2$, when does $\widehat{\beta}=\beta$?

We first discuss the properties of $\widehat{\beta}$ when we do not require $Y=X'\beta+\varepsilon$ to be well behaved (e.g., $X$ could be incomplete and hence introduce omitted variable bias, OVB).

Then we discuss under what conditions $\widehat{\beta}=\beta$.

The following is always true: $$ \widehat{\beta}_{k} =\frac{\operatorname{Cov}(Y,\hat{x}_k)}{\operatorname{Var}(\hat{x}_{k})} $$ where $\hat{x}_k$ is the residual of $x_{k}$ regressed on all the other covariates.
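As a quick sanity check of this partialling-out identity (the Frisch–Waugh–Lovell result), here is a minimal sketch in Python using only NumPy; the data-generating coefficients and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two correlated covariates and an outcome (coefficients chosen arbitrarily)
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# Multivariate OLS: coefficient on x1
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residualize x1 on the remaining covariates (incl. intercept)
Z = np.column_stack([np.ones(n), x2])
x1_resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

fwl_coef = np.cov(y, x1_resid)[0, 1] / np.var(x1_resid, ddof=1)

print(beta_hat[1], fwl_coef)  # both ≈ 2.0 and equal up to floating point
```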

In particular, for univariate regression $Y=\alpha+\beta x+\varepsilon$, we have $$ \widehat{\beta}=\frac{\operatorname{Cov}(Y,x)}{\operatorname{Var}(x)} $$ However, we cannot guarantee $\widehat{\beta}=\beta$ in this general case.
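A minimal numerical illustration of this caveat (coefficients and seed are arbitrary): when the error contains an omitted variable correlated with $x$, the ratio $\operatorname{Cov}(Y,x)/\operatorname{Var}(x)$ still reproduces the univariate OLS slope, but that slope is not the structural $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Structural model: y = 1 + 2*x + 1.5*u + noise, but u is omitted from the regression
u = rng.normal(size=n)
x = 0.8 * u + rng.normal(size=n)         # x is correlated with the omitted variable u
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)

slope_cov = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
slope_ols = np.polyfit(x, y, deg=1)[0]   # univariate OLS slope

print(slope_cov, slope_ols)  # the two agree, but both are biased away from beta = 2.0
```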

The key condition is $E(X\varepsilon)=0$, that is, $\varepsilon$ is orthogonal to $X$ (equivalently, uncorrelated with $X$ once an intercept is included, so that $E(\varepsilon)=0$).

When $E(X\varepsilon)=0$ holds, not only do we have $\widehat{\beta}=\beta$, we can also guarantee that the error is orthogonal to the fitted value $\hat{Y}=X'\beta$:

$$ E(\hat{Y}\varepsilon)=0 $$

Under the stronger condition $E(\varepsilon\mid X)=0$ (mean independence), the error is orthogonal to any function of the covariates:

$$ E(\varepsilon\cdot f(X))=0 \hspace{2em}\text{where $f(X)$ is an arbitrary function of $X$} $$
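To see these orthogonality properties numerically, the sketch below simulates a model in which $\varepsilon$ is drawn independently of $X$, so $E(\varepsilon\mid X)=0$ holds by construction, and evaluates sample analogues of the moments above; the choices of $f$ are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# epsilon is drawn independently of x, so E(eps | x) = 0 by construction
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = 1.0 + 2.0 * x + eps      # true coefficients (1, 2), chosen arbitrarily

y_fit = 1.0 + 2.0 * x        # fitted value X'beta at the true coefficients

# Sample analogues of the population moments (all ≈ 0)
print(np.mean(x * eps))             # E(x * eps)
print(np.mean(y_fit * eps))         # E(Yhat * eps)
print(np.mean(np.sin(x) * eps))     # E(f(x) * eps) with f = sin
print(np.mean((x ** 2) * eps))      # E(f(x) * eps) with f(x) = x^2
```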

To satisfy $E(X\varepsilon)=0$, any one of the following conditions is sufficient:

  • No omitted variables.
  • Even if we have omitted variables, show that $X$ and the omitted variables are uncorrelated (illustrated in the sketch after this list).
  • $X'\beta=E(Y\mid X)$, i.e., the linear model coincides with the conditional expectation function.
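As a hedged illustration of the second condition (again with arbitrary coefficients and seed): an omitted variable that is uncorrelated with $x$ inflates the error variance but leaves the coefficient on $x$ consistent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Omitted variable u affects y but is drawn independently of x (hence uncorrelated)
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)

# Regress y on x alone, omitting u
slope = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
print(slope)  # ≈ 2.0: omitting an uncorrelated variable does not bias the slope
```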