## Key Conditions for IVs
[Instrumental Variables](Instrumental%20Variable.md) (IVs) must satisfy three key conditions:
1. **Instrument Relevance**: The instrument $Z$ must be correlated with the endogenous regressor $X$, i.e., $\text{Cov}(Z, X) \neq 0$. This condition ensures that the instrument explains some of the variation in the regressor.
2. **Instrument Exogeneity**: The instrument $Z$ must be uncorrelated with the error term $\varepsilon$ in the outcome equation $Y = \beta_1 X + \varepsilon$, i.e., $\text{Cov}(Z, \varepsilon) = 0$. This implies that $Z$ affects the dependent variable $Y$ only through $X$, and not directly or through omitted variables, so any correlation between $Z$ and $Y$ is due solely to the effect of $X$.
3. **No other latent instruments**: There should be no unobserved instruments, especially ones that are more strongly correlated with $X$ than $Z$ is.
Note that an instrumental variable (IV) is not a [[confounding variable]]. A confounder affects both the independent variable and the response, whereas an instrument is a separate variable used to address endogeneity in causal inference and affects the response only through $X$. Lurking confounders may affect $X$ and $Y$, but they must not affect $Z$, since that would violate the exogeneity condition.
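To see why relevance and exogeneity are both needed, a short derivation may help. This is a sketch under simplifying assumptions: a single regressor, a single instrument, no control variables, and the outcome equation written as $Y = \beta_1 X + \varepsilon$ (the coefficient is called $\beta_1$ here to match the notation of the 2SLS section). Taking the covariance of both sides with $Z$ gives:

$\text{Cov}(Z, Y) = \beta_1 \, \text{Cov}(Z, X) + \text{Cov}(Z, \varepsilon)$

Exogeneity sets the last term to zero, and relevance guarantees that $\text{Cov}(Z, X)$ is non-zero, so it is safe to divide by it:

$\beta_1 = \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, X)}$

If either condition fails, this ratio is either undefined or no longer equal to $\beta_1$.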
## Measuring the causal effect of $X$ on $Y$
To isolate the causal effect of $X$ on $Y$, the **Two-Stage Least Squares** (2SLS) method is commonly used. In Stage 1, $Z$ is used to isolate the variation in $X$ that is independent of confounders. In Stage 2, you estimate how this exogenous variation in $X$ affects $Y$. The estimated causal effect of $X$ on $Y$ is the coefficient on $\hat{X}$.
### Isolate the exogenous variation in $X$ that is independent of confounders
**Stage 1**: Regress $X$ on $Z$ (and any [control variables](Control%20Variable.md) $W$):
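$X = \pi_1 Z + \pi_2 W + \eta$
Here, $\eta$ is the first-stage error term (distinct from the error $\varepsilon$ in the outcome equation), and the fitted values $\hat{X}$ represent the variation in $X$ that is explained by $Z$ and $W$ (i.e., the portion of $X$ that is exogenous).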
### Estimate the causal effect of $X$ on $Y$
**Stage 2**: Regress $Y$ on the predicted $\hat{X}$ from Stage 1:
$Y = \beta_1 \hat{X} + \beta_2 W + \nu$
The coefficient $\beta_1$ provides the estimate of the causal effect of $X$ on $Y$.
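The two stages can be sketched on simulated data where the true effect is known. Everything in this example is an assumption made for illustration: the data-generating process, the coefficients, the unobserved confounder `U`, and the small `ols` helper built on NumPy's least-squares solver. It is meant to show the mechanics, not a production estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
true_beta = 2.0  # assumed causal effect of X on Y

# Hypothetical data-generating process (all coefficients are assumptions).
U = rng.normal(size=n)  # unobserved confounder of X and Y
Z = rng.normal(size=n)  # instrument: independent of U by construction
W = rng.normal(size=n)  # observed control variable
X = 1.0 * Z + 0.5 * W + 1.0 * U + rng.normal(size=n)
Y = true_beta * X + 0.3 * W + 1.5 * U + rng.normal(size=n)


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the regressors."""
    A = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Naive OLS of Y on X and W: biased because U is omitted.
beta_ols = ols(Y, X, W)[1]

# Stage 1: regress X on Z and W, then build the fitted values X_hat.
pi = ols(X, Z, W)
X_hat = pi[0] + pi[1] * Z + pi[2] * W

# Stage 2: regress Y on X_hat and W; the coefficient on X_hat is the 2SLS estimate.
beta_2sls = ols(Y, X_hat, W)[1]

print(f"naive OLS: {beta_ols:.3f}")
print(f"2SLS:      {beta_2sls:.3f}")
```

Under these assumptions, the naive estimate lands well above 2 because `U` pushes both `X` and `Y` in the same direction, while the 2SLS estimate lands close to the assumed true value.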
## Example
Here's a concrete example of how instrumental variables (IVs) work using the variables $X$, $Y$, and $Z$:
- **$X$ (Explanatory Variable)**: Depression level.
- **$Y$ (Outcome Variable)**: Smoking behavior.
- **$Z$ (Instrumental Variable)**: Lack of job opportunities.
In this scenario, you want to understand whether depression ($X$) affects smoking ($Y$). However, there might be unobserved factors affecting both. The instrumental variable, lack of job opportunities ($Z$), is assumed to be correlated with depression ($X$) and to have no effect on smoking ($Y$) except through its impact on depression. If both assumptions hold, $Z$ meets the relevance and exogeneity conditions and can be used to isolate the causal effect of depression on smoking.
### With `statsmodels`
```python
import pandas as pd
import statsmodels.api as sm
# Toy data: Y = 2X + 1 and Z = X + 1 exactly, so there is no noise or confounding
data = {
    'X': [1, 2, 3, 4, 5],
    'Z': [2, 3, 4, 5, 6],
    'Y': [3, 5, 7, 9, 11],
}
df = pd.DataFrame(data)
# Add a constant term to the model
df['const'] = 1
# First-stage regression: X on Z and a constant
model1 = sm.OLS(df['X'], df[['Z', 'const']]).fit()
# Fitted values X_hat (a NumPy array) from the first stage
X_hat = model1.predict()
# Second-stage regression: Y on X_hat and a constant
model2 = sm.OLS(df['Y'], sm.add_constant(X_hat)).fit()
print(model2.summary())
```
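The coefficient for `x1` (the label `statsmodels` assigns to the unnamed $\hat{X}$ column) is the estimate of the causal effect of $X$ on $Y$. Here it is 2, because the toy data satisfy $Y = 2X + 1$ exactly. Note that the standard errors reported by a manually run second stage are not valid, since they ignore the fact that $\hat{X}$ is itself an estimate.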
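As a cross-check, with a single instrument and no control variables the 2SLS slope equals the ratio $\text{Cov}(Z, Y) / \text{Cov}(Z, X)$. The sketch below recomputes it for the same toy data using only the standard library; the `cov` helper is a hypothetical convenience function written for this example.

```python
from statistics import mean

# Toy data copied from the statsmodels example above.
X = [1, 2, 3, 4, 5]
Z = [2, 3, 4, 5, 6]
Y = [3, 5, 7, 9, 11]


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


# Ratio form of the IV estimate: Cov(Z, Y) / Cov(Z, X).
beta_iv = cov(Z, Y) / cov(Z, X)
print(beta_iv)  # 2.0
```

The ratio matches the second-stage coefficient, which is expected here because the toy data contain no noise.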
## References
1. [Introduction to instrumental variables and their application to large-scale assessment data](https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-016-0018-2)
2. [Instrumental Variable: Definition & Overview - Statistics How To](https://www.statisticshowto.com/instrumental-variable/)
3. [Instrumental variables estimation - Wikipedia](https://en.wikipedia.org/wiki/Instrumental_variables_estimation)
4. [Harvard Ec1123 Section 7 - Instrumental Variables](https://scholar.harvard.edu/files/apassalacqua/files/section7_iv.pdf)