## Key Conditions for IVs

[Instrumental Variables](Instrumental%20Variable.md) (IVs) must satisfy three key conditions:

1. **Instrument Relevance**: The instrument $Z$ must be correlated with the endogenous regressor $X$, i.e. $\text{Cov}(Z, X) \neq 0$. This condition ensures that the instrument explains some of the variation in the regressor.
2. **Instrument Exogeneity**: The instrument $Z$ must be uncorrelated with the error term $\varepsilon$ in the outcome equation $Y = \beta_1 X + \varepsilon$, i.e. $\text{Cov}(Z, \varepsilon) = 0$. This implies that $Z$ affects the dependent variable $Y$ only through $X$, and not directly or through omitted variables, so any correlation between $Z$ and $Y$ is due solely to the effect of $X$.
3. **No other latent instruments**: There should be no unobserved instruments, especially ones that have a stronger correlation with $X$ than $Z$.

Note that an instrumental variable (IV) is not a [[confounding variable]]. Instead, it is a separate variable used to address endogeneity in causal inference, and it does not affect both the independent variable and the response. If anything, lurking confounding variables might affect $Z$, $X$, and $Y$.

## Measuring the causal effect of $X$ on $Y$

To strip out the confounded variation in $X$ and isolate the causal effect of $X$ on $Y$, the **Two-Stage Least Squares** (2SLS) method is commonly used. In Stage 1, $Z$ isolates the variation in $X$ that is independent of confounders. In Stage 2, you estimate how this exogenous variation in $X$ affects $Y$, thus obtaining the causal effect of $X$ on $Y$ as the coefficient on $\hat{X}$.

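
To make the two-stage idea concrete, the sketch below simulates data with an unobserved confounder and compares a naive OLS slope with a hand-rolled 2SLS slope. Everything about the data-generating process is an assumption made for illustration: the confounder `U`, the coefficients `0.8` and `3.0`, the true effect of `2.0`, and the helper `slope` are hypothetical and not taken from any real study. It also computes the ratio $\text{Cov}(Z, Y) / \text{Cov}(Z, X)$, which the 2SLS slope reduces to when there is a single instrument and no controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (assumed for illustration only)
U = rng.normal(size=n)                       # unobserved confounder
Z = rng.normal(size=n)                       # instrument, independent of U by construction
X = 0.8 * Z + U + rng.normal(size=n)         # regressor depends on Z and on U
Y = 2.0 * X + 3.0 * U + rng.normal(size=n)   # true causal effect of X on Y is 2.0


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Naive OLS of Y on X: biased, because U drives both X and Y
ols_slope = slope(X, Y)

# Stage 1: regress X on Z and keep the fitted values
pi_1 = slope(Z, X)
X_hat = X.mean() + pi_1 * (Z - Z.mean())

# Stage 2: regress Y on the fitted values from Stage 1
tsls_slope = slope(X_hat, Y)

# Equivalent ratio form for one instrument and no controls
ratio_slope = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

print(f"naive OLS slope: {ols_slope:.3f}")
print(f"2SLS slope:      {tsls_slope:.3f}")
print(f"ratio slope:     {ratio_slope:.3f}")
```

Under these assumptions the naive OLS slope should come out noticeably above 2, while the 2SLS and ratio slopes should both sit close to the true effect. The instrument only works here because `Z` was generated independently of `U`; with real data that exogeneity has to be argued, not simulated.
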
### Isolate the exogenous variation in $X$ that is independent of confounders

**Stage 1**: Regress $X$ on $Z$ (and any [control variables](Control%20Variable.md) $W$):

$X = \pi_1 Z + \pi_2 W + u$

Here, $u$ is the first-stage error, and the fitted values $\hat{X}$ represent the variation in $X$ that is explained by $Z$ (i.e., the portion of $X$ that is exogenous).

### Estimate the causal effect of $X$ on $Y$

**Stage 2**: Regress $Y$ on the predicted $\hat{X}$ from Stage 1:

$Y = \beta_1 \hat{X} + \beta_2 W + \nu$

The coefficient $\beta_1$ provides the estimate of the causal effect of $X$ on $Y$.

## Example

Here's a concrete example of how instrumental variables (IVs) work using the variables $X$, $Y$, and $Z$:

- **$X$ (Explanatory Variable)**: Depression level.
- **$Y$ (Outcome Variable)**: Smoking behavior.
- **$Z$ (Instrumental Variable)**: Lack of job opportunities.

In this scenario, you want to understand whether depression ($X$) affects smoking ($Y$). However, there might be unobserved factors affecting both. The instrumental variable, lack of job opportunities ($Z$), is assumed to be correlated with depression ($X$) but not to affect smoking ($Y$) except through its impact on depression. If these assumptions hold, $Z$ is a valid instrument: it meets the relevance and exogeneity conditions and so helps isolate the causal effect of depression on smoking.

### With `statsmodels`

```python
import pandas as pd
import statsmodels.api as sm

# Toy data: exactly linear and noise-free, so both stages fit perfectly
data = {'X': [1, 2, 3, 4, 5], 'Z': [2, 3, 4, 5, 6], 'Y': [3, 5, 7, 9, 11]}
df = pd.DataFrame(data)

# First-stage regression: X on Z, with a constant term
model1 = sm.OLS(df['X'], sm.add_constant(df['Z'])).fit()

# Fitted values X_hat: the part of X explained by Z
X_hat = model1.predict()

# Second-stage regression: Y on X_hat, with a constant term
model2 = sm.OLS(df['Y'], sm.add_constant(X_hat)).fit()
print(model2.summary())
```

The coefficient for `x1` describes the causal effect $X$ has on $Y$ (here, the coefficient will be 2, because the toy data satisfy $Y = 2X + 1$ exactly). Note that the standard errors reported by a manually run second stage are not valid 2SLS standard errors, since they ignore the fact that $\hat{X}$ was itself estimated.

## References

1. [Introduction to instrumental variables and their application to large-scale assessment data](https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-016-0018-2)
2. [Instrumental Variable: Definition & Overview - Statistics How To](https://www.statisticshowto.com/instrumental-variable/)
3. [Instrumental variables estimation - Wikipedia](https://en.wikipedia.org/wiki/Instrumental_variables_estimation)
4. [Harvard Ec1123 Section 7 - Instrumental Variables](https://scholar.harvard.edu/files/apassalacqua/files/section7_iv.pdf)