Causal chain (diagram): Compulsory laws and entry age → QOB → EDUC → ln weekly wage
Replication Guide
From compulsory schooling to quarter-of-birth IV
Prerequisites. Endogeneity, 2SLS, and first-stage logic from course slides (Ch.17-20, RevealJS). Tier-1 context: Readings Guide.
Software. R with renv::restore(); microdata: stata/raw_data.dta (same file as the public winsup/angrist_krueger_1991 package).
Four documents — one workflow.
| Step | Document | Your job |
|---|---|---|
| 1 | This guide | Follow the story; understand \(Z\), \(D\), \(Y\) and each table’s role |
| 2 | Paper (full text) | Read the sections cited in each Act |
| 3 | R replication output | Compare your numbers to pre-built tables (Tables I–VIII) |
| 4 | Rcode/Table_*.R | Run and modify the full scripts |
This guide is narrative-first (R only). Complete replication code lives in Rcode/; Stata logs are optional elsewhere on the site.
Overview: The natural experiment in one page
Angrist and Krueger (1991) ask whether compulsory school attendance laws raise schooling and earnings. They build a chain of evidence:
- Season of birth predicts education in Census data (Section I).
- That pattern lines up with school-entry age and compulsory-attendance rules — not with post-secondary margins (Tables I–II).
- Quarter of birth instruments schooling when estimating the return to education; IV ≈ OLS (Section II).
Policy puzzle. Every developed country mandates schooling, yet the causal effect on attainment and wages is hard to identify because ability and family background correlate with both \(D\) (schooling) and \(Y\) (wages).
Natural experiment. Children born in different calendar quarters start school at different ages but face the same legal dropout birthday. Compulsory laws bind more for late-year births → exogenous variation in \(D\) from \(Z\) = quarter of birth (QOB).
Course alignment.
| Topic | Where it appears in AK91 |
|---|---|
| Endogeneity / omitted ability | Motivation; IV vs OLS comparison (Tables IV–VI) |
| Instrument relevance | Table I; first stage with \(YQ\) interactions |
| Exclusion restriction | Section III; college-graduate placebo |
| LATE / compliers | Compulsory-schooling margin; late-year births |
| Weak IV (preview) | Many \(YQ\) instruments; see (Bound, Jaeger, and Baker 1995) and Weak IV session |
Data map (replication sample).
| Object | Meaning |
|---|---|
| raw_data.dta | Census extracts; men born 1920–1949 |
| YOB, QOB | Year and quarter of birth (recoded in Table_I.R) |
| EDUC | Years of schooling |
| LWKLYWGE | Log weekly wage |
| COHORT | Sample splits (e.g. 3039 = 1930–1939) |
| YQ*** | Year-of-birth × quarter-of-birth dummies (IV set for Tables IV–VIII) |
Read first: Paper (HTML) · PDF
Suggested path (~1–2 hours). Overview → Act I → Act III → Act IV → Act V → Act VI → Capstone → Appendix map.
Act I — Does season of birth predict education?
Story beat
Section I establishes reduced-form / first-stage facts: QOB is related to schooling before any wage IV. This is not yet a return-to-schooling estimate — it motivates that compulsory schooling is a plausible mechanism.
Read in the paper
- Section I: season of birth, detrending, Table I
- Figures I–IV and moving average \(MA_{c,j}\)
Identification / econometrics
Mechanism (words). Early-calendar births enter school older, reach the legal dropout age sooner, and (if a fixed share of students would drop out when allowed) complete less schooling on average.
Detrending. For cohort \(c\), quarter \(j\):
\[ MA_{c j} = \frac{E_{-2}+E_{-1}+E_{+1}+E_{+2}}{4}, \qquad (E_{icj} - MA_{cj}) = \alpha + \sum_{j=1}^{3} \beta_j Q_{icj} + \epsilon_{icj} \]
- \(E\): education outcome (years, high-school graduate, etc.)
- \(Q\): quarter-of-birth dummies (Q4 omitted)
- Not IV here: OLS on detrended outcomes shows the seasonal pattern.
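A minimal sketch of the detrending step, assuming simulated cohort-quarter means rather than the Census extract. The column names `EDUC_mean`, `MA`, and `detrended` and all numbers are hypothetical; the data are noise-free so the arithmetic is easy to check by hand. Table_I.R remains the reference implementation.

```r
# Hypothetical cohort-quarter means on the full 1930-1939 YQ grid (not real data)
grid <- expand.grid(QOB = 1:4, YOB = 1930:1939)
grid <- grid[order(grid$YOB, grid$QOB), ]
grid$t <- seq_len(nrow(grid))

# Linear trend plus a seasonal dip for first-quarter births
grid$EDUC_mean <- 12 + 0.02 * grid$t - 0.1 * (grid$QOB == 1)

# Two-sided moving average: mean of quarters t-2, t-1, t+1, t+2
# (the own quarter gets weight zero); the two edge quarters on each side are NA
ma_weights <- c(1, 1, 0, 1, 1) / 4
grid$MA <- as.numeric(stats::filter(grid$EDUC_mean, ma_weights, sides = 2))

# Detrend, then regress on quarter dummies with Q4 omitted
grid$detrended <- grid$EDUC_mean - grid$MA
grid$QTR1 <- as.integer(grid$QOB == 1)
grid$QTR2 <- as.integer(grid$QOB == 2)
grid$QTR3 <- as.integer(grid$QOB == 3)
fit <- lm(detrended ~ QTR1 + QTR2 + QTR3, data = grid)
coef(fit)
```

Because the moving average is taken on the full grid before any rows are dropped, only the four edge quarters lose their `MA`; `lm()` then drops those rows automatically.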
Replicate
- Script: Rcode/Table_I.R
- Output: Table I in R replication
Compute \(MA\) on the full year-of-birth × quarter (YQ) sequence within each cohort (1930–1939 and 1940–1949), then take lags/leads. Do not compute moving averages only on quarters that survive after dropping edge quarters for the regression. See the detailed note on the replication page.
Sketch (do not shortcut the full script):
library(tidyverse)
library(haven)
# After recoding YOB, QOB, YQ* as in Table_I.R:
# 1) collapse to cohort-quarter means of EDUC (and other outcomes)
# 2) MA = mean of EDUC at t-2, t-1, t+1, t+2 quarters on the full YQ grid
# 3) merge MA back; run lm(detrended ~ QTR1 + QTR2 + QTR3)
Check your work
- Cohort 1930–1939, total years of education: Q1 effect about −0.124 (SE ≈0.017); Q4 is the omitted category.
- High-school graduate rate: Q1 vs Q4 gap about 1.9 percentage points (1930s cohort).
- \(F\)-tests for joint quarter effects: large and significant for basic attainment rows.
Think
- Why is the seasonal pattern weak or absent for master’s and doctoral degrees, but strong for high-school completion?
- If compulsory laws bind only through the high-school margin, what should happen to college graduation rates (Table I, bottom panel)?
Act II — Do compulsory laws bind?
Story beat
Table II shifts from individual Census regressions to enrollment rates by birthday window and state dropout age (16 vs 17–18). It supports the claim that laws force some students to stay — not just that birthdays correlate with unobservables.
Read in the paper
- Section I.A: “Direct Evidence on the Effect of Compulsory Schooling Laws”
- Table II (enrollment April 1, 1960 / 1970)
Identification / econometrics
Difference in enrollment between school-leaving age 16 states and 17–18 states, for birthdays just above vs below the cutoff. Still not the wage IV — this is mechanism validation.
Replicate
- Script: Rcode/Table_II.R
- Output: Table II
The authors’ exact enrollment cells are not fully in raw_data.dta. Replication uses appendix2_table.dta to approximate Table II; some columns (e.g. last-column differences) are hand-checked. Treat this table as design literacy, not a digit-perfect target.
Check your work
- Understand columns (1) and (2): states grouped by legal dropout age.
- For the 1944 cohort (age 16 on April 1, 1960), enrollment is higher in stricter-law states for Jan–Mar births: this is the gap that identifies “laws bind.”
- Do not expect every cell to match the paper if your constructed sample differs.
Think
- How does Table II reduce skepticism that QOB is proxying family background?
- Why might the law effect shrink for the 1954 and 1964 cohorts (paper’s Table II narrative)?
Act III — A first IV: Wald from one instrument
Story beat
Section II opens the returns question. With a single instrument (QTR1), Wald/IV = ratio of reduced-form to first-stage slopes — preview of 2SLS before the full \(YQ\) instrument set.
Read in the paper
- Section II opening and Table III
- Equations (3)–(4) in the Stata replication summary (same notation)
Identification / econometrics
\[ \begin{align} \ln W_i &= \alpha_1 + \beta_1 Q_{1i} + \epsilon_{1i} \\ EDUC_i &= \alpha_2 + \beta_2 Q_{1i} + \epsilon_{2i} \end{align} \]
Wald estimand (one \(Z\)): \(\hat\beta_1 / \hat\beta_2\) = effect of education on \(\ln W\) for compliers induced by \(Q_1\).
With one instrument and one endogenous regressor, Wald = 2SLS.
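The equivalence can be checked numerically. The sketch below assumes simulated data with a hypothetical true return of 0.08 and an unobserved `abil` term that biases OLS; the variable names `LW`, `abil`, and `EDUC_hat` are illustrative, not those used in Table_III.R.

```r
# Simulated data (hypothetical parameters, not the Census extract)
set.seed(123)
n    <- 5000
QTR1 <- rbinom(n, 1, 0.25)          # instrument: born in first quarter
abil <- rnorm(n)                    # unobserved ability (confounder)
EDUC <- 12 - 0.5 * QTR1 + abil + rnorm(n)
LW   <- 5 + 0.08 * EDUC + 0.1 * abil + rnorm(n, sd = 0.3)

# Wald: reduced-form slope divided by first-stage slope
b1   <- coef(lm(LW ~ QTR1))[["QTR1"]]
b2   <- coef(lm(EDUC ~ QTR1))[["QTR1"]]
wald <- b1 / b2

# 2SLS by hand: regress the outcome on first-stage fitted values
EDUC_hat <- fitted(lm(EDUC ~ QTR1))
tsls     <- coef(lm(LW ~ EDUC_hat))[["EDUC_hat"]]

# OLS for comparison (contaminated by abil in this simulation)
ols <- coef(lm(LW ~ EDUC))[["EDUC"]]

c(wald = wald, tsls = tsls, ols = ols)
```

The `wald` and `tsls` entries agree up to floating-point error. The standard errors printed by the hand-rolled second stage are not valid 2SLS standard errors, so use them for point estimates only.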
Replicate
- Script: Rcode/Table_III.R
- Output: Table III
Point estimates should track the paper and Stata closely. Standard errors for the Wald ratio may differ between R (systemfit + delta method) and Stata (sureg + nlcom). Use this as a lesson: reproducibility includes inferential details, not only coefficients. See the worked comparison in replicate-ak1991-R.
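To make the inferential step concrete, here is a minimal base-R sketch of a delta-method standard error for the ratio, assuming simulated data and a hand-built cross-equation covariance. It is not the systemfit call used in Table_III.R, and the degrees-of-freedom convention (`n - k`) is an assumption; differences in such conventions are one way two packages can report different standard errors for the same point estimate.

```r
# Simulated data (hypothetical parameters, not the Census extract)
set.seed(42)
n    <- 5000
QTR1 <- rbinom(n, 1, 0.25)
abil <- rnorm(n)
EDUC <- 12 - 0.5 * QTR1 + abil + rnorm(n)
LW   <- 5 + 0.08 * EDUC + 0.1 * abil + rnorm(n, sd = 0.3)

rf <- lm(LW ~ QTR1)     # reduced form
fs <- lm(EDUC ~ QTR1)   # first stage
b1 <- coef(rf)[["QTR1"]]
b2 <- coef(fs)[["QTR1"]]

# Both equations share the same regressors, so the joint covariance of the
# two QTR1 slopes is Sigma * (X'X)^{-1}[QTR1, QTR1], with Sigma the 2x2
# residual covariance across equations.
X     <- model.matrix(rf)
xtx_q <- solve(crossprod(X))["QTR1", "QTR1"]
df    <- n - ncol(X)                      # assumed small-sample convention
s11   <- sum(resid(rf)^2) / df
s22   <- sum(resid(fs)^2) / df
s12   <- sum(resid(rf) * resid(fs)) / df
V     <- xtx_q * matrix(c(s11, s12, s12, s22), nrow = 2)

# Delta method for g(b1, b2) = b1 / b2
wald    <- b1 / b2
grad    <- c(1 / b2, -b1 / b2^2)
se_wald <- sqrt(drop(t(grad) %*% V %*% grad))

c(wald = wald, se = se_wald)
```

Dropping the off-diagonal term `s12` (that is, treating the two regressions as independent) changes `se_wald`, which is the point of estimating the equations jointly.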
# Stata reference:
# sureg (eq1: LWKLYWGE QTR1) (eq2: EDUC QTR1) if COHORT==3039
# nlcom ratio: [eq1]_b[QTR1]/[eq2]_b[QTR1]
Check your work
- 1970 Census, cohort 1920–29 (Table III, Panel A in the paper): Wald return roughly 7–10% per year of schooling; OLS in the same ballpark.
- 1980 Census, cohort 1930–39: Wald can be somewhat larger than OLS (paper discusses specification).
- Signs: \(Q_1\) associated with lower education and lower wages → positive return when taking the ratio.
Think
- Why use SUR for the Wald SE instead of two independent regressions?
- If \(\beta_2\) were near zero, what would happen to the Wald ratio (finite-sample / weak-IV intuition)?
Act IV — Core 2SLS: quarter × year interactions
Story beat
The centerpiece of Angrist and Krueger (1991): instrument \(EDUC\) with 30 excluded \(YQ\) interactions (quarter × year of birth), with cohort dummies and Mincer controls. Tables IV–VI repeat the design for different cohorts and Census years.
Read in the paper
- Section II.A: TSLS estimates
- Section III: other effects of season of birth (exclusion)
Identification / econometrics
First stage:
\[ EDUC_i = X_i'\pi + \sum_c Y_{ic}\delta_c + \sum_c\sum_{j=1}^{3} Y_{ic} Q_{ij}\theta_{jc} + \nu_i \]
Second stage:
\[ \ln W_i = X_i'\beta + \sum_c Y_{ic}\xi_c + \rho\, EDUC_i + u_i \]
| Symbol | Role |
|---|---|
| \(Y_{ic}\) | Year-of-birth dummies (cohort effects) |
| \(Q_{ij}\) | Quarter dummies interacted with cohort |
| \(X_i\) | Age, age\(^2\), region, SMSA, marital status, race |
| \(\rho\) | Return to education (target) |
Exclusion: QOB affects wages only through schooling for compliers on the compulsory margin; Section III tests college graduates (not compelled by compulsory laws).
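The two stages can be sketched in base R on simulated data. Everything here is an assumption for illustration: the data-generating process, the true return of 0.08, and the names `LW`, `abil`, and `EDUC_hat`. Controls \(X_i\) are omitted, and the course scripts use ivreg rather than two hand-run regressions.

```r
# Simulated 1930-1939 cohort (hypothetical parameters, not the Census extract)
set.seed(2024)
n    <- 20000
YOB  <- sample(1930:1939, n, replace = TRUE)
QOB  <- sample(1:4, n, replace = TRUE)
abil <- rnorm(n)                                  # unobserved ability
EDUC <- 12 + 0.1 * (YOB - 1930) + 0.3 * QOB + abil + rnorm(n)
LW   <- 5 + 0.08 * EDUC + 0.1 * abil + rnorm(n, sd = 0.3)
d    <- data.frame(LW, EDUC, YOB = factor(YOB), QOB = factor(QOB))

# First stage: year-of-birth dummies plus year x quarter interactions.
# YOB * QOB spans the same column space as year dummies plus three quarter
# dummies interacted with every year (R omits a different baseline quarter).
first      <- lm(EDUC ~ YOB * QOB, data = d)
n_excluded <- length(coef(first)) - nlevels(d$YOB)   # excluded instruments

# Second stage: fitted schooling plus the included year dummies
d$EDUC_hat <- fitted(first)
second     <- lm(LW ~ EDUC_hat + YOB, data = d)
rho_hat    <- coef(second)[["EDUC_hat"]]

c(n_excluded = n_excluded, rho_hat = rho_hat)
```

The point estimate from the second regression equals the 2SLS estimate, but its reported standard errors are not valid because they ignore that `EDUC_hat` is estimated. That is one reason the scripts estimate both stages jointly.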
Replicate
| Table | Sample (paper) | Script | Replication anchor |
|---|---|---|---|
| IV | Men born 1920–29, 1970 Census | Table_IV.R | tab-IV |
| V | Men born 1930–39, 1980 Census | Table_V.R | tab-V |
| VI | Men born 1940–49, 1980 Census | Table_VI.R | tab-VI |
Implementation pattern (all three): build YQ* dummies → ivreg with increasing control sets (four OLS + four TSLS columns). Changing cohort = change filter() on YOB / COHORT and Census flag.
library(ivreg)
# formula pattern (Table_IV.R):
# ivreg(LWKLYWGE ~ EDUC + controls + YR* | YQ* + controls + YR*, data = ...)
Check your work
- TSLS \(\hat\rho\) typically ~6–10% per year of schooling depending on cohort/specification.
- Paper’s headline: IV and OLS are statistically similar for baseline specs — interpret as limited upward ability bias in OLS.
- First stage: joint significance of \(YQ\) instruments (inspect in script output).
Think
- If omitted ability biases OLS upward, should IV be below or above OLS? Does AK91 support the standard ability-bias story?
- Why include year-of-birth dummies in both stages when instruments are \(YQ\) interactions?
Act V — More instruments and heterogeneity
Story beat
Table VII adds state-of-birth interactions — more instruments, tighter SEs, and a wider OLS–IV gap (high-school completion channel). Table VIII estimates returns for Black men; smaller returns and different state coverage matter for replication.
Read in the paper
- Section II.B: state interactions
- Section II.C: Black men
Identification / econometrics
- Table VII: Overidentified IV with state × \(YQ\) structure; efficiency vs exclusion burden.
- Table VIII: Heterogeneous \(\rho\); same IV logic on a restricted sample.
Replicate
Replication literacy
| Issue | What to know |
|---|---|
| 50 vs 51 states | Public extract may omit one state after filters (e.g. race subsample). State dummies must match the estimation sample. |
| Table VIII vs some Stata replicas | Course R code matches the published paper; an alternative Stata table online disagrees — likely state-dummy construction. Read the note under Table VIII. |
| Table II | Approximate inputs (Act II). |
| Table III SE | R vs Stata nlcom (Act III). |
| Table I MA | Full \(YQ\) grid (Act I). |
Treat discrepancies as evidence about reproducibility, not failure to read the paper.
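The state-dummy issue in the table above can be illustrated on toy data. This sketch is a generic check, assuming hand-built dummies and a hypothetical `GROUP` filter; it is not a diagnosis of any particular Stata replica.

```r
# Hypothetical toy data: three "states", one of which has no observations
# in the subsample of interest
d <- data.frame(
  STATE = factor(c("A", "A", "B", "B", "C", "C")),
  GROUP = c(1, 0, 1, 1, 0, 0),
  y     = c(1.0, 1.2, 0.8, 1.1, 0.9, 1.3)
)

# Dummies built once on the full sample: one column per state
S <- model.matrix(~ STATE - 1, data = d)

# Restrict to the subsample and look for dummies that are identically zero
in_sub <- d$GROUP == 1
S_sub  <- S[in_sub, , drop = FALSE]
empty  <- colnames(S_sub)[colSums(S_sub) == 0]

# Number of states actually present in the estimation sample
n_states <- nlevels(droplevels(d$STATE[in_sub]))

list(empty_dummies = empty, n_states = n_states)
```

An all-zero dummy carries no information, so it should be dropped from both the control set and any instrument interactions before estimation. Counting `n_states` on the filtered sample is a quick way to see whether a script is working with 50 or 51 states.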
Check your work
- Table VII: TSLS often above OLS in later columns; authors attribute part of the gap to high-school completion effects.
- Table VIII: IV returns for Black men roughly 4–5% (smaller than white samples in the paper).
Think
- When you add state × \(YQ\) instruments, what threat are you addressing — and what new exclusion worry appears?
- Why might returns differ by race in this 1970/1980 cross section (policy, school quality, labor market discrimination)?
Act VI — Threats, weak IV, and modern reading
Story beat
Section III asks whether QOB affects earnings directly (violating exclusion). The paper finds little evidence among college graduates, a placebo group not compelled by compulsory laws. The course then links AK91 to the weak-IV literature: huge \(N\) does not guarantee strong instruments when there are many weak \(Z\)’s.
Read in the paper
- Section III: other possible effects of season of birth
- Section IV: conclusion
Course follow-up (no new code required here)
- Read (Bound, Jaeger, and Baker 1995) and the Weak IV session.
- On the AK91 microdata, report:
- First-stage \(F\) (or Kleibergen–Paap if you cluster)
- Partial \(R^2\) for excluded instruments
- Compare TSLS to LIML or weak-IV-robust tests (extension)
- Map the readings-guide identification row: ability bias vs QOB IV.
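For the first-stage \(F\) and partial \(R^2\) items in the list above, a minimal base-R sketch on simulated data follows. It assumes homoskedastic errors and no other controls, so it is the classical \(F\) rather than Kleibergen–Paap; the data-generating process and variable names are hypothetical.

```r
# Simulated 1930-1939 cohort (hypothetical parameters, not the Census extract)
set.seed(7)
n    <- 20000
YOB  <- sample(1930:1939, n, replace = TRUE)
QOB  <- sample(1:4, n, replace = TRUE)
EDUC <- 12 + 0.1 * (YOB - 1930) + 0.05 * QOB + rnorm(n, sd = 3)
d    <- data.frame(EDUC, YOB = factor(YOB), QOB = factor(QOB))

# Restricted first stage: included exogenous variables only (year dummies)
restr <- lm(EDUC ~ YOB, data = d)
# Unrestricted first stage: add the excluded year x quarter instruments
unres <- lm(EDUC ~ YOB * QOB, data = d)

# Joint F-test on the excluded instruments
cmp     <- anova(restr, unres)
q       <- cmp$Df[2]          # number of excluded instruments
F_first <- cmp$F[2]

# Partial R^2: share of the residual variation in EDUC (after year dummies)
# explained by the excluded instruments
rss_r      <- sum(resid(restr)^2)
rss_u      <- sum(resid(unres)^2)
partial_r2 <- (rss_r - rss_u) / rss_r

c(q = q, F_first = F_first, partial_r2 = partial_r2)
```

With many instruments, a tiny `partial_r2` can coexist with a very large sample, which is the situation the weak-IV readings warn about.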
Think
- List one channel through which QOB could violate exclusion (health, tax season, family timing) — does Table I’s college/PhD pattern help rule it out?
- If instruments are weak, should TSLS be closer to OLS or to zero? How does that interact with AK’s “IV ≈ OLS” headline?
Capstone — Identification memo
Write ½–1 page in English (bullet prose is fine). Use this rubric:
- Policy vs estimand. What policy question do compulsory laws answer? What is the estimand (LATE for whom: compliers on the compulsory margin)?
- Relevance. Cite Table I (and optionally Table II): how strongly does QOB predict \(EDUC\)?
- Exclusion. Argue why QOB should not enter the wage equation except through \(EDUC\). Cite Section III / college graduates.
- Main result. One sentence on IV vs OLS from Tables IV–VI.
- Replication critique. One issue you verified from this guide (MA, Table II data, Wald SE, or Table VIII states).
Deliverable (suggested). PDF memo + log showing you ran at least Table_I.R and Table_IV.R.
Appendix — Replication map and troubleshooting
| Table | Paper section | R script | Known issue |
|---|---|---|---|
| I | Section I | Table_I.R | MA on full \(YQ\) grid |
| II | Section I.A | Table_II.R | Approximate data |
| III | Section II (Wald) | Table_III.R | Wald SE R vs Stata |
| IV | Section II.A | Table_IV.R | ivreg specs |
| V | Section II.A | Table_V.R | Cohort filter |
| VI | Section II.A | Table_VI.R | Cohort filter |
| VII | Section II.B | Table_VII.R | Many instruments |
| VIII | Section II.C | Table_VIII.R | State dummies / \(N\) states |
Render this page locally:
quarto render 04-topics/rep-ak1991/replication-ak1991-guide.qmd
Minimal data peek (optional):
library(haven)
library(here)
d <- read_dta(here("04-topics/rep-ak1991/stata/raw_data.dta"))
# Rename v* as in Table_I.R before analysis
External resources
- (Cunningham 2021), Mixtape Ch.7 (IV narrative)
- (Angrist and Pischke 2015), Mastering Metrics, quarter-of-birth chapter
- R replication tables · Paper HTML