flowchart TD
  JEP[angrist2021a]
  AK[angrist1991]
  BJB[bound1995]
  Weak[keane2024]
  Card[card1993]
  LATE[angrist1996]
  Mogstad[mogstad2018]
  JEP --> AK --> BJB --> Weak
  AK --> Card
  Weak --> LATE
  Card --> Mogstad
Readings Guide
Concise introductions for PhD students (case-based, reproducible focus)
This guide accompanies the journal list in Readings & Text. Each entry summarizes the paper's research question, IV logic, main takeaway, and reproducibility notes. Primary software for replication in this course is R; Stata is optional where public replication packages exist.
Cross-references use Quarto/Pandoc citation syntax with keys from bib/endogeneity.bib, for example (Joshua D. Angrist and Krueger 1991) (parenthetical) or Joshua D. Angrist and Krueger (1991) (in-text). After quarto render, a References section is appended automatically.
| Label | Meaning | Suggested use (12-hour module) |
|---|---|---|
| Tier 1 | Core applied IV + replication | Read + replicate |
| Tier 2 | Theory, diagnostics, or second application | Read carefully; replicate selected tables |
| Tier 3 | Advanced / extension | Skim for exams and thesis work |
Report in applied papers: first-stage \(F\) (or Kleibergen–Paap), partial \(R^2\), overidentification tests when \(L>K\), and a short identification memo defending exclusion.
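As a minimal sketch of the first two reporting items, the base-R snippet below computes a homoskedastic first-stage \(F\) and the partial \(R^2\) for one excluded instrument. The data are simulated and every variable name and parameter value is an illustrative assumption, not course data; a heteroskedasticity-robust statistic such as Kleibergen–Paap would need additional tooling.

```r
# Simulated data: all names and parameter values are illustrative assumptions.
set.seed(1)
n <- 5000
x <- rnorm(n)                           # exogenous control
z <- rnorm(n)                           # excluded instrument
u <- rnorm(n)                           # unobserved confounder
d <- 0.3 * z + 0.5 * x + u + rnorm(n)   # endogenous regressor

# First stage with and without the excluded instrument
fs_full  <- lm(d ~ z + x)
fs_restr <- lm(d ~ x)

# First-stage F for the excluded instrument (homoskedastic version)
fs_F <- anova(fs_restr, fs_full)$F[2]

# Partial R^2: share of the variation in d left after x that z explains
rss_full   <- sum(resid(fs_full)^2)
rss_restr  <- sum(resid(fs_restr)^2)
partial_r2 <- 1 - rss_full / rss_restr

cat("First-stage F:", round(fs_F, 1), " partial R^2:", round(partial_r2, 3), "\n")
```

With a single excluded instrument the two statistics are linked by \(F = \frac{R^2_p}{1-R^2_p}(n-3)\) in this specification, so reporting both is mainly a readability convention.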
Pedagogical order. Sections progress from applied natural experiments (A) and weak-IV diagnostics (B) to policy cases (C), then LATE / 2SLS interpretation (D), and finally structural extensions (E). Section D is placed after B–C so students meet replication and first-stage practice before complier-based theory.
1 A. Natural experiments and returns to schooling
1.1 Joshua D. Angrist (1990) — Vietnam draft lottery and lifetime earnings
Tier 2 | AER, 80(3):313–36 | PDF
Research question. Did Vietnam-era military service (induced by draft lottery) affect long-run earnings?
IV logic. Draft lottery number \(Z\) affects veteran status \(D\) but is as-good-as random; earnings \(Y\) are studied through the effect of \(D\) induced by \(Z\) (classic Wald/IV in a binary-treatment setting).
Takeaway. Early flagship “natural experiment” showing how administrative records + lottery variation identify labor-market effects of military service; bridges reduced-form and structural policy debates.
Commentary. Excellent motivator for discussing the exclusion restriction: the lottery should affect earnings only through service (debate possible direct channels carefully). Pairs well with the intellectual history in Joshua D. Angrist and Krueger (2021).
Reproducibility. Public-use SSA/Census extracts used in follow-up work; no single canonical GitHub repo in this course; treat as conceptual case. R: AER package data when available; Stata: literature supplements.
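The Wald logic in this entry can be written out explicitly. The display below assumes the lottery number is recoded as a binary draft-eligibility indicator \(Z \in \{0,1\}\); that recoding is an assumption made here for illustration.

\[
\beta_{\text{Wald}} \;=\; \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}
\]

The numerator is the reduced-form earnings gap by eligibility and the denominator is the first-stage gap in veteran status, so the ratio rescales the reduced form by the share of men whose service was shifted by the lottery.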
1.2 Joshua D. Angrist and Krueger (1991) — Quarter of birth and compulsory schooling
Tier 1 | QJE, 106(4):979–1014 | PDF via 04-topics replication; Census microdata via winsup/angrist_krueger_1991
Abstract (from bib). Season of birth shifts schooling through school-entry age and compulsory-attendance laws; quarter of birth instruments education. IV return to schooling is close to OLS, suggesting limited ability bias in conventional estimates.
IV logic. \(Z\): quarter of birth → \(D\): years of schooling → \(Y\): wages. Policy interaction (compulsory laws × entry age) delivers exogenous schooling variation for compliers who would otherwise drop out early.
Takeaway. Canonical “natural experiment” for education returns; also the empirical target of the weak-IV critique in Bound, Jaeger, and Baker (1995).
Commentary. Master case for this course. First-stage \(F\) with many quarter dummies is modest; always report weak-IV diagnostics alongside headline coefficients.
Reproducibility (R + Stata).
- Start here: AK91 replication guide (story + identification + table map)
- R: replicate-ak1991-R (Tables I–VIII)
- Stata: replicate-ak1991-stata
- Homework-style pipeline: fixest::feols / ivreg + modelsummary
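Before running the packaged pipeline, it can help to see 2SLS as two projections. The sketch below uses base R only, on simulated data with hypothetical quarter-of-birth dummies. The sample size, first-stage shifts, and the true return of 0.08 are all assumptions for illustration and are not the AK91 estimates.

```r
# Simulated quarter-of-birth design; every number here is an illustrative assumption.
set.seed(42)
n <- 50000
qob     <- sample(1:4, n, replace = TRUE)   # hypothetical quarter of birth
ability <- rnorm(n)                         # unobserved confounder
educ  <- 12 + 0.3 * (qob == 4) + 0.2 * (qob == 3) + 0.1 * (qob == 2) +
  ability + rnorm(n)
lwage <- 0.08 * educ + 0.1 * ability + rnorm(n, sd = 0.5)   # true return = 0.08

X <- cbind(1, educ)                         # intercept + endogenous regressor
Z <- model.matrix(~ factor(qob))            # intercept + three quarter dummies

# OLS: biased upward here because ability raises both schooling and wages
b_ols <- solve(crossprod(X), crossprod(X, lwage))

# 2SLS: project X on Z, then use the projection as the instrument set
Xhat   <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_2sls <- solve(crossprod(Xhat, X), crossprod(Xhat, lwage))

round(c(ols = b_ols[2], tsls = b_2sls[2]), 3)
```

The point estimates from this hand-rolled version should match what an IV routine reports for the same specification; standard errors are deliberately omitted because the naive second-stage ones would be wrong.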
1.3 Joshua D. Angrist and Krueger (1992) — Two-sample IV (age at school entry)
Tier 3 | JASA, 87(418):328–36 | PDF
Research question. Effect of age at school entry on attainment when moments come from two different samples.
IV logic. Combines cohorts with different school-entry rules; uses two-sample IV when \(Y\) and \(D\) are observed in different datasets that both contain \(Z\).
Takeaway. Extends the Joshua D. Angrist and Krueger (1991) toolkit to settings common in survey merge designs; relevant for PhD students linking administrative and survey data.
Reproducibility. Method paper; replicate Table 1 logic in R with ivreg and manual moment stacking if merged data are constructed.
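For the manual moment stacking, a just-identified two-sample IV estimator can be sketched as follows. The notation is an assumption introduced here: sample 1 (size \(n_1\)) contains \((Y_1, Z_1)\) and sample 2 (size \(n_2\)) contains \((D_2, Z_2)\).

\[
\hat{\beta}_{\text{TSIV}} \;=\; \left(\frac{Z_2' D_2}{n_2}\right)^{-1} \frac{Z_1' Y_1}{n_1}
\]

The reduced-form moment comes from one sample and the first-stage moment from the other, which is valid only if both samples are drawn from the same population so that the two moments refer to the same \(Z\) distribution.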
1.4 Joshua D. Angrist and Krueger (1995) — Split-sample IV
Tier 3 | JBES, 13(2):225–35 | PDF
Research question. Can split samples improve IV estimation of returns to schooling?
IV logic. Uses half-sample estimates of first-stage and second-stage relationships to reduce finite-sample bias (precursor to jackknife IV).
Takeaway. Practical finite-sample correction in the same empirical line as Joshua D. Angrist and Krueger (1991); links to J. D. Angrist, Imbens, and Krueger (1999).
Reproducibility. Re-implement split-sample 2SLS on AK Census extracts in R (sample, ivreg).
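A toy version of the split-sample idea, in base R on simulated data rather than the AK Census extracts, might look like the sketch below. The number of instruments, their strength, and the true effect of 1 are assumptions chosen for illustration.

```r
# Simulated many-weak-instrument design; all values are illustrative assumptions.
set.seed(7)
n <- 20000; k <- 20
Z <- matrix(rnorm(n * k), n, k)
u <- rnorm(n)                                  # unobserved confounder
d <- drop(Z %*% rep(0.03, k)) + u + rnorm(n)
y <- 1 * d + u + rnorm(n)                      # true effect = 1

half <- sample(rep(c(TRUE, FALSE), length.out = n))   # random 50/50 split

# First stage estimated on half A only
pi_A <- coef(lm(d[half] ~ Z[half, ]))
# Cross-fitted prediction of d in half B, using half-A coefficients
dhat_B <- drop(cbind(1, Z[!half, ]) %*% pi_A)

# SSIV: regress y on the cross-fitted prediction within half B
b_ssiv  <- unname(coef(lm(y[!half] ~ dhat_B))[2])
# USSIV: use the cross-fitted prediction as an instrument for d within half B
b_ussiv <- cov(dhat_B, y[!half]) / cov(dhat_B, d[!half])

# Benchmarks on the full sample
dhat_full <- fitted(lm(d ~ Z))
b_2sls <- cov(dhat_full, y) / cov(dhat_full, d)
b_ols  <- unname(coef(lm(y ~ d))[2])

round(c(ols = b_ols, tsls = b_2sls, ssiv = b_ssiv, ussiv = b_ussiv), 3)
```

Because the first stage is estimated on a different half than the one used for the outcome regression, the overfitting that pulls 2SLS toward OLS is broken; the price is a noisier estimate, and plain SSIV tends to be attenuated toward zero instead.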
1.5 Joshua D. Angrist and Krueger (2021) — IV: from supply–demand to natural experiments
Tier 1 | Journal of Economic Perspectives, 15(4):69–85 (published 2001; see course PDF filename) | PDF
Research question. How did IV evolve from Wright's markets to modern natural experiments?
IV logic. Non-technical map of identification problems: simultaneity, compliance, LATE, and quasi-experimental designs.
Takeaway. Best single pre-course reading for international students: historical narrative plus design intuition before math-heavy papers.
Commentary. Assign in Week 0; use to frame every later paper's “what shock identifies what?” question.
Reproducibility. No data; discussion paper.
1.6 Card (1993) — College proximity and returns to schooling
Tier 1 | NBER Working Paper (1993) | PDF
Abstract (from bib). Nearby college raises education especially for youth with less-educated parents; IV returns are 25–60% above OLS, implying high marginal returns for compliers with low socioeconomic status.
IV logic. \(Z\): geographic proximity to college; \(D\): education; heterogeneous first stage by family background tests exclusion.
Takeaway. Contrasts with Joshua D. Angrist and Krueger (1991): IV can exceed OLS when compliers are marginal, low-education groups with high causal returns.
Commentary. Stress the heterogeneous first stage as evidence on whom the instrument moves; a preview of the LATE ideas in Guido W. Imbens and Angrist (1994) (Section D).
Reproducibility (R). Start here: Card (1993) replication guide → replicate-card1993-R (nls.dat, Tables 1–5, Figure 1). Homework intro: iv-wage-card (Card1995.dta, fixest/ivreg).
1.7 Card (2001) — Survey: estimating returns to schooling
Tier 2 | Econometrica, 69(5):1127–1160 | PDF
Abstract (from bib). Reviews IV strategies using compulsory schooling, school proximity, etc.; IV estimates often match or exceed OLS; comparative advantage explains why low-education compliers can have high marginal returns.
IV logic. Unifying econometric framework for OLS vs IV with institutional instruments.
Takeaway. Synthesis paper tying together Joshua D. Angrist and Krueger (1991), Card (1993), and policy interpretation; assign after replicating one empirical case.
Reproducibility. Reproduce comparison tables using published summaries; optional: merge Card (1993) replication with Joshua D. Angrist and Krueger (1991) estimates in one Quarto table.
2 B. Weak instruments, finite samples, and inference
2.1 Bound, Jaeger, and Baker (1995) — Problems with weak IV
Tier 1 | JASA, 90(430) | PDF | Extended HTML: Bound et al. guide
Abstract (from bib). Weak instruments cause large inconsistencies (even with only slight correlation between \(Z\) and the structural error) and finite-sample bias toward OLS; Joshua D. Angrist and Krueger (1991) may suffer despite huge \(N\); report partial \(R^2\) and first-stage \(F\).
Takeaway. Mandatory critique alongside the AK replication: big data does not cure weak identification.
Reproducibility. Start here: Bound (1995) replication guide (after AK91 guide). R tables: replicate-bound1995-R. Session: session-iv-weak.
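The inconsistency argument can be made concrete with probability limits. The display assumes a single endogenous regressor \(D\), a single instrument \(Z\), and structural error \(\varepsilon\); the symbols \(\rho\) and \(\varepsilon\) are notation introduced here.

\[
\operatorname{plim}\hat{\beta}_{\text{IV}} - \beta = \frac{\operatorname{Cov}(Z,\varepsilon)}{\operatorname{Cov}(Z,D)},
\qquad
\operatorname{plim}\hat{\beta}_{\text{OLS}} - \beta = \frac{\operatorname{Cov}(D,\varepsilon)}{\operatorname{Var}(D)},
\qquad
\frac{\operatorname{plim}\hat{\beta}_{\text{IV}} - \beta}{\operatorname{plim}\hat{\beta}_{\text{OLS}} - \beta}
= \frac{\rho_{Z\varepsilon}}{\rho_{D\varepsilon}\,\rho_{ZD}}
\]

When \(\rho_{ZD}\) is close to zero, even a tiny \(\rho_{Z\varepsilon}\) is magnified, so IV can be more inconsistent than OLS. Separately, in finite samples the bias of 2SLS relative to OLS shrinks roughly in inverse proportion to the first-stage \(F\), which is why both partial \(R^2\) and \(F\) belong in the report.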
2.2 Staiger and Stock (1997) — Asymptotics with weak instruments
Tier 2 | Econometrica, 65(3):557–86 | PDF
Abstract (from bib). Develops weak-IV asymptotics; TSLS can mimic OLS; LIML with fewer instruments gives more plausible education returns in Joshua D. Angrist and Krueger (1991) (roughly 8–10% vs 6% TSLS).
Takeaway. Theoretical foundation for weak-IV robust confidence sets; motivates reporting LIML/CLR alongside 2SLS.
Reproducibility. R: ivmodel, weak-IV literature code; simulate weak-IV DGP in classroom lab.
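For the classroom lab, a weak-IV data-generating process can be simulated in base R along the following lines. The function name `sim_once`, the sample size, the number of instruments, and the two strength settings are hypothetical choices for illustration; the true effect is set to zero so that any nonzero median is bias.

```r
# Monte Carlo sketch: 2SLS drifts toward OLS as instruments weaken.
# All design choices below are illustrative assumptions.
set.seed(123)

sim_once <- function(n = 500, k = 10, strength = 0.02, beta = 0) {
  Z <- matrix(rnorm(n * k), n, k)              # k instruments
  u <- rnorm(n)                                # shared confounder
  d <- drop(Z %*% rep(strength, k)) + u + rnorm(n)
  y <- beta * d + u + rnorm(n)
  dhat <- fitted(lm(d ~ Z))                    # first-stage fitted values
  c(ols  = cov(d, y) / var(d),
    tsls = cov(dhat, y) / cov(dhat, d))
}

draws_weak   <- t(replicate(500, sim_once(strength = 0.02)))
draws_strong <- t(replicate(500, sim_once(strength = 0.5)))

# Median estimates across replications (true beta = 0)
med_weak   <- apply(draws_weak, 2, median)
med_strong <- apply(draws_strong, 2, median)
round(rbind(weak = med_weak, strong = med_strong), 3)
```

Medians are used rather than means because the 2SLS sampling distribution can have heavy tails when instruments are weak. Students can extend the sketch by adding LIML or by varying `k` to see the many-instruments effect.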
2.3 Stock and Yogo (2002) — Testing for weak instruments
Tier 2 | SSRN working paper | PDF
Abstract (from bib). Defines weak IV via maximum bias/size distortion; tabulates critical values for Cragg–Donald / first-stage \(F\) (including \(F>10\) rule).
Takeaway. Practical benchmark still used in applied work; know its limits (Keane and Neal 2024).
Reproducibility. R: ivreg diagnostics, weakiv packages; Stata: estat firststage, ivreg2.
2.4 Flores-Lagunes (2007) — Finite-sample comparison of IV estimators
Tier 3 | Journal of Applied Econometrics, 22(3):677–94 | PDF
Abstract (from bib). Simulation evidence: RE-QMLE and bias-corrected LIML perform well under weak IV; reliable point estimates remain difficult.
Takeaway. Supports using LIML/JIVE variants when \(F\) is borderline; complements Staiger and Stock (1997).
Reproducibility. Re-run Monte Carlo in R (simstudy); good PhD exercise.
2.5 Olea and Pflueger (2013) — Robust weak-IV test
Tier 2 | JBES, 31(3):358–69 | PDF
Research question. Valid weak-instrument testing under heteroskedasticity.
Takeaway. Preferred \(F\)-type diagnostic when errors are not i.i.d. homoskedastic; report alongside Stock and Yogo (2002).
Reproducibility. Stata: ivreg2; R: ivpack / authors’ code; apply on AK first stage with robust SE.
2.6 Lee et al. (2020) — Valid t-ratio inference for IV
Tier 2 | arXiv:2010.05058 | PDF
Abstract (from bib). Conventional \(F>10\) still anti-conservative; true 5% test may need \(F>104\) or higher critical values; tF procedure; reanalysis of AER papers overturns many significant results.
Takeaway. Sharpens weak-IV reporting standards; use with Keane and Neal (2024) in applied seminars.
Reproducibility. Authors’ replication files on arXiv; R reproduction as advanced homework.
2.7 Keane and Neal (2024) — Practical guide to weak instruments
Tier 1 | Annual Review of Economics, 16:185–212 | PDF
Abstract (from bib). Survey for practitioners: \(F>10\) is insufficient; 2SLS \(t\)-tests over-reject when 2SLS \(\approx\) OLS; prefer Anderson–Rubin tests; advocate stricter strength standards.
Takeaway. Current best-practice checklist for PhD applied papers; assign with Bound, Jaeger, and Baker (1995) and the Joshua D. Angrist and Krueger (1991) replication.
Reproducibility. Replicate AR test on Joshua D. Angrist and Krueger (1991) data in R (ivmodel); compare \(t\) vs AR conclusions.
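Before turning to a package, the Anderson–Rubin idea can be sketched by hand for one instrument: under \(H_0: \beta = b_0\), the instrument should not predict \(Y - b_0 D\). The base-R code below uses simulated data, not the AK extract, and the helper `ar_pvalue`, the grid, and all parameter values are assumptions for illustration.

```r
# Simulated just-identified design; names and values are illustrative assumptions.
set.seed(2024)
n <- 5000
z <- rnorm(n)                        # single instrument
u <- rnorm(n)                        # unobserved confounder
d <- 0.15 * z + u + rnorm(n)         # first stage
y <- 0.5 * d + u + rnorm(n)          # true effect set to 0.5

# IV point estimate and conventional (Wald) 95% interval
b_iv  <- cov(z, y) / cov(z, d)
res   <- (y - mean(y)) - b_iv * (d - mean(d))
zc    <- z - mean(z)
se_iv <- sqrt(sum(res^2) / (n - 2)) * sqrt(sum(zc^2)) / abs(sum(zc * d))
wald_ci <- b_iv + c(-1, 1) * qnorm(0.975) * se_iv

# AR test of H0: beta = b0, i.e. test the slope on z in y - b0 * d ~ z
ar_pvalue <- function(b0) {
  fit <- lm(I(y - b0 * d) ~ z)
  summary(fit)$coefficients["z", "Pr(>|t|)"]
}

# Invert the test over a grid to obtain a 95% AR confidence set
grid   <- seq(-3, 4, by = 0.02)
pvals  <- vapply(grid, ar_pvalue, numeric(1))
ar_set <- grid[pvals > 0.05]

rbind(wald = wald_ci, ar = range(ar_set))
```

The AR set is generally not symmetric around the point estimate, and with a sufficiently weak first stage it can become very wide or unbounded, which is the contrast with the \(t\)-based interval that this exercise is meant to expose. The homoskedastic \(t\)-test is used here only for brevity.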
2.8 J. Angrist and Kolesár (2022) — Just-identified IV bias and coverage
Tier 2 | Journal of Econometrics, 240(2):105398 | PDF
Abstract (from bib). With one instrument, conventional IV inference is often reliable in micro applications; first-stage sign screening reduces median bias without harming coverage.
Takeaway. Counterpoint to “always panic about weak IV”: the concern is context-dependent; still report the first stage transparently.
Commentary. Debate paper for classroom: Bound, Jaeger, and Baker (1995) / Keane and Neal (2024) vs J. Angrist and Kolesár (2022) on reporting norms.
Reproducibility. Simulation notebooks from paper; replicate bias plots in R ggplot2.
2.9 Kitagawa (2015) — Test for instrument validity
Tier 3 | Econometrica, 83(5):2043–2063 | PDF
Abstract (from bib). Tests exclusion using implications for complier outcome densities (binary treatment, discrete IV); KS-type statistic; extensions with covariates.
Takeaway. Rare falsification test for exclusion; useful when complier densities are identified.
Reproducibility. Author code; advanced R implementation for thesis students.
3 C. Education policy and institutional IV
3.1 Abdulkadiroğlu et al. (2016) — Charters without lotteries
Tier 2 | AER, 106(7):1878–1920 | PDF
Abstract (from bib). Grandfathering provisions serve as an IV for charter takeovers in New Orleans and Boston; achievement gains are large, and a Boston turnaround with a similar design also shows gains.
IV logic. Compares students at takeover-eligible schools with matched controls; provides an IV when lottery randomization is unavailable.
Takeaway. Shows creative IV when an RCT is infeasible; discuss threats to exclusion (anticipation, migration).
Reproducibility. School-level administrative data per paper; Stata replication if available from authors; R for reduced-form reproduction of reported tables.
3.2 Joshua D. Angrist, Pathak, and Walters (2013) — Explaining charter school effectiveness
Tier 3 | AEJ: Applied, 5(4):1–27 | PDF
Research question. Why do Boston charters raise achievement: peer effects, discipline, or instruction time?
IV logic. Lottery-based charter offers as instrument for attendance; mechanism decomposition.
Takeaway. Moves from “does it work?” to mechanisms; a template for IV + structural interpretation in education.
Reproducibility. Lottery data (restricted); read for design; compare with Abdulkadiroğlu et al. (2016) IV without lottery.
4 D. Causal interpretation: compliance, LATE, and 2SLS
4.1 Abadie, Angrist, and Imbens (2002) — IV quantile treatment effects (JTPA)
Tier 3 | Econometrica, 70(1):91–117 | PDF
Abstract (from bib). Estimates distributional effects of JTPA training on earnings quantiles using a new IV approach (QTE). With exogenous assignment, QTE reduces to quantile regression; with non-random assignment, QTE is computed via convex programming with nonparametric first-step theory.
IV logic. Random assignment to JTPA is the instrument; training uptake is endogenous. Effects vary across quantiles (heterogeneous treatment effects).
Takeaway. IV need not target only the mean: subsidized training raised low quantiles for women but, for men, gains appear mainly in the upper half of the earnings distribution, so policy conclusions differ from average-effect summaries.
Commentary. Foundational for distributional causal inference; read after Guido W. Imbens and Angrist (1994) and Joshua D. Angrist, Imbens, and Rubin (1996).
Reproducibility. No standard course replication pack; extension in R: quantreg, custom IV-quantile code.
4.2 Abadie (2002) — Bootstrap tests for distributional effects in IV
Tier 3 | JASA, 97(457):284–92 | PDF
Research question. How to test distributional treatment effects under IV?
Takeaway. Inference companion to Abadie, Angrist, and Imbens (2002); bootstrap over identified sets for outcome distributions.
Reproducibility. Methods paper; implement in R with boot package on simulated IV designs.
4.3 Abadie (2003) — Semiparametric IV
Tier 3 | Journal of Econometrics, 113(2):231–63 | PDF
Abstract (from bib). Nonparametric identification of treatment response with covariates; allows flexible approximation under misspecification; illustrated with 401(k) savings.
Takeaway. IV with nonlinear/binary outcomes without imposing homogeneous effects; technical, best used as a reference for a thesis chapter on semiparametric methods.
Reproducibility. No course lab; see Abadie's 401(k) replication literature separately.
4.4 Joshua D. Angrist, Imbens, and Rubin (1996) — IV in the Rubin causal model
Tier 2 | JASA, 91(434):444–55 | PDF
Abstract (from bib). With imperfect compliance, IV estimates equal average causal effects for compliers under monotonicity and independence; otherwise IV is a ratio of intention-to-treat effects without causal average interpretation.
IV logic. Potential outcomes \(\{Y(1),Y(0)\}\), treatment \(D(Z)\), instrument \(Z\); links Wald estimand to complier average causal effect.
Takeaway. Bridge paper between statistics (Rubin) and econometrics (2SLS); required for understanding LATE.
Commentary. Read together with Guido W. Imbens and Angrist (1994) and G. W. Imbens and Rubin (1997).
Reproducibility. Pen-and-paper + simulation of compliance table in R (tibble of principal strata).
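One possible version of that simulation, written in base R with a plain vector of principal strata instead of a tibble, is sketched below. The strata shares, the stratum-specific effects, and the assumption of no defiers are all illustrative choices made here.

```r
# Principal-strata simulation under monotonicity (no defiers).
# Shares and effect sizes are illustrative assumptions.
set.seed(99)
n <- 100000
stratum <- sample(c("always", "complier", "never"), n, replace = TRUE,
                  prob = c(0.2, 0.5, 0.3))
z <- rbinom(n, 1, 0.5)                          # randomized binary instrument

# Treatment received: always-takers take it, never-takers do not, compliers follow z
d <- ifelse(stratum == "always", 1, ifelse(stratum == "never", 0, z))

# Heterogeneous causal effects by stratum; the complier effect is 1
effect <- unname(c(always = 3, complier = 1, never = 0)[stratum])
y0 <- rnorm(n) + (stratum == "always")          # baseline differs for always-takers
y  <- y0 + effect * d

# Compliance table: stratum by instrument by treatment received
table(stratum = stratum, z = z, d = d)

itt_y <- mean(y[z == 1]) - mean(y[z == 0])      # intention-to-treat effect on y
itt_d <- mean(d[z == 1]) - mean(d[z == 0])      # first stage = complier share
wald  <- itt_y / itt_d                          # targets the complier effect
naive <- mean(y[d == 1]) - mean(y[d == 0])      # contaminated by selection

round(c(first_stage = itt_d, wald = wald, naive = naive), 3)
```

Comparing `wald` with `naive` shows why the ratio of intention-to-treat effects recovers the complier average effect while the treated-versus-untreated contrast does not. Adding a small share of defiers is a natural follow-up exercise to see the interpretation break.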
4.5 G. W. Imbens and Rubin (1997) — Outcome distributions for compliers
Tier 2 | Review of Economic Studies, 64(4):555–74 | PDF
Research question. What is the full outcome distribution for compliers, not just the mean?
IV logic. Bayesian/frequentist framework for complier distributions under IV assumptions.
Takeaway. Extends LATE (Guido W. Imbens and Angrist 1994) to distributional policy analysis (who gains in the tail of \(Y\)?).
Reproducibility. Advanced; pair with Abadie, Angrist, and Imbens (2002) if thesis focuses on distributions.
4.6 J. D. Angrist, Imbens, and Krueger (1999) — Jackknife IV
Tier 3 | Journal of Applied Econometrics, 14:57–67 | PDF
Abstract (from bib). Proposes jackknife IV (JIVE) alternatives to 2SLS/LIML when many instruments; better bias/coverage with many weak-ish instruments.
Takeaway. When quarter-of-birth instruments are numerous, JIVE/LIML may beat 2SLS in finite samples; connects to Staiger and Stock (1997) (Section B).
Reproducibility. R: package AER (jive where available) or manual leave-one-out first stage on AK data.
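The manual leave-one-out first stage does not require refitting \(n\) regressions: the leverage values of the instrument matrix give the leave-one-out fitted values in closed form. The base-R sketch below uses simulated data in place of the AK extract, and the dimensions and coefficients are assumptions for illustration.

```r
# Jackknife IV sketch via the leverage shortcut; all values are illustrative assumptions.
set.seed(11)
n <- 2000; k <- 30
Z <- cbind(1, matrix(rnorm(n * k), n, k))       # intercept + k instruments
u <- rnorm(n)                                   # unobserved confounder
d <- drop(Z[, -1] %*% rep(0.05, k)) + u + rnorm(n)
y <- 1 * d + u + rnorm(n)                       # true effect = 1

ZtZ_inv  <- solve(crossprod(Z))
fitted_d <- drop(Z %*% (ZtZ_inv %*% crossprod(Z, d)))   # usual first-stage fit
h        <- rowSums((Z %*% ZtZ_inv) * Z)                # leverage of each row

# Leave-one-out fitted value for observation i, without refitting
d_loo <- (fitted_d - h * d) / (1 - h)

X      <- cbind(1, d)
W_2sls <- cbind(1, fitted_d)
W_jive <- cbind(1, d_loo)
b_2sls <- solve(crossprod(W_2sls, X), crossprod(W_2sls, y))[2]
b_jive <- solve(crossprod(W_jive, X), crossprod(W_jive, y))[2]

round(c(tsls = b_2sls, jive = b_jive), 3)
```

Removing observation \(i\) from its own first-stage fit eliminates the mechanical correlation between the fitted value and that observation's error, which is the source of the many-instruments bias in 2SLS. The estimate is typically noisier than 2SLS, so a single draw like this one is not evidence about bias; a Monte Carlo loop is.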
4.7 Mogstad, Torgovitsky, and Walters (2021) — Multiple IVs and 2SLS interpretation
Tier 2 | AER, 111(11):3663–3698 | PDF
Abstract (from bib). With multiple IVs, standard monotonicity is effectively too strong; under partial monotonicity, gives verifiable conditions for 2SLS to be a positive weighted average of LATEs; college-return application.
Takeaway. Modern reading for anyone stacking instruments (distance, tuition, parents); always check monotonicity empirically.
Commentary. Discussed in session-iv-causality. Read after Section B (weak IV) and Card (1993).
Reproducibility. Full text and R replication: Card (1993) replication guide; empirical checks reproducible with college-return data (see paper appendix).
4.8 Mogstad and Torgovitsky (2018) — Extrapolation beyond compliers
Tier 2 | Annual Review of Economics, 10:577–613 | PDF
Abstract (from bib). IV estimates apply to instrument-induced compliers; policy targets often differ. Surveys partial identification and extrapolation tools.
Takeaway. Answers “does my LATE matter for the policy I care about?”; essential for thesis policy chapters.
Reproducibility. R package ivmte implements MTE bounds and extrapolation (course note in text-book).
4.9 Mogstad, Santos, and Torgovitsky (2018) — Policy-relevant treatment parameters
Tier 3 | Econometrica, 86(5):1589–1619 | PDF
Abstract (from bib). Uses identified IV estimands to bound policy-relevant parameters via weights on marginal treatment effects; incorporates shape restrictions.
Takeaway. Operationalizes extrapolation from IV to target populations beyond compliers; complements Mogstad and Torgovitsky (2018).
Reproducibility. ivmte package; replicate bounds on course machine with package vignettes.
5 E. Structural treatment effects and IV–DiD hybrids
5.1 Heckman and Vytlacil (2022) — Structural equations and policy evaluation
Tier 3 | Econometrica, 73(3):669–738 (PDF filename says 2022; published 2005) | PDF
Research question. Unify treatment-effect parameters (ATE, LATE, MTE) within latent-index selection models.
IV logic. Local instrumental variables (LIV) recover MTE and integrate IV with structural policy simulation (working-paper precursor: Heckman and Vytlacil 2000).
Takeaway. Bridge from reduced-form IV to structural policy counterfactuals; for students continuing to SEM (Ch. 18–0).
Reproducibility. Technical; link to ivmte (Mogstad and Torgovitsky 2018) and Heckman selection code in specialized courses.
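The link between LIV, MTE, and the familiar parameters can be summarized in three lines. The notation is the standard latent-index setup and is introduced here as an assumption: \(P(Z)\) is the propensity score and \(U_D\) is the unobserved resistance to treatment, normalized to be uniform on \([0,1]\); covariates are suppressed.

\[
\text{MTE}(u) = E[Y_1 - Y_0 \mid U_D = u],
\qquad
\frac{\partial E[Y \mid P(Z) = p]}{\partial p}\bigg|_{p=u} = \text{MTE}(u)
\]

\[
\text{ATE} = \int_0^1 \text{MTE}(u)\,du,
\qquad
\text{LATE}(p, p') = \frac{1}{p' - p}\int_p^{p'} \text{MTE}(u)\,du
\]

Each conventional parameter is a weighted average of the same MTE curve, which is why an instrument that only moves \(P(Z)\) over a narrow range identifies LATE for that range but not ATE without further assumptions.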
5.2 Hudson, Hull, and Liebersohn (2017) — Instrumented difference-in-differences
Tier 3 | MIT working paper | PDF
Research question. How should IV interact with DiD when policy adoption is endogenous?
IV logic. Clarifies estimands when instruments shift treatment in panel/event-study settings; avoids confusing 2SLS–DiD hybrids.
Takeaway. Increasingly relevant for PhD students combining event studies + IV; read before stacking DiD and 2SLS blindly.
Reproducibility. Working-paper replication files if posted; implement toy panel IV-DiD in R (fixest event study).
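For the toy panel exercise, one common way to write an instrumented DiD estimand in the two-group, two-period case is a ratio of two difference-in-differences. The notation below is an assumption made for illustration (\(\Delta\) denotes the post-minus-pre change and \(Z\) a binary exposure group) and may not match the paper's exact formulation.

\[
\beta_{\text{IV-DiD}} \;=\; \frac{E[\Delta Y \mid Z=1] - E[\Delta Y \mid Z=0]}{E[\Delta D \mid Z=1] - E[\Delta D \mid Z=0]}
\]

The numerator is a reduced-form DiD in the outcome and the denominator is a first-stage DiD in treatment, so the usual parallel-trends reasoning has to hold for both, in addition to an exclusion-type condition on \(Z\).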
6 F. Suggested reading sequence (12 hours, R-first)
Follows the section order A → B → C → D (Section E is optional extension).
| Session (3h) | Sections | Read | Replicate in R |
|---|---|---|---|
| 1 | A | Joshua D. Angrist and Krueger (2021); Joshua D. Angrist (1990) or Card (2001) skim | — |
| 2 | A | Joshua D. Angrist and Krueger (1991) | AK91 guide → replicate-ak1991-R |
| 3 | B | Bound, Jaeger, and Baker (1995); Keane and Neal (2024); Staiger and Stock (1997) skim | Bound guide → first-stage \(F\), Table 3 pilot |
| 4 | C + D | Card (1993); Joshua D. Angrist, Imbens, and Rubin (1996); Mogstad and Torgovitsky (2018) or Mogstad, Torgovitsky, and Walters (2021) skim | Card guide → replicate-card1993-R; iv-wage-card |
After the 12-hour core: Section E (Heckman and Vytlacil (2022); Hudson, Hull, and Liebersohn (2017)) for SEM / IV–DiD; remaining Tier 3 papers in D as thesis references.
Stata (optional): replicate-ak1991-stata for Joshua D. Angrist and Krueger (1991); compare coefficients to R in a single tables_compare.csv (planned course extension).
7 G. Quick reference: identification checklist per paper
| Paper | Endogeneity source | Instrument \(Z\) | Treatment \(D\) | Outcome \(Y\) | Main threat to exclusion |
|---|---|---|---|---|---|
| Joshua D. Angrist and Krueger (1991) | Ability in earnings equation | Quarter of birth | Schooling | Wages | Season of birth → earnings directly |
| Card (1993) | Ability/family | College proximity | Education | Log wage | Local labor demand / unobserved local factors |
| Joshua D. Angrist (1990) | Selection into service | Draft lottery | Veteran status | Earnings | War-era macro shocks |
| Bound, Jaeger, and Baker (1995) | (Meta) weak \(Z\) | — | — | — | \(Z\) correlated with structural error |
| Abdulkadiroğlu et al. (2016) | School choice | Grandfathering eligibility | Charter attendance | Achievement | Sorting into takeover zones |
| Mogstad, Torgovitsky, and Walters (2021) | College choice | Multiple college-cost shifters | College attendance | Earnings | Heterogeneous monotonicity |
Additional course handout: summary-on-IV. References cited above are drawn from bib/endogeneity.bib and appear in the bibliography below after quarto render.