library(ivreg)
# After bound1995-data-prep.R:
# m_iv <- ivreg(LWKLYWGE ~ EDUC + controls | QOB_YOB_dummies + controls, data = d)
# m_fs <- lm(EDUC ~ QOB_YOB_dummies + controls, data = d)
# anova(lm(EDUC ~ controls, d), m_fs) # F on excluded instruments
Replication Guide
Weak instruments and the AK91 re-examination
Prerequisites (required). Complete the AK91 replication guide first — at least the QOB IV logic and AK91 Table V / IV specs. Tier-1 context: Readings Guide.
Software. R with renv::restore(); microdata: Rcode/raw_data.dta (same Census extract as AK91).
Four documents — one workflow.
| Step | Document | Your job |
|---|---|---|
| 1 | This guide | Understand why weak IV undermines AK91’s second stage |
| 2 | Paper (full text) | Read Sections 2–4 cited in each Act |
| 3 | R replication output | Compare coefficients and diagnostics to Tables 1–3 |
| 4 | Rcode/*.R | Run and modify the modular scripts |
This guide is narrative-first (R only). Table 3 Monte Carlo (500 replications) takes several hours — design and interpretation here; run commands on the replication page.
1 Overview: From AK91 headline to weak-IV skepticism
Angrist and Krueger (1991) report that quarter-of-birth IV estimates of the return to schooling are close to OLS on Census microdata — a celebrated result. Bound, Jaeger, and Baker (1995) re-examine the same empirical setting and argue that conclusion is fragile:
- Inconsistency: If instruments are weak, even tiny correlation between \(Z\) and structural errors can produce large bias (Section 2.1).
- Finite-sample bias: Weak IV pushes 2SLS toward OLS — so “IV ≈ OLS” may reflect bias, not absence of ability bias (Section 2.2).
- Simulated instruments: Randomly permuting quarter of birth yields plausible-looking second stages while first-stage \(F \approx 1\) (Table 3).
Same data, different question.
| Paper | Question | Key takeaway |
|---|---|---|
| AK91 Table V | What is the causal return to schooling? | IV ≈ OLS, ~6–8% |
| Bound Table 1 col. (6) | Is IV inference credible given first-stage strength? | \(\hat\beta_{EDUC} \approx 0.060\), first-stage \(F \approx 1.61\) |
| Bound Table 3 | What if instruments carry no information? | Mean \(\hat\beta \approx 0.06\), mean \(F \approx 1\) |
Sample (all Bound tables): 5% Public-Use Sample, 1980 Census; men born 1930–1939 (\(N = 329{,}509\)) — AK91 Table V cohort.
flowchart LR
AK91[AK91 IV headline]
FS[First stage F and partial R2]
SS[Second stage EDUC coef]
Sim[Table 3 permuted QOB]
AK91 --> FS
FS --> SS
FS --> Sim
Sim --> Lesson[Report F before trusting SS]
R module map.
| File | Role |
|---|---|
| bound1995-data-prep.R | Load data; QOB/YOB/state dummies |
| bound1995-iv-diagnostics.R | First-stage \(F\), partial \(R^2\), Basmann overid \(F\) |
| bound1995-chunked-iv.R | Large IV sets (Table 2) |
| Table_I.R | Bound Table 1 |
| Table_II.R | Bound Table 2 |
| Table_III.R | Bound Table 3 simulation |
Read first: Bound et al. (1995) HTML · PDF
Suggested path (~2 hours, after AK91). Overview → Act I → Act II → Act IV (read + pilot simulation) → Act V → Capstone.
2 Act I — Two problems with weak instruments
2.1 Story beat
Section 2 states the general IV pathology before Section 3 applies it to AK91. Two distinct problems matter for teaching:
- Population inconsistency when \(Z\) is weak and slightly invalid.
- Finite-sample bias when \(Z\) is weak even if exclusion holds exactly.
2.2 Read in the paper
- Section 2.1 — Inconsistency
- Section 2.2 — Finite-sample bias
- Section 3.1 (AK91 application of inconsistency)
2.3 Identification / econometrics
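Inconsistency (intuition). When \(\operatorname{plim}\hat\pi\) is close to zero (weak first-stage relevance), the IV estimand divides by a near-zero covariance between \(Z\) and schooling, so even a small \(Cov(Z, \epsilon)\) in the structural equation is amplified into a large inconsistency.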
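The amplification can be written out for the simplest case. The expressions below are a sketch for a single instrument \(Z\) and a structural equation \(y = \beta x + \epsilon\), where \(x\) stands in for \(EDUC\); the notation is assumed here for illustration and should be checked against Section 2.1 of the paper.

\[
\operatorname{plim}\hat\beta_{IV} = \beta + \frac{Cov(Z,\epsilon)}{Cov(Z,x)}, \qquad
\operatorname{plim}\hat\beta_{OLS} = \beta + \frac{Cov(x,\epsilon)}{Var(x)}
\]

\[
\frac{\operatorname{plim}\hat\beta_{IV} - \beta}{\operatorname{plim}\hat\beta_{OLS} - \beta}
= \frac{\rho_{Z,\epsilon} / \rho_{x,\epsilon}}{\rho_{x,Z}}
\]

The second line follows by dividing the two inconsistency terms and rewriting covariances as correlations. When \(\rho_{x,Z}\) is tiny, IV can be more inconsistent than OLS even if \(\rho_{Z,\epsilon}\) is much smaller than \(\rho_{x,\epsilon}\), and a larger \(N\) does not change these population quantities.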
Finite-sample bias (intuition). With weak instruments, 2SLS behaves like a weighted average of OLS and IV; as \(F \downarrow\), the weight on OLS rises. If OLS is upward biased (ability), weak IV can make 2SLS look like OLS for the wrong reason.
Link to AK91. AK91 emphasize IV ≈ OLS as evidence against large ability bias. Bound reinterpret: with \(F \approx 1.6\), IV ≈ OLS is expected under weak-IV bias, not proof of a valid causal design.
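The finite-sample argument can be illustrated with a small Monte Carlo. The sketch below uses simulated data, not the Census extract; the sample size, number of instruments, error correlation, and the function name `sim_strength` are all assumptions chosen for illustration. It tracks the median 2SLS estimate as the first stage weakens.

```r
# Toy Monte Carlo on simulated data: 2SLS drifts toward OLS as the first stage weakens.
set.seed(2)
n    <- 500   # observations per replication
q    <- 10    # number of excluded instruments
reps <- 300   # replications per design
beta <- 0.10  # true structural coefficient

sim_strength <- function(pi_z) {
  draws <- replicate(reps, {
    z <- matrix(rnorm(n * q), n, q)
    u <- rnorm(n)                                        # structural error
    x <- drop(z %*% rep(pi_z, q)) + 0.8 * u + rnorm(n)   # endogenous regressor
    y <- beta * x + u
    first_stage <- lm(x ~ z)
    x_hat <- fitted(first_stage)
    c(F    = unname(summary(first_stage)$fstatistic["value"]),
      tsls = cov(x_hat, y) / var(x_hat),  # slope of y on first-stage fitted values
      ols  = cov(x, y) / var(x))
  })
  c(mean_F      = mean(draws["F", ]),
    median_tsls = median(draws["tsls", ]),
    median_ols  = median(draws["ols", ]))
}

# Columns: strong, weak, and irrelevant instruments
results <- sapply(c(strong = 0.5, weak = 0.1, irrelevant = 0), sim_strength)
print(round(results, 3))
```

In this design OLS is biased upward in every column. The point to look for is that the median 2SLS estimate sits near the true value only when the mean \(F\) is large, and moves toward the OLS column as the mean \(F\) approaches 1.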
2.4 Replicate
No standalone table — this act frames Table 1.
2.5 Check your work
- State in one sentence why large \(N\) does not fix weak identification.
- Contrast exclusion violation vs weak relevance as distinct threats.
2.6 Think
- If the true return lies below OLS (positive ability bias), in which direction does weak-IV bias toward OLS move the 2SLS estimate?
- AK91 Act VI previewed weak IV — what new evidence does Bound add beyond quoting a low \(F\)?
3 Act II — Table 1: AK91 Table V with diagnostics
3.1 Story beat
Section 3.2 replicates AK91 Table V, columns (5)–(8) but adds first-stage \(F\), partial \(R^2\), and Basmann overidentification \(F\) for every IV column. The pedagogical punchline: as the specification matches AK91 more closely, \(F\) collapses.
3.2 Read in the paper
- Section 3.2 and embedded Table 1
- Compare to AK91 Section II.A / AK91 Table V
3.3 Identification / econometrics
Bound Table 1 columns (3)–(6) map to AK91 Table V (5)–(8). Simpler columns (1)–(2) use only three quarter dummies as instruments.
| Bound col. | AK91 Table V col. | Controls | Excluded instruments | Paper \(F\) (approx.) |
|---|---|---|---|---|
| (1)–(2) | — (simpler) | Age, age\(^2\), demos | 3 QOB dummies | ~13.5 |
| (3)–(4) | (5)–(6) | YOB dummies, demos | QOB × YOB | ~4.7 |
| (5)–(6) | (7)–(8) | YOB, age in quarters, demos | QOB × YOB | ~1.6 |
Partial \(R^2\): share of \(EDUC\) variation explained by excluded instruments after controls. It is tiny in col. (6) — consistent with a weak first stage.
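Both diagnostics come from comparing the first stage with and without the excluded instruments. The sketch below uses simulated data with hypothetical variable names (`w` for an included control, `z1`–`z3` for excluded instruments); it is not the code in bound1995-iv-diagnostics.R, only an illustration of the definitions.

```r
# First-stage F on excluded instruments and partial R^2, on simulated data.
set.seed(1)
n <- 5000
w <- rnorm(n)                          # included control
z <- matrix(rnorm(n * 3), n, 3)        # three excluded instruments
educ <- 0.5 * w + 0.03 * z[, 1] + rnorm(n)
d <- data.frame(educ, w, z1 = z[, 1], z2 = z[, 2], z3 = z[, 3])

restricted <- lm(educ ~ w, data = d)                 # controls only
full       <- lm(educ ~ w + z1 + z2 + z3, data = d)  # controls + instruments

# F test that the excluded instruments are jointly zero in the first stage
first_stage_F <- anova(restricted, full)$F[2]

# Partial R^2: share of the residual variation in educ (after controls)
# that the excluded instruments explain
rss_r <- sum(resid(restricted)^2)
rss_f <- sum(resid(full)^2)
partial_R2 <- (rss_r - rss_f) / rss_r

# The same F, rebuilt from the two residual sums of squares
manual_F <- ((rss_r - rss_f) / 3) / (rss_f / df.residual(full))

print(c(first_stage_F = first_stage_F, partial_R2 = partial_R2))
```

Because the two statistics are functions of the same residual sums of squares, a tiny partial \(R^2\) can coexist with a moderate \(F\) only when \(N\) is large relative to the number of instruments.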
3.4 Replicate
- Script: Rcode/Table_I.R
- Output: Table 1
Col. (6) IV \(\hat\beta_{EDUC} \approx 0.060\) is below AK91’s headline IV (~8%). Bound’s point is not “returns are 6%” — it is that with \(F \approx 1.6\), 2SLS is pulled toward OLS and standard second-stage inference is unreliable.
3.5 Check your work
- Col. (6) IV: \(\hat\beta_{EDUC} \approx 0.060\) (SE ≈0.036); first-stage \(F \approx 1.61\); partial \(R^2 \approx 0.0003\).
- \(F\) falls moving from cols. (2) → (4) → (6) — monotone loss of first-stage strength as the AK91-style spec is added.
- Compare replication vs paper_targets printed when Table_I.R finishes.
3.6 Think
- Col. (6) second-stage \(t\)-stat may still look “significant.” Why is that misleading when \(F \approx 1.6\)?
- Which diagnostic would you show first in a seminar slide—\(\hat\beta_{EDUC}\) or first-stage \(F\)?
4 Act III — Table 2: State-of-birth interactions
4.1 Story beat
Bound replicate AK91 Table VII, columns (5)–(8) — adding state × quarter interactions to exploit cross-state compulsory-law variation. Precision may improve, but first-stage \(F\) stays near 2: more instruments do not rescue weak identification here.
4.2 Read in the paper
- Table 2 paragraph in Section 3.2
- AK91: Table VII
4.3 Identification / econometrics
Excluded set expands to QOB × YOB and QOB × state of birth (plus levels as in AK91). Overidentification tests (Basmann \(F\)) appear in IV columns — often fail to reject even when \(F\) is tiny (foreshadowing Table 3).
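For readers who have not computed an overidentification statistic by hand, the sketch below shows one common form of Basmann's \(F\): the 2SLS residuals are regressed on the full instrument set, and the explained sum of squares is compared with the unexplained one. The data are simulated with valid instruments, the variable names are hypothetical, and the exact variant used in bound1995-iv-diagnostics.R may differ.

```r
# Basmann-type overidentification F on simulated data (one common form; an assumption here).
set.seed(42)
n <- 2000
z <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("z", 1:4)))
w <- rnorm(n)                     # included control
u <- rnorm(n)                     # structural error, independent of z
x <- drop(z %*% c(0.4, 0.3, 0.2, 0.2)) + 0.5 * w + 0.6 * u + rnorm(n)
y <- 0.1 * x + 0.3 * w + u
d <- data.frame(y, x, w, z)

# 2SLS by hand: first-stage fitted values, then second stage
d$x_hat <- fitted(lm(x ~ w + z1 + z2 + z3 + z4, data = d))
b <- unname(coef(lm(y ~ x_hat + w, data = d)))

# 2SLS residuals use the actual x, not x_hat
u_hat <- d$y - (b[1] + b[2] * d$x + b[3] * d$w)

# Project the residuals on all instruments (controls included)
aux <- lm(u_hat ~ w + z1 + z2 + z3 + z4, data = d)
explained   <- sum(fitted(aux)^2)   # u' P_Z u
unexplained <- sum(resid(aux)^2)    # u' M_Z u

q_over <- 4 - 1                     # excluded instruments minus endogenous regressors
df2    <- n - 6                     # n minus columns of the instrument matrix
basmann_F <- (explained / q_over) / (unexplained / df2)
p_value   <- pf(basmann_F, q_over, df2, lower.tail = FALSE)

print(c(basmann_F = basmann_F, p_value = p_value))
```

A failure to reject says only that the instruments agree with one another; it carries no information about whether they are strong.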
4.4 Replicate
- Script: Rcode/Table_II.R (large IV models; often 30+ minutes)
- Output: Table 2
The replication page loads pre-built Table_II.RData. If absent, run Table_II.R once locally; chunked IV in bound1995-chunked-iv.R handles the large instrument set.
4.5 Check your work
- Col. (4) IV: \(\hat\beta_{EDUC} \approx 0.081\); first-stage \(F \approx 1.87\) (still far below conventional thresholds).
- Contrast partial \(R^2\) and overid \(F\) across IV columns with Table 1.
4.6 Think
- When does adding interactions as instruments help first-stage strength — and when does it hurt (collinearity with age/QOB)?
- Bound footnote 18 notes even richer state × year × QOB instruments — what would you expect for \(F\)?
5 Act IV — Table 3: Simulated quarter of birth
5.1 Story beat
Alan Krueger’s suggestion (cited in Bound, p. 448): randomly permute quarter of birth within the sample, rebuild instruments, and re-estimate IV. Instruments are irrelevant by construction, yet second-stage output still looks respectable. Only the first stage (\(F \approx 1\)) reveals the problem.
5.2 Read in the paper
- Table 3 discussion in Section 3.2
- Section 2.2 (predictions for the simulation)
5.3 Identification / econometrics
Design (each replication):
- Hold sample, controls, and model fixed (four specs: Table 1 cols. (4) and (6); Table 2 cols. (2) and (4)).
- Shuffle QOB; rebuild QOB × YOB (and state) dummies.
- Estimate IV; store \(\hat\beta_{EDUC}\), SE, first-stage \(F\).
Theory: Mean \(\hat\beta\) should cluster near OLS (~0.06); mean \(F \approx 1\). Asymptotic SEs can track the simulation SD — so “reasonable” \(t\)-stats are not evidence of valid IV.
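The design can be tried in seconds on a toy dataset before committing to the full run. The sketch below is not Table_III.R: the data are simulated, the coefficients are invented, and the only controls are year-of-birth dummies. Because the instruments are QOB × YOB cell dummies, the first-stage fitted values are just cell means, so 2SLS can be computed with `ave()` instead of a large regression.

```r
# Toy version of the permuted-QOB experiment on simulated data.
set.seed(1995)
n <- 10000
B <- 200                                   # permutations
yob <- sample(0:9, n, replace = TRUE)
qob <- sample(1:4, n, replace = TRUE)
ability <- rnorm(n)                        # unobserved; drives the OLS bias
educ  <- 12 + 0.05 * yob + 0.02 * qob + ability + rnorm(n)
lwage <- 5 + 0.10 * educ + 0.01 * yob + 0.2 * ability + rnorm(n, sd = 0.3)

ols <- unname(coef(lm(lwage ~ educ + factor(yob)))["educ"])

educ_yob  <- ave(educ, yob)                # fitted values from controls only
lwage_dev <- lwage - ave(lwage, yob)       # wage with YOB dummies partialled out

one_rep <- function() {
  cell <- interaction(yob, sample(qob), drop = TRUE)  # permuted QOB x YOB cells
  educ_hat <- ave(educ, cell)              # first-stage fitted values (cell means)
  rss_r <- sum((educ - educ_yob)^2)
  rss_f <- sum((educ - educ_hat)^2)
  n_cells <- nlevels(cell)
  q <- n_cells - 10                        # excluded instruments beyond YOB dummies
  F_stat <- ((rss_r - rss_f) / q) / (rss_f / (n - n_cells))
  x_dev <- educ_hat - educ_yob             # fitted educ net of YOB dummies
  c(beta = sum(x_dev * lwage_dev) / sum(x_dev^2), F = F_stat)
}

sims <- replicate(B, one_rep())
summary_stats <- c(ols       = ols,
                   mean_beta = mean(sims["beta", ]),
                   sd_beta   = sd(sims["beta", ]),
                   mean_F    = mean(sims["F", ]))
print(round(summary_stats, 3))
```

In this toy design the true return is 0.10 and OLS is biased upward by construction, so the comparison of interest is whether the mean permuted-IV coefficient lands near the OLS value rather than near 0.10 while the mean \(F\) stays close to 1.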
5.4 Replicate
- Script: Rcode/Table_III.R
- Output: Table 3
Run (from repo root):
# Pilot (~1–2 h with parallel workers)
BOUND95_SIM_REPS=50 Rscript 04-topics/rep-bound1995/Rcode/Table_III.R
# Paper (500 reps; checkpoints resume if interrupted)
BOUND95_SIM_REPS=500 BOUND95_SIM_CORES=7 Rscript 04-topics/rep-bound1995/Rcode/Table_III.R
Checkpoints: Table_III_spec_*.rds in Rcode/.
# Conceptual loop (see Table_III.R for full implementation):
# for (b in seq_len(B)) {
# d_sim <- d %>% mutate(QOB = sample(QOB), .by = NULL) # permute
# d_sim <- rebuild_qob_dummies(d_sim)
# fit <- bound95_run_iv(d_sim, controls, excluded, included)
# store(coef, se, first_stage_F)
# }
5.5 Replication literacy
| Issue | Lesson |
|---|---|
| Mean \(\hat\beta \approx 0.06\) with fake \(Z\) | Second stage alone is not diagnostic |
| Mean \(F \approx 1\) | First stage is diagnostic |
| Overid often fails to reject | Many weak instruments ≠ valid instruments |
| Long runtime | Report \(F\) and partial \(R^2\) in applied work to avoid needing Table 3 |
This act closes the loop with AK91 guide Act VI.
5.6 Check your work
- Mean coefficient ≈ 0.060–0.062 (near OLS, not AK IV).
- Mean first-stage \(F \approx 1\) across all four columns.
- Table 1 specs: SD of \(\hat\beta \approx 0.038\); 5th–95th percentiles often straddle zero.
5.7 Think
- Why do overidentification tests fail to reject with random instruments?
- Draft one sentence for a referee report: “The authors should report ___ before interpreting IV.”
6 Act V — Conclusion and modern follow-ups
6.1 Story beat
Section 4 concludes that applied work must report first-stage strength — not only second-stage coefficients. Subsequent literature formalizes tests and alternatives.
6.2 Course extensions (links only)
- Weak IV session — full reading list
- (Staiger and Stock 1997) — weak-IV asymptotics; TSLS can mimic OLS
- (Stock and Yogo 2002) — \(F > 10\) rule and its limits
- (Keane and Neal 2024) — modern reporting checklist (readings guide)
Minimum reporting package (applied PhD):
- First-stage \(F\) (or Kleibergen–Paap) on excluded instruments
- Partial \(R^2\) for excluded instruments
- Weak-IV-robust inference when \(F\) is low: AR or CLR confidence sets, with LIML as an alternative point estimator
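As an illustration of the last item, an Anderson–Rubin (AR) confidence set can be built by test inversion: for each candidate \(\beta_0\), regress \(y - \beta_0 x\) on the instruments and keep the values where the joint \(F\) test does not reject. The sketch below uses simulated data with a deliberately weak first stage and no included controls; the grid bounds and the helper name `ar_pvalue` are assumptions.

```r
# Anderson-Rubin confidence set by grid inversion, on simulated data.
set.seed(7)
n <- 2000
z <- matrix(rnorm(n * 3), n, 3)            # three excluded instruments
u <- rnorm(n)                              # structural error
x <- 0.05 * z[, 1] + 0.8 * u + rnorm(n)    # weak first stage
y <- 0.1 * x + u

# p-value of the AR test of H0: beta = beta0
ar_pvalue <- function(beta0) {
  e0 <- y - beta0 * x                      # residual under the null
  fit0 <- lm(e0 ~ 1)
  fit1 <- lm(e0 ~ z)                       # instruments should not predict e0 under H0
  anova(fit0, fit1)$`Pr(>F)`[2]
}

grid     <- seq(-2, 2, by = 0.01)
p_values <- vapply(grid, ar_pvalue, numeric(1))
ar_set   <- grid[p_values > 0.05]          # 95% AR confidence set on the grid

if (length(ar_set) > 0) {
  print(range(ar_set))
} else {
  print("AR set is empty on this grid")
}
```

With a weak first stage the set can be very wide, can reach the grid bounds, or can be disjoint. That is the method reporting honestly that the data say little about \(\beta\), which is the behavior one wants when \(F\) is low.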
6.3 Think
- Is \(F > 10\) sufficient for AK91-style \(YQ\) instruments? What would you add?
- How does Bound change how you read any paper that only tables 2SLS coefficients?
7 Capstone — Weak-IV diagnostic memo
Write ½–1 page in English (bullets OK):
- Two questions. Restate AK91’s policy estimand vs Bound’s statistical critique (they are not the same question).
- Table 1, col. (6). Report \(\hat\beta_{EDUC}\), first-stage \(F\), and partial \(R^2\); interpret all three together.
- Table 3. For one simulated spec, explain why mean \(\hat\beta\) is near OLS and mean \(F\) is near 1.
- Your lab policy. When \(F < 10\), what estimators/inferences will you report?
- Replication note. One issue from this guide (chunked IV runtime, Table_II.RData, simulation checkpoints, or AK vs Bound coefficient comparison).
Deliverable (suggested). PDF memo + console log from Table_I.R and a 50-rep pilot of Table_III.R.
8 Appendix — Replication map and troubleshooting
| Table | Paper | Script | Known issue |
|---|---|---|---|
| 1 | §3.2, AK91 Table V (5)–(8) | Table_I.R | \(F\) must fall across cols. (2)–(6) |
| 2 | §3.2, AK91 Table VII (5)–(8) | Table_II.R | Slow; chunked IV |
| 3 | §3.2, simulated QOB | Table_III.R | BOUND95_SIM_REPS, checkpoints |
Render this page:
quarto render 04-topics/rep-bound1995/replication-bound1995-guide.qmd
Relation to AK91 materials
| AK91 guide | Bound guide |
|---|---|
| Builds QOB natural experiment | Takes experiment as given; attacks IV inference |
| IV ≈ OLS as headline finding | IV ≈ OLS as symptom of weak IV |
| Identification memo (exclusion) | Capstone memo (first-stage reporting) |