Replication with R

Problems With Instrumental Variables When the IV–Endogenous Correlation Is Weak

Author

Hu Huaping

Published

May 1, 2025

Introduction

This page replicates the empirical tables in Bound, Jaeger, and Baker (1995) using the same Angrist and Krueger (1991) Census extract (raw_data.dta); Angrist and Krueger (1991) is abbreviated AK91 below. Bound et al. re-examine AK91’s quarter-of-birth (QOB) IV strategy and emphasize three points:

  1. Inconsistency when instruments are weak but not fully exogenous (Section 3.1).
  2. Finite-sample bias of IV toward OLS when the first-stage \(F\) is small (Section 3.2; Tables 1–3).
  3. Simulated instruments (Table 3): randomizing quarter of birth leaves second-stage estimates looking plausible while first-stage \(F \approx 1\).
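Point 1 follows from a probability-limit calculation. Take one regressor \(x\), one instrument \(z\), and structural error \(u\). Then \(\operatorname{plim}\hat\beta_{IV} - \beta = \operatorname{Cov}(z,u)/\operatorname{Cov}(z,x)\) and \(\operatorname{plim}\hat\beta_{OLS} - \beta = \operatorname{Cov}(x,u)/\operatorname{Var}(x)\). The ratio of the two inconsistencies is therefore \((\rho_{zu}/\rho_{xu})/\rho_{zx}\), so a weak first stage (small \(\rho_{zx}\)) magnifies even a tiny instrument–error correlation. The short Python sketch below evaluates that ratio. It is an illustration only, not part of the R scripts, and the correlations in the usage line are hypothetical.

```python
def iv_ols_inconsistency_ratio(rho_zu: float, rho_xu: float, rho_zx: float) -> float:
    """(plim b_IV - beta) / (plim b_OLS - beta) for one regressor and one instrument.

    plim b_OLS - beta = rho_xu * sd_u / sd_x
    plim b_IV  - beta = (rho_zu / rho_zx) * sd_u / sd_x
    The common factor sd_u / sd_x cancels in the ratio.
    """
    if rho_xu == 0 or rho_zx == 0:
        raise ValueError("rho_xu and rho_zx must be nonzero")
    return (rho_zu / rho_xu) / rho_zx


# Hypothetical values: the instrument-error correlation is one tenth of the
# schooling-error correlation, and the first-stage correlation is only 0.05.
ratio = iv_ols_inconsistency_ratio(rho_zu=0.01, rho_xu=0.10, rho_zx=0.05)
print(f"IV inconsistency is {ratio:.1f} times the OLS inconsistency")
```

With these numbers, an instrument–error correlation of just 0.01 makes IV twice as inconsistent as OLS.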

Prerequisites: work through the AK91 guide and the AK91 R tables first. For a step-by-step walkthrough of this paper, see the replication guide; the paper text is in the Bound et al. (1995) qmd.

Sample (all tables): 5% Public-Use Sample, 1980 Census; men born 1930–1939 (\(N = 329{,}509\)), the same cohort emphasized in AK91 Table V.

R workflow

| File | Role |
|---|---|
| bound1995-data-prep.R | Load raw_data.dta; build QOB/YOB/state dummies |
| bound1995-iv-diagnostics.R | First-stage \(F\), partial \(R^2\), overidentification \(F\) |
| Table_I.R | Table 1: six OLS/IV specifications |
| Table_II.R | Table 2: state-of-birth interactions |
| Table_III.R | Table 3: simulated-QOB replications (environment variables BOUND95_SIM_REPS, BOUND95_SIM_CORES) |

Data: raw_data.dta (copy of AK91 extract).
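The data-prep step builds the excluded instruments from quarter and year of birth. Table 1 lists 30 QOB×YOB instruments. One construction consistent with that count interacts three quarter dummies (quarter 1 omitted) with all ten birth-year dummies. The Python sketch below does this on synthetic QOB/YOB codes. The helper name `qob_yob_dummies` and its column names are hypothetical, not taken from bound1995-data-prep.R.

```python
import numpy as np


def qob_yob_dummies(qob, yob):
    """Quarter-of-birth x year-of-birth interaction dummies (hypothetical helper).

    Quarter 1 is the omitted base quarter; each birth year gets Q2/Q3/Q4
    interactions, so ten cohorts give 3 * 10 = 30 columns.
    """
    qob = np.asarray(qob)
    yob = np.asarray(yob)
    cols, names = [], []
    for year in np.unique(yob):
        for q in (2, 3, 4):
            cols.append(((qob == q) & (yob == year)).astype(float))
            names.append(f"qob{q}_yob{year}")
    return np.column_stack(cols), names


# Synthetic example: men born 1930-1939, quarters coded 1-4.
rng = np.random.default_rng(0)
qob = rng.integers(1, 5, size=1_000)
yob = rng.integers(1930, 1940, size=1_000)
Z, names = qob_yob_dummies(qob, yob)
print(Z.shape, names[:3])
```

The nine year-of-birth main effects stay in the equation as included controls. The quarter main effects are not added separately because the interactions already span them: summing a quarter's columns over all years reproduces that quarter's dummy.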

Table 1 — Wage equations (AK91 Table V variants)

Bound Table 1 columns (3)–(6) correspond to AK91 Table V columns (5)–(8), with first-stage diagnostics reported for every IV column.

Table 1: Estimated Effect of Completed Years of Education on Men’s Log Weekly Earnings¹
(1) OLS (2) IV (3) OLS (4) IV (5) OLS (6) IV
Coefficient 0.063 (0.000) 0.130 (0.033) 0.063 (0.000) 0.081 (0.016) 0.063 (0.000) 0.060 (0.029)
F (excluded instruments) 13.362 4.747 1.613
Partial R² (excluded instruments, ×100) 0.012 0.043 0.014
F (overidentification) 2.370 0.775 0.725
Age, Age² × × × ×
9 Year of birth dummies × × × ×
Quarter of birth (excluded IV) ×
Quarter of birth × year of birth (excluded IV) × ×
Number of excluded instruments 3 30 30
¹ Standard errors in parentheses. 5% Public-Use Sample, 1980 Census; men born 1930–1939 (N = 329,509). All specifications include race, SMSA, marital status, and eight regional dummies.
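The diagnostic rows in Table 1 come from the first stage. Let \(RSS_r\) and \(RSS_u\) be the first-stage residual sums of squares without and with the \(k\) excluded instruments, and let \(p\) be the number of included controls. The F on excluded instruments is then \(F = [(RSS_r - RSS_u)/k]/[RSS_u/(n - p - k)]\). The partial \(R^2\) is \((RSS_r - RSS_u)/RSS_r\): the share of the schooling variation left after the controls that the instruments explain.

The Python sketch below computes both statistics, plus the 2SLS point estimate, on simulated data. It assumes classical (homoskedastic) statistics and is not a transcription of bound1995-iv-diagnostics.R. All data-generating numbers are hypothetical.

```python
import numpy as np


def ols_resid(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def first_stage_diagnostics(x, W, Z):
    """F and partial R^2 for the excluded instruments Z.

    x: endogenous regressor, shape (n,)
    W: included exogenous controls, including a constant, shape (n, p)
    Z: excluded instruments, shape (n, k)
    """
    n, k = Z.shape
    p = W.shape[1]
    rss_r = np.sum(ols_resid(x, W) ** 2)                        # first stage without Z
    rss_u = np.sum(ols_resid(x, np.column_stack([W, Z])) ** 2)  # first stage with Z
    f_stat = ((rss_r - rss_u) / k) / (rss_u / (n - p - k))
    partial_r2 = (rss_r - rss_u) / rss_r
    return f_stat, partial_r2


def tsls_coef(y, x, W, Z):
    """2SLS coefficient on x (point estimate only; naive second-stage SEs are wrong)."""
    ZW = np.column_stack([W, Z])
    x_hat = ZW @ np.linalg.lstsq(ZW, x, rcond=None)[0]
    return np.linalg.lstsq(np.column_stack([x_hat, W]), y, rcond=None)[0][0]


# Simulated example with a strong first stage (all numbers hypothetical).
rng = np.random.default_rng(42)
n = 5_000
w = rng.normal(size=n)
W = np.column_stack([np.ones(n), w])
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)          # endogeneity: Corr(u, v) > 0
x = 0.5 * Z[:, 0] + 0.3 * Z[:, 1] + w + v
y = 1.0 * x + 0.5 * w + u
f_stat, partial_r2 = first_stage_diagnostics(x, W, Z)
print(f"F = {f_stat:.1f}, partial R2 x 100 = {100 * partial_r2:.2f}, "
      f"2SLS = {tsls_coef(y, x, W, Z):.3f}")
```

Both statistics use the same two residual sums of squares, so \(F = (R^2/k)/[(1-R^2)/(n-p-k)]\). With \(n \approx 330{,}000\) and 30 instruments, a partial \(R^2\) of 0.043 percent corresponds to \(F \approx 4.7\), in line with the 4.747 reported in column (4).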

Table 2 — State-of-birth controls

This table replicates AK91 Table VII columns (5)–(8) and adds the first-stage statistics reported in Bound et al. Table 2.

Table 2: Estimated Effect of Education on Log Weekly Earnings, Controlling for State of Birth¹

| | (1) OLS | (2) IV | (3) OLS | (4) IV |
|---|---|---|---|---|
| Coefficient | 0.063 (0.000) | 0.083 (0.009) | 0.063 (0.000) | 0.081 (0.011) |
| F (excluded instruments) | | 2.388 | | 1.818 |
| Partial R² (excluded instruments, ×100) | | 0.133 | | 0.101 |
| F (overidentification) | | 0.904 | | 0.892 |
| Age, Age² (quarter years) | | | × | × |
| 9 Year of birth dummies | × | × | × | × |
| Quarter of birth / QOB×YOB / QOB×state (excluded IV) | | × | | × |
| Number of excluded instruments | | 183 | | 183 |

¹ Standard errors in parentheses. Same sample as Table 1; fifty state-of-birth dummies included as controls.
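Both tables also report an overidentification F, and Bound et al. use Basmann overidentification statistics. The Python sketch below uses one common F form of Basmann's statistic; treating it as the exact form in the R scripts is an assumption. The steps are:

- Regress the 2SLS residuals \(\hat u\) on all \(p + k\) control and instrument columns.
- Compute \(F = [\hat u' P \hat u/(k-1)]/[\hat u' M \hat u/(n-p-k)]\), where \(P\) projects onto those columns and \(M = I - P\).
- With one endogenous regressor, \(k - 1\) is the number of overidentifying restrictions.

The simulated data, including the deliberately invalid instrument, are hypothetical.

```python
import numpy as np


def basmann_overid_f(y, x, W, Z):
    """F form of Basmann's overidentification statistic after 2SLS (assumed form).

    y: outcome; x: single endogenous regressor; W: controls incl. constant (n, p);
    Z: excluded instruments (n, k) with k >= 2.
    """
    n, k = Z.shape
    p = W.shape[1]
    ZW = np.column_stack([W, Z])
    x_hat = ZW @ np.linalg.lstsq(ZW, x, rcond=None)[0]
    coef = np.linalg.lstsq(np.column_stack([x_hat, W]), y, rcond=None)[0]
    u_hat = y - np.column_stack([x, W]) @ coef         # structural residuals use actual x
    fitted = ZW @ np.linalg.lstsq(ZW, u_hat, rcond=None)[0]
    explained = fitted @ fitted                         # u' P u
    unexplained = (u_hat - fitted) @ (u_hat - fitted)   # u' M u
    return (explained / (k - 1)) / (unexplained / (n - p - k))


rng = np.random.default_rng(7)
n = 5_000
W = np.ones((n, 1))
Z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
x = Z @ np.array([0.5, 0.5, 0.5]) + v
y_valid = 0.1 * x + u                      # all three instruments correctly excluded
y_invalid = 0.1 * x + 0.5 * Z[:, 0] + u    # first instrument also affects y directly
f_valid = basmann_overid_f(y_valid, x, W, Z)
f_invalid = basmann_overid_f(y_invalid, x, W, Z)
print(f"valid: {f_valid:.2f}  invalid: {f_invalid:.1f}")
```

The test can only detect disagreement among the instruments. If all of them are invalid in the same way, it has nothing to compare against.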

Table 3 — Simulated quarter of birth

Following Krueger’s suggestion (noted in Bound et al., p. 448), quarter of birth is randomly permuted across individuals while the sample, controls, and model specification stay fixed. Each replication:

  1. Shuffles QOB and rebuilds QOB×YOB (and QOB×state) dummies.
  2. Re-estimates IV for the four specifications highlighted in Tables 1–2: Table 1 columns (4) and (6) and Table 2 columns (2) and (4), labeled 1 (4), 1 (6), 2 (2), and 2 (4).
  3. Records the EDUC coefficient, its (asymptotic) standard error, and the first-stage F on excluded instruments.
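The replication loop can be sketched in Python on synthetic data. The sketch assumes:

- Q1 is the omitted quarter.
- The only controls are a constant and nine year-of-birth dummies.
- The first-stage F is the classical homoskedastic one.

Unlike step 3, it records the coefficient and the first-stage F but not the standard error. Every function name and data-generating number is hypothetical rather than taken from Table_III.R.

```python
import numpy as np


def qob_yob_instruments(qob, yob, years):
    """Q2-Q4 dummies interacted with every birth year (Q1 omitted)."""
    return np.column_stack([((qob == q) & (yob == yr)).astype(float)
                            for yr in years for q in (2, 3, 4)])


def permuted_qob_draws(y, x, qob, yob, n_reps=100, seed=0):
    """Shuffle QOB, rebuild QOB x YOB instruments, and re-estimate 2SLS each draw.

    Returns arrays of 2SLS coefficients on x and first-stage F statistics.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    years = np.unique(yob)
    # Included controls: constant plus year-of-birth dummies (first year omitted).
    W = np.column_stack([np.ones(n)] + [(yob == yr).astype(float) for yr in years[1:]])
    x_res = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]
    rss_r = x_res @ x_res                      # restricted first stage: same in every draw
    coefs, f_stats = [], []
    for _ in range(n_reps):
        Z = qob_yob_instruments(rng.permutation(qob), yob, years)  # steps 1: shuffle, rebuild
        ZW = np.column_stack([W, Z])
        x_hat = ZW @ np.linalg.lstsq(ZW, x, rcond=None)[0]
        rss_u = np.sum((x - x_hat) ** 2)
        k = Z.shape[1]
        f_stats.append(((rss_r - rss_u) / k) / (rss_u / (n - ZW.shape[1])))
        coefs.append(np.linalg.lstsq(np.column_stack([x_hat, W]), y, rcond=None)[0][0])
    return np.array(coefs), np.array(f_stats)


# Synthetic data (hypothetical): schooling is endogenous, true return is 0.08.
rng = np.random.default_rng(1)
n = 5_000
qob = rng.integers(1, 5, size=n)
yob = rng.integers(1930, 1940, size=n)
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(scale=np.sqrt(0.75), size=n)
x = 12 + 0.05 * qob + v
y = 5 + 0.08 * x + u
ols_slope = np.polyfit(x, y, 1)[0]
coefs, f_stats = permuted_qob_draws(y, x, qob, yob)
print(f"OLS {ols_slope:.3f}  mean 2SLS {coefs.mean():.3f}  mean F {f_stats.mean():.3f}")
```

In this setup the mean 2SLS estimate lands near the OLS slope rather than the true 0.08, and the mean F lands near 1. That is the pattern Table 3 reports for the Census data.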

Why this matters for teaching weak IV

  • With permuted QOB, the instruments are irrelevant by construction: they are independent of schooling, so they supply no identifying variation.
  • Theory (Bound Section 2.2) predicts that IV estimates are then biased toward OLS and that the first-stage F averages about 1, however large the sample.
  • Yet the second-stage coefficients and standard errors still look reasonable. Their means sit near the OLS return (about 0.06), and overidentification tests often fail to reject.
  • The lesson: do not trust second-stage significance when the first-stage F is near 1, even with \(N \approx 330{,}000\).
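A small Monte Carlo can check the bias-toward-OLS claim. The Python sketch below makes these simplifying assumptions, and all parameter values are hypothetical:

- Instruments and errors are Gaussian, all variables have mean zero, and there are no controls.
- Each of \(k = 30\) instruments has first-stage coefficient `pi`.

With `pi = 0` the instruments are irrelevant, and the average 2SLS estimate sits on top of the average OLS estimate. A strong first stage pulls 2SLS back toward the true \(\beta\).

```python
import numpy as np


def mean_2sls_and_ols(pi, n=2_000, k=30, beta=0.1, rho=0.5, n_reps=200, seed=0):
    """Monte Carlo means of 2SLS and OLS estimates of beta.

    Each of the k Gaussian instruments has first-stage coefficient `pi`;
    Corr(u, v) = rho makes x endogenous. Variables are mean zero, so no intercept.
    """
    rng = np.random.default_rng(seed)
    iv_draws, ols_draws = [], []
    for _ in range(n_reps):
        Z = rng.normal(size=(n, k))
        v = rng.normal(size=n)
        u = rho * v + rng.normal(scale=np.sqrt(1 - rho**2), size=n)
        x = Z @ np.full(k, pi) + v
        y = beta * x + u
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # first-stage fitted values
        iv_draws.append((x_hat @ y) / (x_hat @ x))         # 2SLS with one regressor
        ols_draws.append((x @ y) / (x @ x))
    return float(np.mean(iv_draws)), float(np.mean(ols_draws))


for pi in (0.0, 0.02, 0.1):
    iv, ols = mean_2sls_and_ols(pi)
    print(f"pi={pi:<5} mean 2SLS={iv:.3f}  mean OLS={ols:.3f}  true beta=0.1")
```

With irrelevant instruments, each draw of \(\hat\beta_{2SLS} - \beta\) equals \(\rho\) plus a mean-zero ratio. The estimate is therefore centered on the OLS bias, and a larger sample does not help.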

Run the simulation

```shell
# Pilot (50 reps, ~1–2 h with parallel workers)
BOUND95_SIM_REPS=50 Rscript 04-topics/rep-bound1995/Rcode/Table_III.R

# Paper replication (500 reps; resume from checkpoints if interrupted)
BOUND95_SIM_REPS=500 BOUND95_SIM_CORES=7 Rscript 04-topics/rep-bound1995/Rcode/Table_III.R
```

Checkpoints are written per specification (Table_III_spec_*.rds); re-running with a larger BOUND95_SIM_REPS continues from the last saved replication.
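The resume behavior described above follows a standard pattern: load whatever has already been saved, run only the missing replications, and save after each one. The Python sketch below shows that pattern with `pickle`. It is not Table_III.R's code, the checkpoint file name is hypothetical, and the default of 50 replications when BOUND95_SIM_REPS is unset is an assumption.

```python
import os
import pickle
import tempfile
from pathlib import Path


def run_with_checkpoint(path, n_reps, one_rep):
    """Run replications 0..n_reps-1, skipping any already saved at `path`.

    `one_rep(r)` receives the replication index, so it can seed its own RNG
    from `r`; a resumed run then reproduces an uninterrupted one.
    """
    path = Path(path)
    done = pickle.loads(path.read_bytes()) if path.exists() else []
    for r in range(len(done), n_reps):
        done.append(one_rep(r))
        path.write_bytes(pickle.dumps(done))  # checkpoint after every replication
    return done


# Assumed default of 50 when the environment variable is unset.
n_reps = int(os.environ.get("BOUND95_SIM_REPS", "50"))

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "spec_demo.pkl"
    first = run_with_checkpoint(ckpt, 3, lambda r: r * r)   # fresh run: r = 0, 1, 2
    second = run_with_checkpoint(ckpt, 5, lambda r: r * r)  # resumes at r = 3
    print(first, second)
```

Passing the index, rather than relying on one global RNG stream, is what makes "raise the replication count and rerun" give the same draws as a single long run.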

Table 3: IV Estimates Using Simulated Quarter of Birth¹
50 replications; men born 1930–1939

| | 1 (4) | 1 (6) | 2 (2) | 2 (4) |
|---|---|---|---|---|
| Estimated coefficient | | | | |
| Mean | 0.061 | 0.061 | 0.057 | 0.057 |
| Standard deviation | 0.043 | 0.043 | 0.013 | 0.013 |
| 5th percentile | 0.001 | 0.000 | 0.038 | 0.038 |
| Median | 0.064 | 0.063 | 0.057 | 0.057 |
| 95th percentile | 0.132 | 0.132 | 0.079 | 0.079 |
| Estimated standard error | | | | |
| Mean | 0.038 | 0.038 | 0.015 | 0.015 |
| Standard deviation | 0.005 | 0.005 | 0.001 | 0.001 |
| 5th percentile | 0.030 | 0.030 | 0.014 | 0.014 |
| Median | 0.039 | 0.039 | 0.015 | 0.015 |
| 95th percentile | 0.045 | 0.045 | 0.016 | 0.016 |
| First-stage F (excluded instruments) | | | | |
| Mean | 0.955 | 0.956 | 0.963 | 0.963 |

¹ Quarter of birth is randomly permuted within the sample (Krueger’s suggestion). Columns are the IV specifications in Table 1, columns (4) and (6), and Table 2, columns (2) and (4). With irrelevant instruments, the second-stage coefficients look plausible, but the first-stage F statistic clusters near 1 (see the mean-F row).
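Each column of Table 3 summarizes the replication draws with a mean, a standard deviation, and three percentiles. The Python sketch below computes the same five summaries. It assumes the R script uses R's defaults, `sd()` (denominator \(n-1\)) and `quantile()` type 7. In NumPy these correspond to `ddof=1` and the default linear interpolation in `np.percentile`. The input draws are synthetic.

```python
import numpy as np


def summarize_draws(draws):
    """Mean, SD, 5th percentile, median, and 95th percentile of replication draws."""
    d = np.asarray(draws, dtype=float)
    return {
        "mean": d.mean(),
        "sd": d.std(ddof=1),             # n - 1 denominator, like R's sd()
        "p5": np.percentile(d, 5),       # linear interpolation, like R's quantile() type 7
        "median": np.median(d),
        "p95": np.percentile(d, 95),
    }


# Synthetic draws standing in for 50 simulated EDUC coefficients.
rng = np.random.default_rng(3)
fake_coefs = rng.normal(loc=0.06, scale=0.04, size=50)
for name, value in summarize_draws(fake_coefs).items():
    print(f"{name:>6}: {value:.3f}")
```

With only 50 draws, the 5th and 95th percentiles rest on very few extreme observations, so expect them to move noticeably between pilot and full runs.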

If Table_III.RData exists, the saved object sim_summary can be compared to paper_targets (printed when the script finishes). Key checks:

  • Mean coefficient ≈ 0.060–0.062 (near OLS, not the AK91 IV point estimates).
  • Mean first-stage F ≈ 1 across all four columns.
  • Cols. 1 (4) and 1 (6): SD of coefficients ≈ 0.038–0.039, wide enough that the 5th–95th percentile range can straddle zero.
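How close to 1 should the mean-F row be? The answer below assumes irrelevant instruments and the classical homoskedastic F; robust first-stage F statistics need not follow this distribution. Under those assumptions, each replication's F is approximately \(F(k, n - k - p)\) distributed. Its mean is \(d_2/(d_2-2)\), and for large \(d_2\) its standard deviation is close to \(\sqrt{2/k}\). The Python sketch below computes the exact moments and the Monte Carlo standard error of a mean over 50 replications, for \(k = 30\) and \(k = 183\). It ignores the few included controls \(p\), which barely matter at this sample size.

```python
import math


def f_mean_sd(d1, d2):
    """Mean and standard deviation of an F(d1, d2) variable (needs d2 > 4)."""
    mean = d2 / (d2 - 2)
    var = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
    return mean, math.sqrt(var)


N = 329_509
N_REPS = 50
for k in (30, 183):  # excluded instruments in the Table 1 and Table 2 IV columns
    mean, sd = f_mean_sd(k, N - k)
    mc_se = sd / math.sqrt(N_REPS)  # simulation noise in a mean over 50 draws
    print(f"k={k:3d}: E[F]={mean:.4f}, SD(F)={sd:.3f}, SE of 50-draw mean={mc_se:.3f}")
```

The standard errors come out at roughly 0.037 for 30 instruments and 0.015 for 183. This gives a yardstick for how far the mean-F row can drift from 1 through simulation noise alone.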

Comparison with the published tables

After running the scripts, compare the replication objects in each .RData file with paper_targets (printed in the R console). Key targets from Bound et al. (1995):

  • Table 1, col. (6) IV: \(\hat\beta_{EDUC} \approx 0.060\), first-stage \(F \approx 1.61\).
  • Table 2, col. (4) IV: \(\hat\beta_{EDUC} \approx 0.081\), first-stage \(F \approx 1.87\).
  • Table 3: simulated-IV coefficient means ≈ 0.06 with mean first-stage F ≈ 1; coefficient SD ≈ 0.038 in the Table 1 specifications and ≈ 0.015 in the Table 2 specifications.
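A small numerical comparison makes these checks explicit. The Python sketch below holds the Table 1 and Table 2 targets quoted in the list above. It pairs them with the replicated values from Tables 1 and 2 on this page and reports the relative gap. The dictionary keys and the 5% tolerance are arbitrary choices for illustration; they do not reflect the structure of the paper_targets object in the R scripts.

```python
def compare_to_targets(replicated, targets, rel_tol=0.05):
    """Relative gap between replicated and target values, with a pass/fail flag."""
    report = {}
    for key, target in targets.items():
        rep = replicated[key]
        rel_gap = (rep - target) / abs(target)
        report[key] = (rep, target, rel_gap, abs(rel_gap) <= rel_tol)
    return report


# Targets as quoted in the list above; replicated values from Tables 1 and 2 on this page.
targets = {"T1_col6_beta": 0.060, "T1_col6_F": 1.61, "T2_col4_beta": 0.081, "T2_col4_F": 1.87}
replicated = {"T1_col6_beta": 0.060, "T1_col6_F": 1.613, "T2_col4_beta": 0.081, "T2_col4_F": 1.818}

for key, (rep, target, gap, ok) in compare_to_targets(replicated, targets).items():
    print(f"{key:<13} replicated={rep:<6} target={target:<5} gap={gap:+.1%} "
          f"{'ok' if ok else 'CHECK'}")
```

A relative tolerance suits the F statistics, whose scale differs from the coefficients. For coefficients near zero, an absolute tolerance would be the safer choice.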

Relation to AK91

Both papers use the same underlying Census extract but differ in what they report and conclude:

  • AK91 reports full wage equations (Table V) to establish a positive return to schooling using QOB×YOB instruments.
  • Bound et al. add partial \(R^2\), first-stage \(F\), and Basmann overidentification statistics, and argue that weak instruments plus finite-sample bias undermine AK91’s IV interpretation.

For teaching weak IV, pair this replication with session-iv-weak.qmd.

References

Bound, John, David A. Jaeger, and Regina M. Baker. 1995. “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak.” Journal of the American Statistical Association 90 (430): 443–50.