Replication Guide

College proximity and the return to schooling

Author

Kevin Hu

Published

June 4, 2026

Important: How to use this guide

Prerequisites. Endogeneity and 2SLS from course slides (Ch.17–20). Recommended: AK91 replication guide first — to contrast IV ≈ OLS (AK91) with IV > OLS (Card). Bound (1995) guide is optional here. Tier-1 context: Readings Guide.

Software. R with renv::restore(); author extract: Rcode/proximity/nls.dat (column layout in code_bk.txt).

Four documents — one workflow.

| Step | Document | Your job |
|---|---|---|
| 1 | This guide | Follow the story; understand \(Z\), \(D\), \(Y\), and heterogeneous compliers |
| 2 | Paper (full text) | Read the sections cited in each Act |
| 3 | R replication output | Compare Figure 1 and Tables 1–5 to the paper |
| 4 | Rcode/*.R | Run and modify the modular scripts |

Homework note. iv-wage-card uses a simplified Card1995.dta. This guide follows Card’s original proximity replication (nls.dat), not the homework file.

This guide is narrative-first (R only). Complete code and pre-rendered tables live in Rcode/ and replicate-card1993-R.

1 Overview: Geographic IV in one page

Card (1993) asks whether growing up near a four-year college raises schooling and earnings, and whether IV estimates of the return to schooling exceed OLS. The paper builds:

  1. Descriptive facts: Proximity predicts education, especially for men with low predicted schooling (Figure 1; Table 1).
  2. OLS benchmark: Conventional Mincer regressions (Table 2).
  3. IV core: Reduced form + 2SLS using nearc4 as an instrument (Table 3); IV estimates are 25–60% above OLS.
  4. Robustness and exclusion: Alternative instruments and interaction IV (Tables 4–5).

Contrast with AK91 and Bound (course arc).

| Paper | Instrument \(Z\) | Headline | Compliers / lesson |
|---|---|---|---|
| Angrist and Krueger (1991) | Quarter of birth | IV ≈ OLS | Compulsory-schooling margin |
| Bound, Jaeger, and Baker (1995) | (Critique of AK91) | Low first-stage \(F\) | Report \(F\) before trusting IV |
| Card (1993) | nearc4 | IV > OLS | Low-SES youth; high marginal returns |

flowchart LR
  Motive[Endogenous schooling]
  Z[nearc4]
  D[ed76]
  Y[lwage76]
  Het[Low family background]
  Motive --> Z
  Z --> D
  D --> Y
  Het --> Z
  Het --> D

Data map.

| Object | Meaning |
|---|---|
| nearc4 | Four-year college in local labor market, 1966 |
| nearc2 | Two-year college nearby (Table 4) |
| ed76, lwage76 | Education and log hourly wage, 1976 |
| daded, momed, famed, lowfam | Parental education and family structure |
| exp76, exp2 | Experience and experience squared / 100 |
| reg66*, smsa66r | 1966 region and SMSA controls |

Samples.

| Use | \(N\) | Notes |
|---|---|---|
| Wage equations (Tables 2–5) | 3010 | Valid 1976 log wages |
| Full NLS-YM in paper Table 1 col (1) | 5225 | Not in distributed nls.dat |
| Table 1 cols (2)–(3) | Replicated | See partial replication warning |

R module map.

| File | Role |
|---|---|
| card1993-data-prep.R | read_fwf, variables, sample filters |
| card1993-gt-quarto.R | Table formatting |
| Figure_1.R | Predicted-education quartiles |
| Table_I.R through Table_V.R | Tables 1–5 |

Read first: Paper (HTML)

Suggested path (~8–10 hours). Overview → Act I → II → III → IV → V → Capstone → Appendix.


2 Act I — Does proximity predict schooling?

2.1 Story beat

Before IV, Card documents that college proximity correlates with schooling, with the largest gaps among men predicted to attain low education — setting up both relevance and heterogeneous compliers.

2.2 Read in the paper

2.3 Identification / econometrics

Figure 1 logic.

  1. Fit a prediction equation for ed76 on the subsample with no nearby college (nearc4 = 0).
  2. Assign predicted-education quartiles for the full sample.
  3. Compare mean ed76 by quartile for nearc4 = 1 vs 0.

This is not IV: it visualizes who proximity moves.
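The three steps above can be sketched on simulated data. This is a minimal illustration in base R, not the code in Figure_1.R: the data-generating process, the single predictor `famed`, and all coefficients are assumptions made up for the example, and only the variable names echo the guide.

```r
set.seed(1)
n <- 4000
# Simulated stand-ins (NOT the NLS data): a family-background index and a proximity dummy
famed  <- rnorm(n)
nearc4 <- rbinom(n, 1, 0.68)
# Assumed process: proximity raises schooling more for low-background men
ed76 <- 13 + 1.2 * famed + 0.8 * nearc4 * pmax(0, 1 - 0.8 * famed) + rnorm(n)
sim  <- data.frame(ed76, famed, nearc4)

# Step 1: fit the prediction equation only where nearc4 == 0
fit0 <- lm(ed76 ~ famed, data = subset(sim, nearc4 == 0))

# Step 2: predict for everyone and cut into quartiles of predicted education
sim$ed_hat   <- predict(fit0, newdata = sim)
sim$quartile <- cut(sim$ed_hat,
                    breaks = quantile(sim$ed_hat, probs = seq(0, 1, 0.25)),
                    include.lowest = TRUE, labels = 1:4)

# Step 3: mean schooling by quartile and proximity, then the near-minus-far gap
means <- tapply(sim$ed76, list(sim$quartile, sim$nearc4), mean)
gap   <- means[, "1"] - means[, "0"]
print(round(cbind(means, gap), 2))
```

Because the prediction equation is fit only on men without a nearby college, the quartiles rank men by the schooling they would be expected to attain absent proximity. In this simulated setup the gap is largest in the lowest quartile by construction.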

2.4 Replicate

Warning: Partial replication (Table 1)

The author’s Table 1 column (1) (\(N = 5225\) full cohort) is not in the distributed nls.dat. Replication covers columns (2)–(3) only. See notes on replicate-card1993-R.

# 1) wage <- card93_wage_sample(load_card93_data())
# 2) Fit ed76 ~ controls on subset nearc4 == 0
# 3) predict(ed) for all; cut into quartiles; group_by(quartile, nearc4) %>% summarise(mean(ed76))

2.5 Check your work

  • Prediction \(R^2 \approx 0.32\) (paper fn. 16: 0.30).
  • Lowest predicted-education quartile: near-college men have ~0.9–1.0 extra years vs no college (paper: \(\approx 1.1\)).

2.6 Think

  1. Why fit the prediction model only where nearc4 = 0?
  2. How does Figure 1 support the abstract’s focus on “poorly-educated parents”?

3 Act II — OLS benchmark

3.1 Story beat

Table 2 reports conventional OLS returns to schooling. These are the benchmarks IV will exceed — motivating ability bias / comparative advantage stories.

3.2 Read in the paper

  • OLS wage-equation section leading to Table 2

3.3 Identification / econometrics

\[ \ln W_i = \beta_0 + \rho\, EDUC_i + f(EXPER_i) + X_i'\gamma + u_i \]

Columns add 1966 geography and family-background blocks. OLS \(\hat\rho\) is biased if \(EDUC\) correlates with \(u\) (ability, family resources).
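The direction of the OLS bias follows from the standard omitted-variable algebra. As a sketch, assume (for illustration only; this decomposition is not Card's notation) that the error is \(u_i = \theta A_i + \varepsilon_i\), where \(A_i\) is unobserved ability and \(\varepsilon_i\) is uncorrelated with schooling, and let \(\widetilde{EDUC}_i\) denote the residual from regressing \(EDUC_i\) on \(f(EXPER_i)\) and \(X_i\):

\[ \operatorname{plim}\, \hat\rho_{OLS} = \rho + \frac{\operatorname{Cov}(\widetilde{EDUC}_i, u_i)}{\operatorname{Var}(\widetilde{EDUC}_i)} = \rho + \theta\, \frac{\operatorname{Cov}(\widetilde{EDUC}_i, A_i)}{\operatorname{Var}(\widetilde{EDUC}_i)} \]

If ability raises wages (\(\theta > 0\)) and is positively correlated with schooling after conditioning on the controls, the second term is positive and OLS overstates \(\rho\).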

3.4 Replicate

# lm(lwage76 ~ ed76 + exp76 + exp2 + black + reg76r + smsa76r + reg66..., data = wage)

3.5 Check your work

  • Column (1): \(\hat\rho \approx 0.0747\) on ed76 (\(N = 3010\)) — matches Card’s SAS listing read1.lst.
  • Adding family controls (cols 3–5) leaves \(\hat\rho\) near 0.07–0.08.

3.6 Think

  1. If ability bias pushes OLS up, would you expect IV to lie above or below OLS under homogeneous returns?
  2. Card finds IV above OLS — what compliers story fits that pattern?

4 Act III — Core IV: reduced form and 2SLS

4.1 Story beat

Table 3 is the econometric centerpiece: reduced-form effects of nearc4 on education and wages, then 2SLS/IV estimates of \(\rho\). Panel A uses a single excluded instrument; Panel B adds family × proximity structure.

4.2 Read in the paper

4.3 Identification / econometrics

Reduced form (Panel A, cols 1–4):

\[ EDUC_i = \pi_0 + \pi_1 nearc4_i + X_i'\delta + \nu_i, \qquad \ln W_i = \lambda_0 + \lambda_1 nearc4_i + X_i'\phi + \eta_i \]

IV / 2SLS (cols 5–8): nearc4 instruments \(EDUC\) in the wage equation.

Wald shortcut: \(\hat\rho_{IV} \approx \hat\lambda_1 / \hat\pi_1\) (one instrument, one endogenous regressor).

| Panel | Excluded \(Z\) | Teaching point |
|---|---|---|
| A | nearc4 | Baseline geographic IV |
| B | nearc4 + interactions | Richer first stage; similar IV |
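The Wald shortcut can be checked numerically. The sketch below uses simulated data in base R (every coefficient and the control `x` are assumptions for illustration, not values from nls.dat) and a hand-rolled 2SLS rather than a packaged estimator. With one excluded instrument and identical controls in both equations, the ratio of the two reduced-form coefficients equals the 2SLS coefficient.

```r
set.seed(2)
n <- 5000
x       <- rnorm(n)                 # an exogenous control (simulated)
nearc4  <- rbinom(n, 1, 0.5)        # simulated instrument
abil    <- rnorm(n)                 # unobserved confounder
ed76    <- 12 + 0.4 * nearc4 + 0.5 * x + abil + rnorm(n)
lwage76 <- 1 + 0.10 * ed76 + 0.2 * x + 0.3 * abil + rnorm(n, sd = 0.4)

pi1     <- coef(lm(ed76 ~ nearc4 + x))[["nearc4"]]      # effect of nearc4 on schooling
lambda1 <- coef(lm(lwage76 ~ nearc4 + x))[["nearc4"]]   # effect of nearc4 on log wages
wald    <- lambda1 / pi1

# Manual 2SLS: replace ed76 by its first-stage fitted value.
# Point estimates are correct; the second-stage standard errors from lm() are not.
ed_hat <- fitted(lm(ed76 ~ nearc4 + x))
tsls   <- coef(lm(lwage76 ~ ed_hat + x))[["ed_hat"]]

ols <- coef(lm(lwage76 ~ ed76 + x))[["ed76"]]           # biased upward by abil here
round(c(ols = ols, wald = wald, tsls = tsls), 4)
```

The simulation assumes a homogeneous return of 0.10 with positive ability bias, so OLS lies above the true return. That is the opposite of Card's pattern and is useful as a baseline when thinking about why his IV estimates exceed OLS.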

4.4 Replicate

Tip: Contrast with AK91

AK91: IV ≈ OLS → “little ability bias.” Card: IV ≈ 0.13 vs OLS ≈ 0.07; the marginal return for low-education compliers may exceed the average return that OLS captures (see the Card (2001) synthesis).
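A small simulation shows how heterogeneous returns alone can produce IV > OLS. It is a stylized sketch in base R, not a model of the NLS data: the two return values (0.14 and 0.06), the complier structure, and the sample size are assumptions chosen for illustration, and there is deliberately no ability bias.

```r
set.seed(3)
n <- 50000
lowfam <- rbinom(n, 1, 0.5)   # 1 = low family background (simulated)
nearc4 <- rbinom(n, 1, 0.5)   # simulated proximity dummy

# Assumption: only low-background men respond to proximity (the compliers)
ed76 <- 14 - 2 * lowfam + 1.0 * nearc4 * lowfam + rnorm(n)

# Assumption: the return is 0.14 for low-background men and 0.06 for everyone else
rho_i   <- ifelse(lowfam == 1, 0.14, 0.06)
lwage76 <- 5 + rho_i * (ed76 - 12) + rnorm(n, sd = 0.3)

# OLS: a variance-weighted mix of the two group returns
ols <- coef(lm(lwage76 ~ ed76 + lowfam))[["ed76"]]

# Manual 2SLS with nearc4 as the excluded instrument (point estimate only)
ed_hat <- fitted(lm(ed76 ~ nearc4 + lowfam))
iv     <- coef(lm(lwage76 ~ ed_hat + lowfam))[["ed_hat"]]

round(c(ols = ols, iv = iv), 3)
```

Under these assumed parameters OLS converges to roughly 0.10, a blend of both groups, while IV converges to 0.14, the return of the men whose schooling the instrument actually moves.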

4.5 Check your work

  • RF: nearc4 on ed76 ≈ 0.32 (Panel A).
  • IV: \(\hat\rho_{ed76}\) ≈ 0.132–0.140 (Panel A, with family controls).
  • IV > OLS: the paper’s abstract reports IV estimates 25–60% above OLS.

4.6 Think

  1. Who are the compliers induced by nearc4 in this cohort?
  2. Would you expect a strong first-stage \(F\) here relative to AK91’s \(YQ\) instruments (see Bound guide)?

5 Act IV — Robustness

5.1 Story beat

Table 4 asks whether IV > OLS is an artifact of one proximity measure or one sample year. Alternatives include nearc2, public/private college splits, and 1978 wages.

5.2 Read in the paper

  • Robustness discussion around Table 4

5.3 Replicate

5.4 Replication literacy

| Issue | Note |
|---|---|
| Table 1 col (1) | Not in nls.dat |
| Parent education missing | Imputed to sample mean; nodaded / nomomed flags in models |
| Table 4 row 2 | Uses lwage78, \(N = 2639\) (not the 3010 subsample) |
| Table 4 row 4 IV SE | May differ slightly from paper (see replicate page notes) |
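The mean-imputation-plus-flag treatment of missing parental education can be sketched as follows. This is a base R illustration on simulated data; the variable names `daded` and `nodaded` come from the table above, but the exact steps in card1993-data-prep.R may differ, and `daded_imp` is a hypothetical name used only here.

```r
set.seed(5)
n <- 1000
daded <- round(rnorm(n, mean = 10, sd = 3))   # simulated father's education
daded[sample(n, 200)] <- NA                   # simulate 200 missing values

nodaded    <- as.integer(is.na(daded))        # flag: 1 if originally missing
daded_mean <- mean(daded, na.rm = TRUE)       # mean over observed values
daded_imp  <- ifelse(is.na(daded), daded_mean, daded)

# Simulated outcome so the regression below has something to fit
ed76 <- 10 + 0.25 * daded_imp - 0.5 * nodaded + rnorm(n)

# The flag enters the model alongside the imputed variable, so the imputed
# rows get their own intercept shift instead of being treated as observed
fit <- lm(ed76 ~ daded_imp + nodaded)
round(coef(fit), 3)
```

Keeping the flag in the model preserves the full sample while letting the data say whether men with missing parental education differ systematically from the rest.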

5.5 Check your work

  • IV estimates remain above OLS across rows 1, 3, 5–7.
  • Row 3: nearc2 as instrument — compare magnitude to nearc4.

5.6 Think

  1. Which robustness row would most threaten the exclusion restriction?
  2. If nearc4 proxies for local labor-market prosperity, which table row addresses that?

6 Act V — Interaction IV and exclusion

6.1 Story beat

Section “Is College Proximity a Legitimate Instrument?” uses heterogeneity: proximity raises schooling most for low-family-background men. Table 5 instruments with nearc4 × lowfam (and richer nearc4 × famed interactions) and tests whether proximity enters the wage equation directly.

6.2 Read in the paper

6.3 Identification / econometrics

  • Relevance: nearc4_low (or nearc4 × \(f_j\)) shifts \(EDUC\) more for disadvantaged families.
  • Exclusion test logic: If proximity raised wages only through college attendance, RF effects on wages should align with IV-implied effects; direct nearc4 in the structural equation should be small.
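The exclusion-test logic can be illustrated with a simulated interaction IV. The sketch below is base R on made-up data: the true return (0.10), the zero direct effect of nearc4, and the first-stage coefficients are all assumptions, and the manual 2SLS gives point estimates only. It is not the specification in Table_V.R.

```r
set.seed(6)
n <- 40000
lowfam     <- rbinom(n, 1, 0.5)
nearc4     <- rbinom(n, 1, 0.5)
abil       <- rnorm(n)                 # unobserved confounder
nearc4_low <- nearc4 * lowfam          # excluded instrument

# Assumption: proximity moves schooling mostly for low-background men
ed76 <- 14 - 1.5 * lowfam + 0.1 * nearc4 + 0.8 * nearc4_low + abil + rnorm(n)

# Assumption: true return 0.10 and NO direct effect of nearc4 on wages
lwage76 <- 5 + 0.10 * ed76 - 0.1 * lowfam + 0.3 * abil + rnorm(n, sd = 0.4)

# First stage: nearc4 and lowfam are included controls; nearc4_low is excluded
fs     <- lm(ed76 ~ nearc4_low + nearc4 + lowfam)
ed_hat <- fitted(fs)

# Second stage keeps nearc4 in the structural equation to estimate its direct effect.
# Standard errors from this lm() call are not valid 2SLS standard errors.
ss      <- lm(lwage76 ~ ed_hat + nearc4 + lowfam)
rho_hat <- coef(ss)[["ed_hat"]]
direct  <- coef(ss)[["nearc4"]]
round(c(rho_hat = rho_hat, direct_nearc4 = direct), 3)
```

Because the interaction supplies the identifying variation, nearc4 itself can stay in the wage equation. A coefficient on nearc4 far from zero would suggest that proximity affects wages through some channel other than schooling.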

6.4 Replicate

6.5 Check your work

  • RF nearc4_low on ed76 larger than RF nearc4 alone.
  • IV return from interaction spec still exceeds OLS (~0.12+).
  • Structural coefficient on nearc4 (direct effect) not inconsistent with zero (paper’s conclusion).

6.6 Think

  1. How does interaction IV differ from control-function or LATE by subgroup?
  2. What local labor-market story would violate exclusion — and does Card’s design detect it?

7 Act VI — Synthesis with AK91 and Bound

7.1 Three-paper comparison

| | AK91 | Bound | Card93 |
|---|---|---|---|
| \(Z\) | QOB × YOB | (Same data; meta) | nearc4 |
| IV vs OLS | ≈ equal | IV biased toward OLS if \(F\) low | IV >> OLS |
| Main threat | Weak IV; slight exclusion debate | Finite-sample / weak \(F\) | Local labor demand; family sorting |
| Heterogeneity | Compulsory laws | | Low parental education |
| Course guide | AK91 | Bound | This page |

Follow-up reading: (Card 2001) survey; readings-guide identification row.

You do not need to re-run Bound’s simulated-QOB exercise for Card — only understand that Card’s first stage is economically stronger than AK91’s in Bound’s Table 1 col (6).
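Instrument strength is usually summarized by the first-stage \(F\) on the excluded instrument. The sketch below computes it in base R on simulated data; the coefficient 0.32 echoes the reduced-form check in Act III, while the sample size, the control `x`, and the error variance are assumptions, so the resulting \(F\) says nothing about the actual value in Card's data.

```r
set.seed(4)
n <- 3000
x      <- rnorm(n)                    # an exogenous control (simulated)
nearc4 <- rbinom(n, 1, 0.68)          # simulated instrument
ed76   <- 13 + 0.32 * nearc4 + 0.5 * x + rnorm(n, sd = 2.5)

restricted   <- lm(ed76 ~ x)             # first stage without the excluded instrument
unrestricted <- lm(ed76 ~ x + nearc4)    # full first stage

# F test for the excluded instrument via nested-model comparison
f_tab  <- anova(restricted, unrestricted)
f_stat <- f_tab$F[2]

# With a single excluded instrument, F equals the squared t statistic
t_stat <- summary(unrestricted)$coefficients["nearc4", "t value"]
c(F = f_stat, t_squared = t_stat^2)
```

The nested-model comparison generalizes to several excluded instruments, such as the interaction terms in Table 3 Panel B, where the squared-t shortcut no longer applies.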

7.2 Think

  1. Can all three papers be “right” simultaneously? (Hint: different compliers and marginal returns.)
  2. Which paper would you assign to a policy maker who wants a single return-to-schooling number?

8 Capstone — Identification memo

Write ½–1 page in English:

  1. Estimand: Return to schooling for whom? (Low-SES compliers near colleges.)
  2. Relevance: Cite Figure 1 and Table 3 RF on ed76.
  3. Exclusion: Summarize Table 5 and the “legitimate instrument” section.
  4. Magnitude: IV vs OLS; contrast with AK91 guide Act IV.
  5. Replication limit: One issue from replication literacy (Table 1 col 1, imputation, or Table 4 sample).

Deliverable (suggested). PDF memo + console output from Table_III.R (print(replication) / print(paper_targets)).


9 Appendix — Replication map

| Output | Paper | Script | Known issue |
|---|---|---|---|
| Figure 1 | Preliminary | Figure_1.R | Fit on nearc4 = 0 subsample |
| Table 1 | Sample chars | Table_I.R | Col (1) missing |
| Table 2 | OLS | Table_II.R | \(N = 3010\) |
| Table 3 | RF + IV | Table_III.R | Panels A/B |
| Table 4 | Robustness | Table_IV.R | Row 2 sample; row 4 SE |
| Table 5 | Interaction IV | Table_V.R | Wald + 2SLS |

Render this page:

quarto render 04-topics/rep-card1993/replication-card1993-guide.qmd

Minimal data peek:

library(here)
source(here("04-topics/rep-card1993/Rcode/card1993-data-prep.R"))
d <- load_card93_data()
nrow(card93_wage_sample(d))  # expect 3010

10 References

Angrist, Joshua D., and Alan B. Krueger. 1991. “Does Compulsory School Attendance Affect Schooling and Earnings?” The Quarterly Journal of Economics 106 (4): 979–1014. https://doi.org/10.2307/2937954.
Bound, John, David A. Jaeger, and Regina M. Baker. 1995. “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak.” Journal of the American Statistical Association 90 (430): 443–50.
Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” Working Paper 4483. National Bureau of Economic Research. https://doi.org/10.3386/w4483.
———. 2001. “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica 69 (5): 1127–60. https://www.jstor.org/stable/2692217.