Diagram (flowchart): Endogenous schooling → nearc4 (Z) → ed76 (D) → lwage76 (Y); Low family background → nearc4 and ed76.
Replication Guide
College proximity and the return to schooling
Prerequisites. Endogeneity and 2SLS from course slides (Ch.17–20). Recommended: AK91 replication guide first — to contrast IV ≈ OLS (AK91) with IV > OLS (Card). Bound (1995) guide is optional here. Tier-1 context: Readings Guide.
Software. R with renv::restore(); author extract: Rcode/proximity/nls.dat (column layout in code_bk.txt).
Four documents — one workflow.
| Step | Document | Your job |
|---|---|---|
| 1 | This guide | Follow the story; understand \(Z\), \(D\), \(Y\), and heterogeneous compliers |
| 2 | Paper (full text) | Read the sections cited in each Act |
| 3 | R replication output | Compare Figure 1 and Tables 1–5 to the paper |
| 4 | Rcode/*.R | Run and modify the modular scripts |
Homework note. iv-wage-card uses a simplified Card1995.dta. This guide follows Card’s original proximity replication (nls.dat), not the homework file.
This guide is narrative-first (R only). Complete code and pre-rendered tables live in Rcode/ and replicate-card1993-R.
1 Overview: Geographic IV in one page
Card (1993) asks whether growing up near a four-year college raises schooling and earnings, and whether IV estimates of the return to schooling exceed OLS. The paper builds:
- Descriptive facts: Proximity predicts education, especially for men with low predicted schooling (Figure 1; Table 1).
- OLS benchmark: Conventional Mincer regressions (Table 2).
- IV core: Reduced form + 2SLS using nearc4 as an instrument (Table 3); IV 25–30% above OLS.
- Robustness and exclusion: Alternative instruments and interaction IV (Tables 4–5).
Contrast with AK91 and Bound (course arc).
| Paper | Instrument \(Z\) | Headline | Compliers / lesson |
|---|---|---|---|
| (Angrist and Krueger 1991) | Quarter of birth | IV ≈ OLS | Compulsory-schooling margin |
| (Bound, Jaeger, and Baker 1995) | (Critique of AK91) | Low first-stage \(F\) | Report \(F\) before trusting IV |
| Card (1993) | nearc4 | IV > OLS | Low-SES youth; high marginal returns |
Data map.
| Object | Meaning |
|---|---|
| nearc4 | Four-year college in local labor market, 1966 |
| nearc2 | Two-year college nearby (Table 4) |
| ed76, lwage76 | Education and log hourly wage, 1976 |
| daded, momed, famed, lowfam | Parental education and family structure |
| exp76, exp2 | Experience and experience squared / 100 |
| reg66*, smsa66r | 1966 region and SMSA controls |
Samples.
| Use | \(N\) | Notes |
|---|---|---|
| Wage equations (Tables 2–5) | 3010 | Valid 1976 log wages |
| Full NLS-YM in paper Table 1 col (1) | 5225 | Not in distributed nls.dat |
| Table 1 cols (2)–(3) | Replicated | See partial replication warning |
R module map.
| File | Role |
|---|---|
| card1993-data-prep.R | read_fwf, variables, sample filters |
| card1993-gt-quarto.R | Table formatting |
| Figure_1.R | Predicted-education quartiles |
| Table_I.R · Table_V.R | Tables 1–5 |
Read first: Paper (HTML)
Suggested path (~8–10 hours). Overview → Act I → II → III → IV → V → Capstone → Appendix.
2 Act I — Does proximity predict schooling?
2.1 Story beat
Before IV, Card documents that college proximity correlates with schooling, with the largest gaps among men predicted to attain low education — setting up both relevance and heterogeneous compliers.
2.2 Read in the paper
- Preliminary Analysis of Earnings and Schooling
- Figure 1 and Table 1 in the paper qmd
2.3 Identification / econometrics
Figure 1 logic.
- Fit a prediction equation for ed76 on the subsample with no nearby college (nearc4 = 0).
- Assign predicted-education quartiles for the full sample.
- Compare mean ed76 by quartile for nearc4 = 1 vs 0.
This is not IV; it visualizes whom proximity moves.
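The three steps above can be sketched on simulated data. This is only an illustration, not the contents of Figure_1.R: the data frame `sim`, the single covariate `famed_score`, and every effect size are assumptions made up for the example, and the real prediction equation uses the controls defined in the course scripts.

```r
# Illustrative sketch on simulated data; not the author's Figure_1.R.
set.seed(42)
n <- 4000
sim <- data.frame(
  famed_score = rnorm(n),            # hypothetical family-background index
  nearc4      = rbinom(n, 1, 0.68)   # hypothetical share living near a college
)
# Assumed data-generating process: proximity matters more when background is low.
sim$ed76 <- 13 + 1.5 * sim$famed_score +
  sim$nearc4 * ifelse(sim$famed_score < -0.5, 1.0, 0.2) + rnorm(n)

# Step 1: fit the prediction equation only where nearc4 == 0.
fit0 <- lm(ed76 ~ famed_score, data = subset(sim, nearc4 == 0))

# Step 2: predict for everyone and assign quartiles of predicted education.
sim$ed_hat <- predict(fit0, newdata = sim)
breaks <- quantile(sim$ed_hat, probs = seq(0, 1, 0.25))
sim$quartile <- cut(sim$ed_hat, breaks = breaks,
                    include.lowest = TRUE, labels = 1:4)

# Step 3: mean schooling by quartile and proximity, then the near-minus-far gap.
means <- tapply(sim$ed76, list(sim$quartile, sim$nearc4), mean)
gap <- means[, "1"] - means[, "0"]
print(round(gap, 2))
```

Fitting on the nearc4 = 0 subsample keeps the predicted-education index free of the proximity effect itself, so the quartiles sort men by background rather than by the treatment being studied.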
2.4 Replicate
- Figure: Figure_1.R · Figure 1
- Sample stats: Table_I.R · Table 1
The author’s Table 1 column (1) (\(N = 5225\) full cohort) is not in the distributed nls.dat. Replication covers columns (2)–(3) only. See notes on replicate-card1993-R.
```r
# 1) wage <- card93_wage_sample(load_card93_data())
# 2) Fit ed76 ~ controls on subset nearc4 == 0
# 3) predict(ed) for all; cut into quartiles; group_by(quartile, nearc4) %>% summarise(mean(ed76))
```
2.5 Check your work
- Prediction \(R^2 \approx 0.32\) (paper fn. 16: 0.30).
- Lowest predicted-education quartile: near-college men have ~0.9–1.0 extra years vs no college (paper: \(\approx 1.1\)).
2.6 Think
- Why fit the prediction model only where nearc4 = 0?
- How does Figure 1 support the abstract’s focus on “poorly-educated parents”?
3 Act II — OLS benchmark
3.1 Story beat
Table 2 reports conventional OLS returns to schooling. These are the benchmarks IV will exceed — motivating ability bias / comparative advantage stories.
3.2 Read in the paper
- OLS wage-equation section leading to Table 2
3.3 Identification / econometrics
\[ \ln W_i = \beta_0 + \rho\, EDUC_i + f(EXPER_i) + X_i'\gamma + u_i \]
Columns add 1966 geography and family-background blocks. OLS \(\hat\rho\) is biased if \(EDUC\) correlates with \(u\) (ability, family resources).
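A small simulation can make the bias statement concrete. The sketch below assumes a true return of 0.07 and an unobserved `ability` term that raises both schooling and wages; all variable names and parameter values are invented for illustration and are not taken from nls.dat.

```r
# Illustrative simulation; every parameter value is an assumption.
set.seed(7)
n <- 5000
ability <- rnorm(n)                         # unobserved in real data
educ    <- 13 + 1.0 * ability + rnorm(n)    # schooling rises with ability
exper   <- runif(n, 0, 20)                  # hypothetical experience
exp2    <- exper^2 / 100                    # same scaling as exp2 in the data map
lwage   <- 1 + 0.07 * educ + 0.04 * exper - 0.10 * exp2 +
  0.05 * ability + rnorm(n, sd = 0.3)

ols_short <- lm(lwage ~ educ + exper + exp2)            # ability omitted
ols_long  <- lm(lwage ~ educ + exper + exp2 + ability)  # infeasible benchmark

rho_short <- coef(ols_short)[["educ"]]
rho_long  <- coef(ols_long)[["educ"]]
print(round(c(short = rho_short, long = rho_long), 4))
```

Under this assumed design the short regression overstates the return because the omitted term is positively correlated with schooling. That is the classic ability-bias direction, which is why an IV estimate above OLS calls for a different explanation.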
3.4 Replicate
- Script: Table_II.R
- Output: Table 2
```r
# lm(lwage76 ~ ed76 + exp76 + exp2 + black + reg76r + smsa76r + reg66..., data = wage)
```
3.5 Check your work
- Column (1): \(\hat\rho \approx 0.0747\) on ed76 (\(N = 3010\)); matches Card’s SAS listing read1.lst.
- Adding family controls (cols 3–5) leaves \(\hat\rho\) near 0.07–0.08.
3.6 Think
- If ability bias pushes OLS up, would you expect IV to lie above or below OLS under homogeneous returns?
- Card finds IV above OLS — what compliers story fits that pattern?
4 Act III — Core IV: reduced form and 2SLS
4.1 Story beat
Table 3 is the econometric centerpiece: reduced-form effects of nearc4 on education and wages, then 2SLS/IV estimates of \(\rho\). Panel A uses a single excluded instrument; Panel B adds family × proximity structure.
4.2 Read in the paper
4.3 Identification / econometrics
Reduced form (Panel A, col 1–4):
\[ EDUC_i = \pi_0 + \pi_1 nearc4_i + X_i'\delta + \nu_i, \qquad \ln W_i = \lambda_0 + \lambda_1 nearc4_i + X_i'\phi + \eta_i \]
IV / 2SLS (cols 5–8): nearc4 instruments \(EDUC\) in the wage equation.
Wald shortcut: \(\hat\rho_{IV} \approx \hat\lambda_1 / \hat\pi_1\) (one instrument, one endogenous regressor).
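The Wald shortcut can be checked numerically. In a just-identified model with the same controls in both reduced-form equations, the ratio \(\hat\lambda_1 / \hat\pi_1\) reproduces the 2SLS point estimate. The sketch below uses simulated data with stand-in names (`z` for nearc4, `d` for schooling, `x` for the controls); the parameter values are assumptions. It computes 2SLS by hand with two lm() calls, which gives the right coefficient but not valid standard errors.

```r
# Illustrative simulation; names and parameter values are assumptions.
set.seed(123)
n <- 3000
x <- rnorm(n)                  # stand-in for the controls X
z <- rbinom(n, 1, 0.6)         # stand-in for nearc4
u <- rnorm(n)                  # unobserved wage determinant
d <- 12 + 0.4 * z + 0.5 * x + 0.8 * u + rnorm(n)            # endogenous schooling
y <- 1 + 0.10 * d + 0.2 * x + 0.3 * u + rnorm(n, sd = 0.3)  # log wage

pi1     <- coef(lm(d ~ z + x))[["z"]]   # first stage: effect of z on schooling
lambda1 <- coef(lm(y ~ z + x))[["z"]]   # reduced form: effect of z on wages
wald    <- lambda1 / pi1

# Manual 2SLS: replace d with its first-stage fitted values.
d_hat <- fitted(lm(d ~ z + x))
tsls  <- coef(lm(y ~ d_hat + x))[["d_hat"]]

print(c(wald = wald, tsls = tsls))
```

For inference, use a dedicated IV routine such as ivreg from AER (the approach named for Table_III.R), because the second-stage lm() above treats the fitted values as data and misstates the residual variance.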
| Panel | Excluded \(Z\) | Teaching point |
|---|---|---|
| A | nearc4 | Baseline geographic IV |
| B | nearc4 + interactions | Richer first stage; similar IV |
4.4 Replicate
- Script: Table_III.R (ivreg from AER)
- Output: Table 3
AK91: IV ≈ OLS → “little ability bias.” Card: IV ≈ 0.13 vs OLS ≈ 0.07; marginal returns for low-education compliers may exceed average OLS gaps (see the synthesis in (Card 2001)).
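One way to see how IV can exceed OLS without any ability bias is a two-type simulation. The sketch assumes that only low-background men respond to the instrument and that their return (0.12) is higher than everyone else's (0.06). These numbers, and the names `low`, `z`, `d`, `y`, are invented for illustration and are not estimates from the paper.

```r
# Illustrative simulation of heterogeneous returns; all values are assumptions.
set.seed(2024)
n   <- 50000
low <- rbinom(n, 1, 0.5)                 # hypothetical low-background indicator
z   <- rbinom(n, 1, 0.5)                 # stand-in for nearc4
rho_i <- ifelse(low == 1, 0.12, 0.06)    # return differs by type

# Only low-background men change schooling when a college is nearby.
d <- 12 + 2 * (1 - low) + 1.5 * z * low + rnorm(n)
# Wages are centered at 13 years so both types earn the same at d = 13.
y <- 1.5 + rho_i * (d - 13) + rnorm(n, sd = 0.3)

ols <- coef(lm(y ~ d))[["d"]]
iv  <- cov(y, z) / cov(d, z)             # Wald / IV estimate with one instrument

print(round(c(ols = ols, iv = iv), 3))
```

In this design the IV estimate recovers the return for the men the instrument moves, while OLS blends both types, so IV above OLS reflects who the compliers are rather than a flaw in either estimator.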
4.5 Check your work
- RF: nearc4 on ed76 ≈0.32 (Panel A).
- IV: \(\hat\rho_{ed76}\) ≈0.132–0.140 (Panel A, with family controls).
- IV > OLS by roughly 25–30% — paper’s abstract range.
4.6 Think
- Who are the compliers induced by nearc4 in this cohort?
- Would you expect a strong first-stage \(F\) here relative to AK91’s \(YQ\) instruments (see Bound guide)?
5 Act IV — Robustness
5.1 Story beat
Table 4 asks whether IV > OLS is an artifact of one proximity measure or one sample year. Alternatives include nearc2, public/private college splits, and 1978 wages.
5.2 Read in the paper
- Robustness discussion around Table 4
5.3 Replicate
- Script: Table_IV.R
- Output: Table 4
5.4 Replication literacy
| Issue | Note |
|---|---|
| Table 1 col (1) | Not in nls.dat |
| Parent education missing | Imputed to sample mean; nodaded / nomomed flags in models |
| Table 4 row 2 | Uses lwage78, \(N = 2639\) (not the 3010 subsample) |
| Table 4 row 4 IV SE | May differ slightly from paper (see replicate page notes) |
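The imputation row above can be illustrated with a toy data frame. The sketch assumes missing father's education is filled with the mean of the observed values and that a 0/1 flag named nodaded enters the model; which sample the mean is taken over in card1993-data-prep.R is not stated here, so treat that choice as an assumption.

```r
# Hypothetical toy data; column names follow the guide (daded, nodaded).
toy <- data.frame(
  ed76  = c(12, 16, 10, 14, 13, 18),
  daded = c(8, NA, 12, NA, 10, 16)
)

# Flag missing values first, then fill them with the mean of observed values.
toy$nodaded <- as.integer(is.na(toy$daded))
toy$daded[is.na(toy$daded)] <- mean(toy$daded, na.rm = TRUE)

# Include both the imputed variable and its missing-value flag.
fit <- lm(ed76 ~ daded + nodaded, data = toy)

# Complete-case slope for comparison.
cc_slope <- coef(lm(ed76 ~ daded, data = toy[toy$nodaded == 0, ]))[["daded"]]
print(c(with_flag = coef(fit)[["daded"]], complete_case = cc_slope))
```

In this two-regressor toy model the flag absorbs the mean of the imputed rows, so the slope on daded matches the complete-case slope while all observations stay in the sample.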
5.5 Check your work
- IV estimates remain above OLS across rows 1, 3, 5–7.
- Row 3: nearc2 as instrument; compare magnitude to nearc4.
5.6 Think
- Which robustness row would most threaten the exclusion restriction?
- If nearc4 proxies for local labor-market prosperity, which table row addresses that?
6 Act V — Interaction IV and exclusion
6.1 Story beat
Section “Is College Proximity a Legitimate Instrument?” uses heterogeneity: proximity raises schooling most for low-family-background men. Table 5 instruments with nearc4 × lowfam (and richer nearc4 × famed interactions) and tests whether proximity enters the wage equation directly.
6.2 Read in the paper
6.3 Identification / econometrics
- Relevance: nearc4_low (or nearc4 f_j) shifts \(EDUC\) more for disadvantaged families.
- Exclusion test logic: If proximity raised wages only through college attendance, RF effects on wages should align with IV-implied effects; direct nearc4 in the structural equation should be small.
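The exclusion-test logic can be sketched as a 2SLS regression in which the interaction is the excluded instrument and nearc4 itself enters the wage equation. The simulation below assumes a true direct effect of zero and a true return of 0.10; the variable names mirror the guide, but the data and parameter values are invented, and the hand-rolled second stage gives point estimates only.

```r
# Illustrative simulation; the data-generating values are assumptions.
set.seed(99)
n <- 50000
lowfam     <- rbinom(n, 1, 0.4)
nearc4     <- rbinom(n, 1, 0.6)
nearc4_low <- nearc4 * lowfam          # excluded instrument
u <- rnorm(n)                          # unobserved wage determinant

# Proximity raises schooling mostly for low-background men.
ed <- 13 - 1.0 * lowfam + 0.1 * nearc4 + 0.9 * nearc4_low + 0.7 * u + rnorm(n)
# Assumed structural wage equation: no direct effect of nearc4.
lw <- 1 + 0.10 * ed - 0.05 * lowfam + 0 * nearc4 + 0.2 * u + rnorm(n, sd = 0.3)

# First stage includes the instrument plus all included exogenous regressors.
first  <- lm(ed ~ nearc4_low + nearc4 + lowfam)
ed_hat <- fitted(first)

# Second stage keeps nearc4 as a regressor so its direct effect is testable.
second <- lm(lw ~ ed_hat + nearc4 + lowfam)
rho_iv <- coef(second)[["ed_hat"]]
direct <- coef(second)[["nearc4"]]
print(round(c(rho_iv = rho_iv, direct_nearc4 = direct), 3))
```

The design choice is that identification comes only from the differential effect of proximity by family background, which frees the main effect of nearc4 to absorb any direct wage effect of living near a college. Standard errors should come from a proper IV routine such as ivreg from AER.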
6.4 Replicate
6.5 Check your work
- RF nearc4_low on ed76 larger than RF nearc4 alone.
- IV return from interaction spec still exceeds OLS (~0.12+).
- Structural coefficient on nearc4 (direct effect) not inconsistent with zero (paper’s conclusion).
6.6 Think
- How does interaction IV differ from control-function or LATE by subgroup?
- What local labor-market story would violate exclusion — and does Card’s design detect it?
7 Act VI — Synthesis with AK91 and Bound
7.1 Three-paper comparison
| | AK91 | Bound | Card93 |
|---|---|---|---|
| \(Z\) | QOB × YOB | (Same data; meta) | nearc4 |
| IV vs OLS | ≈ equal | IV biased toward OLS if \(F\) low | IV >> OLS |
| Main threat | Weak IV; slight exclusion debate | Finite-sample / weak \(F\) | Local labor demand; family sorting |
| Heterogeneity | Compulsory laws | — | Low parental education |
| Course guide | AK91 | Bound | This page |
Follow-up reading: (Card 2001) survey; readings-guide identification row.
You do not need to re-run Bound’s simulated-QOB exercise for Card — only understand that Card’s first stage is economically stronger than AK91’s in Bound’s Table 1 col (6).
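For readers who want to see what a first-stage \(F\) is in practice, the sketch below computes it on simulated data by comparing first-stage regressions with and without the excluded instrument. The names `z`, `x`, `d` and the coefficient values are assumptions; this is the conventional homoskedastic \(F\), not a reproduction of any table in Bound or Card.

```r
# Illustrative simulation; names and values are assumptions.
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.5)                 # single excluded instrument
x <- rnorm(n)                          # included control
d <- 0.4 * z + 0.5 * x + rnorm(n)      # endogenous regressor

restricted   <- lm(d ~ x)              # first stage without the instrument
unrestricted <- lm(d ~ x + z)          # first stage with the instrument

# F-test for excluding z from the first stage.
f_tab <- anova(restricted, unrestricted)
first_stage_F <- f_tab[2, "F"]

# With one excluded instrument, F equals the squared t-statistic on z.
t_z <- summary(unrestricted)$coefficients["z", "t value"]
print(c(F = first_stage_F, t_squared = t_z^2))
```

With several excluded instruments the same anova() comparison applies, with all instruments added in the unrestricted model.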
7.2 Think
- Can all three papers be “right” simultaneously? (Hint: different compliers and marginal returns.)
- Which paper would you assign to a policy maker who wants a single return-to-schooling number?
8 Capstone — Identification memo
Write ½–1 page in English:
- Estimand: Return to schooling for whom? (Low-SES compliers near colleges.)
- Relevance: Cite Figure 1 and Table 3 RF on ed76.
- Exclusion: Summarize Table 5 and the “legitimate instrument” section.
- Magnitude: IV vs OLS; contrast with AK91 guide Act IV.
- Replication limit: One issue from replication literacy (Table 1 col 1, imputation, or Table 4 sample).
Deliverable (suggested). PDF memo + console output from Table_III.R (print(replication) / print(paper_targets)).
9 Appendix — Replication map
| Output | Paper | Script | Known issue |
|---|---|---|---|
| Figure 1 | Preliminary | Figure_1.R | Fit on nearc4 = 0 subsample |
| Table 1 | Sample chars | Table_I.R | Col (1) missing |
| Table 2 | OLS | Table_II.R | \(N = 3010\) |
| Table 3 | RF + IV | Table_III.R | Panels A/B |
| Table 4 | Robustness | Table_IV.R | Row 2 sample; row 4 SE |
| Table 5 | Interaction IV | Table_V.R | Wald + 2SLS |
Render this page:
```bash
quarto render 04-topics/rep-card1993/replication-card1993-guide.qmd
```

Minimal data peek:

```r
library(here)
source(here("04-topics/rep-card1993/Rcode/card1993-data-prep.R"))
d <- load_card93_data()
nrow(card93_wage_sample(d)) # expect 3010
```