Replication Guide

From average returns to marginal policy effects

Author

Hu Huaping

Published

June 4, 2026

How to use this guide

Prerequisites. Endogeneity, IV, and LATE from course slides (Ch.17–20). Helpful background: AK91 guide (IV mechanics) and Card (1993) guide (college proximity as an instrument — the same pub4 margin appears in CHV2011).

Software. R with renv::restore(); bundled NLSY files under replication_aer/. Technical run order: minimum runnable checklist.

Four documents — one workflow.

Step	Document	Your job
1	This guide	Follow the story; link each table to an economic question and an identification step
2	Paper (full text)	Read the sections cited in each Act
3	R replication output	Compare cached tables/figures and read per-table replication notes
4	`Rcode/`	Run scripts; inspect shared modules when a number diverges

This guide is narrative-first. It explains why each table exists and how completely the course R pipeline reproduces the paper — not only which command to type.

Suggested path (~3–4 hours). Overview → Act 0 → Act I → Act II → Act III → Act IV → Act V → Act VI → Act VII → Capstone → Appendix map.

1 Overview: Marginal returns in one page

Carneiro, Heckman, and Vytlacil (2011) ask a policy question sharper than “what is the average return to college?”

Returns vary across people (heterogeneous $\beta_i = Y_{1i} - Y_{0i}$).
People select into college partly on those returns (selection on gains).
Conventional ATE, TT, and IV are weighted averages of the marginal treatment effect (MTE) — they need not answer the policy you care about.
The paper estimates MTE and constructs marginal policy relevant treatment effects (MPRTE) for specific policy shifts (Table 1).

Evidence chain in the paper.

Act	Economic question	Main outputs
I	Who selects into college, and are instruments strong?	Tables 2, A-3, 3
II	Do agents select on ex post gains?	Tables 4(a), 4(b), A-5
III	What does a normal selection model imply?	Fig 1, Table 5 (normal), A-6
IV	What does a semi-parametric LIV model imply?	Figs 2–3, A-7, Table 5 (semi)
V	How do MPRTE, IV, and OLS compare?	Table 5, 1, A-8
VI	Are conclusions robust?	Table 6

flowchart TB
  subgraph latent [Latent heterogeneity]
    MTE["MTE(x, u_S)"]
    Sel["Selection on gains: Cor(beta, S) != 0"]
  end
  subgraph emp [Empirical steps]
    Z["Instruments Z: pub4, tuition, local labor market"]
    P["Propensity P(Z|X)"]
    Y["Log wage Y"]
    S["College S"]
  end
  Z --> P
  P --> S
  X["Controls X: AFQT, family, experience"] --> S
  X --> Y
  S --> Y
  Sel --> MTE
  MTE --> MPRTE["MPRTE for policy shift"]
  MTE --> IVest["Linear IV / LATE"]

Course alignment.

Topic	Where in CHV2011
Endogeneity of schooling	Motivation; OLS vs IV in Table 5
IV / LATE	Table 3 instruments; linear IV row in Table 5
Heterogeneous treatment effects	MTE curve (Figs 1, 4); Table 4(b)
Selection on unobservables	Heckman selection; normal vs semi-parametric
Policy-relevant parameters	MPRTE weights (Table 1); “better MPRTE than LATE?” (Conclusion)
Weak / limited support	Fig 2: $f(P \mid X)$ support; trimming $P$

Data map (replication sample).

Object	Meaning
`basicvariables.dta` + `localvariables.dta`	NLSY79 white males, $N = 1747$
$Y$	Log average hourly wage 1989–93
$S$	Ever enrolled in college vs HS grad only
$X$	AFQT (corrected), family, experience, local labor market in 1991
$Z \setminus X$	College proximity at 14, local wage/unemployment/tuition at 17 (⊥ $X$)
`bootdata/phat*.asc`	250 bootstrap resamples for choice-equation inference

Read first: Paper HTML · R tables/figures · Checklist

2 Act 0 — Framework: MTE, LIV, and MPRTE

2.1 Story beat

Section I defines the potential outcomes setup with heterogeneous returns, the MTE as $\partial E(Y \mid X, P=p) / \partial p$, and MPRTE as a weighted integral of MTE for a policy that shifts instruments or $P$. Table 1 and Web Appendix Table A-1 give the weight functions.

This Act is conceptual — no table script. Skim it before Act II so Table 4(a)’s polynomial test is interpretable as a test of constant MTE in $u_S$.
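To make "weighted integral of MTE" concrete, here is a minimal sketch in R. The linear MTE curve and the Beta-shaped weight are hypothetical stand-ins chosen for illustration; they are not the paper's estimates and not the Table 1 weights.

```r
# Hypothetical declining MTE curve on u_S in [0, 1] (illustrative only).
mte <- function(u) 0.12 - 0.10 * u

# Uniform weights integrate the MTE into an ATE-type parameter. Any other
# density over u_S gives a different parameter. dbeta(u, 2, 5) is an ASSUMED
# weight that loads on low-u_S individuals; it is NOT a Table 1 weight.
w_uniform <- function(u) rep(1, length(u))
w_assumed <- function(u) dbeta(u, 2, 5)

# Treatment parameter = integral over [0, 1] of MTE(u) * weight(u).
weighted_effect <- function(weight) {
  integrate(function(u) mte(u) * weight(u), lower = 0, upper = 1)$value
}

ate_like    <- weighted_effect(w_uniform)
policy_like <- weighted_effect(w_assumed)
c(ate_like = ate_like, policy_like = policy_like)
```

The same MTE curve yields two different numbers because the second weight puts more mass on low $u_S$, where this assumed MTE is higher. That is the sense in which ATE, TT, IV, and MPRTE can all differ while sharing one MTE.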

2.2 Read in the paper

Section I.A–B: PRTE, MPRTE, practical issues
Main-text Table 1 (policy weights) and Web Appendix Table A-1

2.3 Identification / econometrics

Under the LIV assumptions (independence of $(U_0, U_1, V)$ from $Z$ given $X$):

\[ E(Y \mid X, P=p) = \mu_0(X) + p \cdot [\mu_1(X) - \mu_0(X)] + K(p) \]

Nonlinearity of $K(p)$ → MTE varies with $u_S$ → selection on gains. Table 4(a) tests polynomial terms in $P$; Table 4(b) tests equality of LATE over adjacent $u_S$ bins.
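The step from nonlinearity of $K(p)$ to a varying MTE follows from differentiating the conditional expectation above with respect to $p$. The derivation below uses only the symbols already defined, with $K'(p)$ denoting $dK(p)/dp$; the quadratic form of $K$ in the second line is an illustrative assumption, not the paper's specification.

\[ \text{MTE}(x, p) = \frac{\partial E(Y \mid X = x, P = p)}{\partial p} = \mu_1(x) - \mu_0(x) + K'(p) \]

\[ K(p) = a p + b p^2 \;\Rightarrow\; K'(p) = a + 2 b p \]

If $b = 0$, $K'(p)$ is constant and MTE does not vary with $u_S$. A nonzero $b$ is the kind of curvature that the higher-order terms in $P$ are meant to pick up.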

2.4 Replicate

Display: Table 1, Table A-1 · Table_1.R, Table_A1.R

2.5 Think

Why can IV answer a different policy question than MPRTE?
Table 5 marks semi-parametric ATE/TT/TUT as “Not Identified.” What does limited support of $P$ (Figure 2) imply?

3 Act I — Sample, instruments, and the first stage

3.1 Story beat

Section II.A introduces the NLSY sample and the Card/Cameron–Taber/Kane–Rouse instrument set. Table 3 shows that AFQT, family background, and instruments predict college attendance. This is the first stage for all later MTE work.

Start replication here: verify $N = 1747$ and wage means (Table A-3) before trusting downstream tables.

3.2 Read in the paper

Section II.A; Table 2, Table 3
Web Appendix Table A-3, A-4 (men subsample)

3.3 Identification / econometrics

Logit choice model (flexible polynomials in $X$ and interactions with instruments):

\[ P_i = \Pr(S_i = 1 \mid Z_i, X_i) \]

Average marginal derivatives (Table 3): for each regressor, $\partial \Pr(S=1)/\partial z$ averaged over $i$, with bootstrap SEs (250 draws).

Joint test on excluded instruments: strong relevance in the paper ($p = 0.0001$).
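The average marginal derivative and its bootstrap SE can be sketched on simulated data as follows. This is a simplified illustration under stated assumptions: the variables `x` and `z` are made-up stand-ins, and each regressor enters the logit linearly, whereas the paper's specification includes polynomials and interactions (so its derivatives involve more terms).

```r
set.seed(1)
n <- 1000
x <- rnorm(n)                       # stand-in control (e.g. a test score)
z <- rnorm(n)                       # stand-in excluded instrument
s <- rbinom(n, 1, plogis(0.8 * x + 0.6 * z))
dat <- data.frame(s, x, z)

# Average marginal derivative of Pr(S = 1) with respect to `var` in a logit
# where `var` enters linearly: mean over i of f(index_i) * beta_var,
# with f the logistic density.
ame_logit <- function(data, var) {
  fit <- glm(s ~ x + z, family = binomial(link = "logit"), data = data)
  eta <- predict(fit, type = "link")
  mean(dlogis(eta)) * coef(fit)[[var]]
}

ame_z <- ame_logit(dat, "z")

# Nonparametric bootstrap SE: resample rows, re-estimate, take the SD.
B <- 250
boot_ame <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  ame_logit(dat[idx, ], "z")
})
se_z <- sd(boot_ame)
c(ame = ame_z, boot_se = se_z)
```

Re-estimating the logit inside every bootstrap draw is the point of the exercise: the SE then reflects sampling variation in the coefficients as well as in the averaging over $i$.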

3.4 Replicate

Output	Script	Pipeline	R output
Sample means	`Table_A3.R`	P0 merge	tab-A3
Choice equation	`Table_3.R`	P0 + P1 `bootdata/`	tab-3
Men subsample	`Table_A4.R`	P1	tab-A4
Bootstrap files	`getboot.R`	P1	`replication_aer/bootdata/`

# From repo root, after renv::restore():
# Rscript 04-topics/rep-chv2011/Rcode/getboot.R
# CHV2011_BOOT=250 Rscript 04-topics/rep-chv2011/Rcode/Table_3.R

3.5 Check your work

Table A-3: $N = 1747$ (882 / 865); mean log wage $\approx 2.21$ ($S=0$) and $\approx 2.55$ ($S=1$).
Table 3: control derivatives (AFQT, mother’s education, siblings) within ~0.04 of paper.
Course R on bundled data: instrument rows and joint IV $p$ do not match the paper (see replication note on tab-3).

3.6 Replication literacy

Bundled merge vs geocode merge

The AER package expects geocode-linked newid.dta → dataset.dta. This course repo merges basicvariables.dta and localvariables.dta by row order (bundled_aligned). Core demographics match Table A-3, but the local IV variables are misaligned (inspect `cor(caseid, newid)` to see this). First-stage IV coefficients are the main casualty, not the logit functional form.
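The difference between a row-order merge and a key-based merge is easy to see on toy data. The sketch below is hypothetical throughout: the four-row data frames, the `afqt` and `tuition` columns, and the assumption that `caseid` and `newid` are directly comparable identifiers are all illustrative and do not describe the bundled files.

```r
# Toy stand-ins for the two files (hypothetical values).
basic <- data.frame(caseid = c(101, 102, 103, 104),
                    afqt   = c(0.2, -1.1, 0.7, 1.5))
local_vars <- data.frame(newid   = c(103, 101, 104, 102),  # different row order
                         tuition = c(30, 10, 40, 20))

# Row-order merge: silently pairs row i with row i.
by_row <- cbind(basic, local_vars)

# Key-based merge: pairs records that share an identifier.
by_key <- merge(basic, local_vars, by.x = "caseid", by.y = "newid")

# Diagnostic: if the row-order pairing were right, the two ids would agree.
share_aligned <- mean(by_row$caseid == by_row$newid)
share_aligned
by_key[order(by_key$caseid), c("caseid", "afqt", "tuition")]
```

In this toy case no row lines up, yet `cbind()` raises no error. Person-level variables stay correct while the attached local variables belong to someone else, which is the pattern described above.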

3.7 Think

Why interact instruments with AFQT and mother’s education?
If joint instrument relevance fails in your replication, should you proceed to MTE tables?

4 Act II — Selection on gains: Is MTE constant?

4.1 Story beat

The paper’s first major finding: agents select on ex post gains. Table 4(a) rejects linearity of $E(Y \mid X, P)$ in $P$ (polynomial test). Table 4(b) rejects equality of LATE across $u_S$ bins. Table A-5 repeats 4(a) with double bootstrap for estimated $P$.

This motivates everything in Acts III–V: average returns and IV are mixtures of heterogeneous MTE.

4.2 Read in the paper

Table 4(a)–(b) narrative; Web Appendix Table A-5
Heckman, Schmierer and Urzua (2010) test referenced in text

4.3 Identification / econometrics

Table 4(a): Estimate $Y$ on $X$, $\hat P$, $\hat P^2, \ldots, \hat P^d$; test joint significance of nonlinear terms. Probit first step (author bootstrap_MT.R); Murphy–Topel / bootstrap $p$-values; Romano–Wolf stepdown.

Table 4(b): Bin $u_S$, average the MTE within each bin to obtain bin-level LATEs, and bootstrap-test the differences between adjacent LATEs.
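The logic of the Table 4(a) test can be sketched on simulated data. Everything here is an assumption for illustration: the data-generating process, the single control `x`, the single instrument `z`, and the use of a classical F-test in place of the paper's Murphy–Topel, bootstrap, and Romano–Wolf inference.

```r
set.seed(42)
n <- 5000
x <- rnorm(n); z <- rnorm(n)
v <- rnorm(n)                         # unobserved resistance to enrolling
s <- as.integer(0.5 * x + z - v > 0)  # enroll when the index exceeds v
u0 <- rnorm(n, sd = 0.3)
gain <- 0.5 - 0.5 * v                 # gains fall with v: selection on gains
y <- 1 + 0.3 * x + u0 + s * gain

# Step 1: probit propensity score.
phat <- fitted(glm(s ~ x + z, family = binomial(link = "probit")))

# Step 2: outcome regressions that are linear vs quadratic in phat.
fit_lin  <- lm(y ~ x + phat)
fit_quad <- lm(y ~ x + phat + I(phat^2))

# Classical F-test on the nonlinear term. It assumes homoskedasticity and
# ignores that phat is estimated, so it is only a first-pass check.
p_value <- anova(fit_lin, fit_quad)[["Pr(>F)"]][2]
p_value
```

Because gains are built to decline in `v`, the quadratic term should be strongly significant in this design. Setting `gain <- 0.5` (no selection on gains) removes the curvature that the test looks for.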

4.4 Replicate

Output	Script	Env vars	R output
Export data	`export_data_with_tuition.R`	none	`mainresults/table4a/data_with_tuition.RData`
Table 4(a)	`Table_4a.R`	`CHV2011_BOOT_4A=1000`	tab-4a
Table 4(b)	`Table_4b.R`	`CHV2011_BOOT=250`	tab-4b
Table A-5	`Table_A5.R`	`CHV2011_BOOT_A5_OUT=500`, `CHV2011_BOOT_A5_IN=100`	tab-A5

Core logic: chv2011-table4a-core.R.

4.5 Check your work

Test	Paper	Course R (bundled)	Verdict
4(a) HC $p$, deg 2	0.035	≈0.035	Close
4(a) bootstrap $p$, deg 2	0.035	≈0.14	Gap
4(a) RW joint	Reject	Fail to reject	Gap
A-5 RW critical	≈0.052	≈0.054	Close
A-5 bootstrap $p$	0.004–0.026	0.09–0.28	Gap
4(b) pattern	Declining LATE diff	Negative diffs, joint reject	Qualitative OK

What the gap means for students

The course pipeline implements the author’s test logic and runs paper-grade bootstrap sizes. On bundled data, asymptotic inference tracks the paper better than pivotal bootstrap $p$-values. Treat this as a lesson in replication literacy: identical code structure ≠ identical numbers when inputs (merge) or unpublished details (double_bootstrap.R absent from AER bundle) differ.

4.6 Think

If MTE is declining in $u_S$, who has the highest returns — the inframarginal enrolled or those who skip college?
Why does Table 4(a) use probit while Table 3 uses logit?

5 Act III — Normal selection benchmark

5.1 Story beat

Section II.B estimates a joint normal selection model (Willis–Rosen tradition). Figure 1 shows a precisely estimated declining MTE. Table 5 column 1 converts MTE to ATE, TT, TUT, and MPRTE. The normal model rejects constant MTE ($\sigma_{1V} - \sigma_{0V} \neq 0$ in Table A-6) — consistent with Act II.

5.2 Read in the paper

Section II.B; Figure 1; Table 5 (normal column); Table A-6

5.3 Replicate

Output	Script	R output
Normal MTE curve	`Figure_1.R`	fig-1
Switching regression	`Table_A6.R`	tab-A6
Table 5 (normal col.)	`Table_5.R`	tab-5

Course shortcut: Figure 1 and Table 5 normal column use shipped normalmodel/mtexvb.out (author ML output) — not re-estimated in R.

5.4 Check your work

Figure 1: downward-sloping MTE; annualized scale (~divide by 4 years).
Table 5 normal: MPRTE (Z+ε) ≈0.066; TT > MPRTE > TUT ordering in paper.
Course R: normal ATE ≈0.066 matches; TT/TUT may differ slightly on bundled data.

5.5 Think

Why are ATE/TT/TUT identified in the normal model but not in the semi-parametric column?
What is unattractive about normal MTE at $u_S \to 0$ or $1$ (footnote 17)?

6 Act IV — Semi-parametric MTE (LIV)

6.1 Story beat

Section II.C is the empirical core: estimate $E(Y \mid X, P=p)$ semi-parametrically, differentiate to get MTE, integrate with weights for MPRTE. Figures 2–3 show limited support of $P$ given $X$ (why ATE/TT/TUT are not identified). Figures 4–7 plot MTE and policy weights.

The author pipeline: Stata/GAUSS superboot (250 iterations) → MATLAB mte4c.m, treatwithiv.m. The course R pipeline uses a polynomial proxy aligned with Table 4(a) — faster, transparent, but not numerically identical.

6.2 Read in the paper

Section II.C; Figures 2–3; Web Appendix Table A-7 (Robinson partial linear)

6.3 Identification / econometrics

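Two estimation steps, then a derivative:

Step 1 (choice): $\hat P = \widehat{\Pr}(S = 1 \mid Z, X)$ from the logit
Step 2 (outcome): partially linear model $Y = X'\delta_0 + P \cdot X'(\delta_1 - \delta_0) + K(P) + \varepsilon$
MTE: $\text{MTE}(x, u_S) = x'(\delta_1 - \delta_0) + K'(p)$ evaluated at $p = u_S$, where $K'(p) = \partial K(p) / \partial p$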
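The steps above can be sketched with a polynomial stand-in for $K(P)$. This is a minimal illustration on simulated data, assuming a quadratic $K$ and one control; it is not the course module `chv2011-mte-core.R` and not the authors' semi-parametric estimator.

```r
set.seed(7)
n <- 5000
x <- rnorm(n); z <- rnorm(n); v <- rnorm(n)
s <- as.integer(0.5 * x + z - v > 0)
y <- 1 + 0.3 * x + rnorm(n, sd = 0.3) + s * (0.5 + 0.2 * x - 0.5 * v)

# Step 1: propensity score from a logit.
phat <- fitted(glm(s ~ x + z, family = binomial(link = "logit")))

# Step 2: outcome equation with K(P) approximated by a quadratic (an assumed
# proxy; the linear part of K is absorbed into the phat coefficient).
fit <- lm(y ~ x + phat + x:phat + I(phat^2))
b <- coef(fit)

# MTE(x, u) = d E(Y | X = x, P = p) / dp evaluated at p = u
#           = b[phat] + b[x:phat] * x + 2 * b[phat^2] * u
mte_hat <- function(u, x0 = mean(x)) {
  b[["phat"]] + b[["x:phat"]] * x0 + 2 * b[["I(phat^2)"]] * u
}

u_grid <- seq(0.1, 0.9, by = 0.2)
round(data.frame(u_S = u_grid, MTE = mte_hat(u_grid)), 3)
```

With a quadratic $K$ the implied MTE is linear in $u_S$ by construction, so this proxy can show the sign of the slope but not the detailed shape that a local polynomial estimator would recover.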

Trim the sample to $P \in [0.0324, 0.9775]$ (common support).
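A common-support trim can be sketched as follows. The min/max overlap rule used here is one simple convention and is an assumption of this example; it is not claimed to be the rule that produced the bounds quoted above, and the data are simulated.

```r
set.seed(3)
n <- 2000
x <- rnorm(n); z <- rnorm(n)
s <- rbinom(n, 1, plogis(0.8 * x + 1.2 * z))
phat <- fitted(glm(s ~ x + z, family = binomial(link = "logit")))

# Assumed convention: keep the range of phat observed in BOTH groups,
# i.e. from the larger of the two minima to the smaller of the two maxima.
lower <- max(min(phat[s == 1]), min(phat[s == 0]))
upper <- min(max(phat[s == 1]), max(phat[s == 0]))

keep <- phat >= lower & phat <= upper
c(lower = lower, upper = upper, share_kept = mean(keep))
```

Outside `[lower, upper]` only one of the two groups is observed, so $E(Y \mid X, P = p)$ cannot be compared across treated and untreated observations there. Parameters that need MTE over the full unit interval are therefore not identified without functional-form assumptions.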

6.4 Replicate

Output	Script	Module	R output
MTE / MPRTE / IV / OLS	`Table_5.R`	`chv2011-mte-core.R`	tab-5
Adjacent LATE test	`Table_4b.R`	same	tab-4b
Partial linear	`Table_A7.R`	data-prep	tab-A7
Figures	`Figure_2.R` · `Figure_7.R`	`chv2011-figures.R`	fig-2 ✓

# CHV2011_BOOT=250 Rscript 04-topics/rep-chv2011/Rcode/Table_5.R
# Rscript 04-topics/rep-chv2011/Rcode/run_all_figures.R

6.5 Check your work

Quantity	Paper (semi / IV / OLS)	Course R (bundled)
MPRTE (Z+ε)	0.080	≈0.038
Linear IV	0.095	≈0.119
OLS	0.084	≈0.077

Pattern to verify: IV and OLS same order of magnitude; MTE declining in $u_S$; MPRTE below TT in normal column.

6.6 Think

Why is semi-parametric MPRTE estimated but ATE labeled “Not Identified”?
How does Figure 2 change your reading of Table 5?

7 Act V — Policy parameters and linear IV

7.1 Story beat

Table 5 compares policy objects: MPRTE for three policy shifts (Table 1 weights), linear IV using $P(Z)$ as the instrument, and OLS. The conclusion: MPRTE < TT for the marginal expansions considered, so expanding college to the next marginal group yields lower returns than those earned by the inframarginal enrolled. Table A-8 reports linear IV/OLS with bootstrap SEs as a bridge to the IV literature.

Paper quote (Conclusion): “Better MPRTE than LATE?” — policy questions should target the margin the policy moves.
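Linear IV with the propensity score as the instrument can be sketched by hand on simulated data. The design below is an assumption made for illustration (one control, one instrument, true ATE of 0.5, gains declining in the unobservable `v`); it does not use the course scripts or the paper's data.

```r
set.seed(11)
n <- 5000
x <- rnorm(n); z <- rnorm(n); v <- rnorm(n)
s <- as.integer(0.5 * x + z - v > 0)
y <- 1 + 0.3 * x + rnorm(n, sd = 0.3) + s * (0.5 - 0.5 * v)  # true ATE = 0.5

# Propensity score from a probit of S on X and Z.
phat <- fitted(glm(s ~ x + z, family = binomial(link = "probit")))

# OLS of Y on S and X.
ols <- coef(lm(y ~ s + x))[["s"]]

# Manual 2SLS with phat as the excluded instrument for S.
first <- lm(s ~ phat + x)
s_hat <- fitted(first)
# Point estimate only: standard errors from this second stage are not valid.
iv <- coef(lm(y ~ s_hat + x))[["s_hat"]]

c(ols = ols, iv = iv)
```

In this assumed design the OLS coefficient should come out above the IV estimate, because enrollees are positively selected on gains and OLS picks up a TT-like average. That ordering is a property of the simulation, not a claim about Table 5.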

7.2 Read in the paper

Table 5; Section III Summary; Table A-8

7.3 Replicate

Table_5.R · tab-5
Table_A8.R · tab-A8

7.4 Check your work

MPRTE policies roughly 6–8% annualized (semi-parametric) in paper; IV ≈9.5%, OLS ≈8.4%.
Course R: IV/OLS magnitudes reasonable; semi-parametric MPRTE below paper (polynomial proxy).

7.5 Think

When is linear IV a weighted average of MTE? Which margins does $P(Z)$ weight?
How does Murray’s (2008) argument compare with the CHV finding of low marginal returns for some expansions?

8 Act VI — Robustness (Table 6)

8.1 Story beat

Section II.D re-estimates treatment parameters under eight specifications: sample restrictions, dropout dummies, alternative IV sets, Cameron–Taber instrument subset, no tuition, and SLS index for $f(P \mid X)$.

Finding: Point estimates are stable across most columns; SLS index (column 8) matches baseline but is computationally prohibitive for bootstrap SEs.

8.2 Read in the paper

Section II.D; Table 6(a)–(b)

8.3 Replicate

Table_6.R · run_chv2011_spec() for each column
Env: CHV2011_BOOT=250 (~7 min for all eight columns on bundled data)
Output: tab-6a, tab-6b

8.4 Check your work

SLS column: matches paper (cached author coefficients; npindex not run).
No-dropouts TT: R ≈0.243 vs paper ≈0.261 (closest column).
Baseline $\tilde{\text{ATE}}$: R ≈0.044 vs paper ≈0.082 (polynomial MTE gap).

Author vs course pipeline (Table 6)

Each author column re-runs table6/*/superboot (GAUSS cdens + MATLAB treatparnew.m). Course R reuses one polynomial MTE core with spec-varying formulas — a deliberate teaching tradeoff. Stability across R columns still illustrates robustness logic; levels vs paper require the author pipeline or geocode data.

8.5 Think

Which robustness check speaks to the Card proximity instrument vs tuition?
Why does the paper cache SLS point estimates but not bootstrap SEs?

9 Act VII — What we can and cannot reproduce

9.1 Story beat

Step back from individual tables. The course repo delivers a complete R teaching pipeline; the AER package delivers a multi-language production pipeline. Both are “replications” — with different completion criteria.

9.2 Course R pipeline — done

Phase	Deliverable	Status
P0	Merge + descriptive tables	✓
P1	`bootdata/` (250)	✓
P2	Table 4(a), A-5 at paper $B$	≈ (bootstrap $p$ gaps on bundled data)
P3	Table 4(b), 5, Figs 1–7 at $B=250$	≈ (polynomial MTE proxy)
P4	Table 6 eight columns at $B=250$	✓

9.3 Author pipeline — not run in course repo

Component	Software	Unlocks
`getdataset.do` + geocode	Stata	Exact IV merge
`superboot`	GAUSS + MATLAB + shell	Exact MTE, Table 4(b), 5 semi
`cdens` / `chv01b.prg`	GAUSS	Semi-parametric MTE shape
`table6/bootdata_nodrop`	Stata	Nodrop bootstrap SEs
`npindex` SLS	R (`np` package, >24h)	SLS column estimation

9.4 Reproducibility lessons (for your memo)

Data linkage matters as much as econometric code (Act I literacy box).
Asymptotic vs bootstrap can diverge when $P$ is estimated (Act II).
Proxy vs production estimator: polynomial MTE teaches LIV logic; GAUSS implements it (Act IV).
Honest reporting: HC deg-2 matches; bootstrap joint tests may not — both are findings (Act II table).

10 Capstone — Identification and replication memo

Write 1–2 pages in English. Suggested rubric:

Policy question
What marginal expansion of college attendance is MPRTE designed to evaluate (Table 1, Z+ε policy)?
Selection on gains
Cite Table 4(a) or 4(b): what evidence rejects constant MTE / equal LATEs?
Heterogeneity
One sentence on Figure 1 or 4: who has highest vs lowest marginal returns?
IV vs MPRTE vs OLS
From Table 5, compare linear IV and one MPRTE row. Why can they differ under heterogeneity and selection?
Replication audit
Pick one issue from this guide (bundled merge, Table 4(a) bootstrap gap, polynomial MTE proxy, or Table 6 SLS cache). State what you verified in replicate-chv2011-R.qmd.
Robustness
One column from Table 6: what sensitivity question does it answer?

Deliverable (suggested). PDF memo + log showing you ran Table_A3.R and one of Table_4a.R, Table_5.R, or Table_6.R.

11 Appendix — Replication map

Table / Fig	Paper section	Act	R script	Completion on bundled data
1	I.A	0	`Table_1.R`	Display only
2	II.A	I	`Table_2.R`	Display only
3	II.A	I	`Table_3.R`	Partial (controls ✓; IV ~)
4(a)	II.A	II	`Table_4a.R`	Partial (HC ✓; boot ~)
4(b)	II.A	II	`Table_4b.R`	Partial (pattern ~)
5	II.B–C	III–V	`Table_5.R`	Partial
6(a)(b)	II.D	VI	`Table_6.R`	Partial (SLS cached)
A-3	App.	I	`Table_A3.R`	Complete (core vars)
A-4	App.	I	`Table_A4.R`	Partial
A-5	App.	II	`Table_A5.R`	Partial (RW ✓; boot ~)
A-6	App.	III	`Table_A6.R`	Partial
A-7	App.	IV	`Table_A7.R`	Partial
A-8	App.	V	`Table_A8.R`	Partial
Fig 1	II.B	III	`Figure_1.R`	Shipped `mtexvb.out`
Figs 2–7	II.C	IV	`Figure_*.R`	Polynomial proxy

Render this page:

quarto render 04-topics/rep-chv2011/replication-chv2011-guide.qmd

Minimal merge check:

library(here)
source(here("04-topics/rep-chv2011/Rcode/chv2011-data-prep.R"))
raw <- load_chv2011_raw() |> add_chv2011_derived()
validate_chv2011_merge(raw)

Related course materials

Checklist: env vars, P0–P4 commands
R output: per-table replication notes
Paper: full text with tables
Author README.txt

12 References

Carneiro, Pedro, James J. Heckman, and Edward Vytlacil. 2011. “Estimating Marginal Returns to Education.” The American Economic Review 101 (6): 2754–81. https://doi.org/10.1257/aer.101.6.2754.