Replication Guide

From average returns to marginal policy effects

Author

Hu Huaping

Published

June 4, 2026

Important: How to use this guide

Prerequisites. Endogeneity, IV, and LATE from course slides (Ch.17–20). Helpful background: AK91 guide (IV mechanics) and Card (1993) guide (college proximity as an instrument — the same pub4 margin appears in CHV2011).

Software. R with renv::restore(); bundled NLSY files under replication_aer/. Technical run order: minimum runnable checklist.

Four documents — one workflow.

| Step | Document | Your job |
|---|---|---|
| 1 | This guide | Follow the story; link each table to an economic question and an identification step |
| 2 | Paper (full text) | Read the sections cited in each Act |
| 3 | R replication output | Compare cached tables/figures and read per-table replication notes |
| 4 | Rcode/ | Run scripts; inspect shared modules when a number diverges |

This guide is narrative-first. It explains why each table exists and how completely the course R pipeline reproduces the paper — not only which command to type.

Suggested path (~3–4 hours). Overview → Act 0 → Act I → Act II → Act III → Act IV → Act V → Act VI → Act VII → Capstone → Appendix map.

1 Overview: Marginal returns in one page

Carneiro, Heckman, and Vytlacil (2011) ask a policy question sharper than “what is the average return to college?”

  1. Returns vary across people (heterogeneous \(\beta_i = Y_{1i} - Y_{0i}\)).
  2. People select into college partly on those returns (selection on gains).
  3. Conventional ATE, TT, and IV are weighted averages of the marginal treatment effect (MTE) — they need not answer the policy you care about.
  4. The paper estimates MTE and constructs marginal policy relevant treatment effects (MPRTE) for specific policy shifts (Table 1).

Evidence chain in the paper.

| Act | Economic question | Main outputs |
|---|---|---|
| I | Who selects into college, and are instruments strong? | Tables 2, A-3, 3 |
| II | Do agents select on ex post gains? | Tables 4(a), 4(b), A-5 |
| III | What does a normal selection model imply? | Fig 1, Table 5 (normal), A-6 |
| IV | What does a semi-parametric LIV model imply? | Figs 2–3, A-7, Table 5 (semi) |
| V | How do MPRTE, IV, and OLS compare? | Tables 1, 5, A-8 |
| VI | Are conclusions robust? | Table 6 |

flowchart TB
  subgraph latent [Latent heterogeneity]
    MTE["MTE(x, u_S)"]
    Sel["Selection on gains: Cor(beta, S) != 0"]
  end
  subgraph emp [Empirical steps]
    Z["Instruments Z: pub4, tuition, local labor market"]
    P["Propensity P(Z|X)"]
    Y["Log wage Y"]
    S["College S"]
  end
  Z --> P
  P --> S
  X["Controls X: AFQT, family, experience"] --> S
  X --> Y
  S --> Y
  Sel --> MTE
  MTE --> MPRTE["MPRTE for policy shift"]
  MTE --> IVest["Linear IV / LATE"]

Course alignment.

| Topic | Where in CHV2011 |
|---|---|
| Endogeneity of schooling | Motivation; OLS vs IV in Table 5 |
| IV / LATE | Table 3 instruments; linear IV row in Table 5 |
| Heterogeneous treatment effects | MTE curve (Figs 1, 4); Table 4(b) |
| Selection on unobservables | Heckman selection; normal vs semi-parametric |
| Policy-relevant parameters | MPRTE weights (Table 1); “better MPRTE than LATE?” (Conclusion) |
| Weak / limited support | Fig 2: \(f(P \mid X)\) support; trimming \(P\) |

Data map (replication sample).

| Object | Meaning |
|---|---|
| basicvariables.dta + localvariables.dta | NLSY79 white males, \(N = 1747\) |
| \(Y\) | Log average hourly wage 1989–93 |
| \(S\) | Ever enrolled in college vs HS grad only |
| \(X\) | AFQT (corrected), family, experience, local labor market in 1991 |
| \(Z \setminus X\) | College proximity at 14, local wage/unemployment/tuition at 17 (⊥ \(X\)) |
| bootdata/phat*.asc | 250 bootstrap resamples for choice-equation inference |

Read first: Paper HTML · R tables/figures · Checklist


2 Act 0 — Framework: MTE, LIV, and MPRTE

2.1 Story beat

Section I defines the potential outcomes setup with heterogeneous returns, the MTE as \(\partial E(Y \mid X, P=p) / \partial p\), and MPRTE as a weighted integral of MTE for a policy that shifts instruments or \(P\). Table 1 and Web Appendix Table A-1 give the weight functions.
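
As a compact reminder of what "weighted integral of MTE" means, the display below writes out the generic form. It is a sketch in this guide's notation: \(\Delta_j\) and \(\omega_j\) are placeholder symbols for a treatment parameter and its weight, and the TT weight shown is the standard Heckman–Vytlacil expression rather than a transcription of Table 1, so check it against the paper.

\[ \text{ATE}(x) = \int_0^1 \text{MTE}(x, u_S)\, du_S, \qquad \Delta_j(x) = \int_0^1 \text{MTE}(x, u_S)\, \omega_j(x, u_S)\, du_S \]

\[ \omega_{\text{TT}}(x, u_S) = \frac{\Pr(P > u_S \mid X = x)}{E(P \mid X = x)} \]

MPRTE has the same form; only the weight function changes with the direction of the policy shift.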

This Act is conceptual — no table script. Skim it before Act II so Table 4(a)’s polynomial test is interpretable as a test of constant MTE in \(u_S\).

2.2 Read in the paper

  • Section I.A–B: PRTE, MPRTE, practical issues
  • Main-text Table 1 (policy weights) and Web Appendix Table A-1

2.3 Identification / econometrics

Under the LIV assumptions (independence of \((U_0, U_1, V)\) from \(Z\) given \(X\)):

\[ E(Y \mid X, P=p) = \mu_0(X) + p \cdot [\mu_1(X) - \mu_0(X)] + K(p) \]

If \(K(p)\) is nonlinear in \(p\), the MTE varies with \(u_S\), which is the signature of selection on gains. Table 4(a) tests polynomial terms in \(P\); Table 4(b) tests equality of LATE over adjacent \(u_S\) bins.
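
A small simulation can make the curvature argument concrete. The sketch below uses an invented data-generating process, not the NLSY data: the MTE is assumed to decline linearly as \(0.6 - 0.8\,u_S\), which implies \(E(Y \mid P = p) = 1 + 0.6p - 0.4p^2\). All variable names and numbers are illustrative assumptions.

```r
# Toy DGP (assumed, not the paper's data): MTE(u) = 0.6 - 0.8 * u.
set.seed(1)
n   <- 50000
z   <- rnorm(n)                 # instrument
p   <- pnorm(z)                 # true propensity score P(Z), uniform on (0, 1)
u_s <- runif(n)                 # unobserved resistance U_S, independent of Z
s   <- as.integer(u_s < p)      # enroll when U_S < P(Z)
y   <- 1 + s * (0.6 - 0.8 * u_s) + rnorm(n, sd = 0.3)

fit_lin  <- lm(y ~ p)
fit_quad <- lm(y ~ p + I(p^2))

# Integrating the MTE from 0 to p gives 0.6 p - 0.4 p^2,
# so the coefficient on p^2 should be close to -0.4.
quad_coef <- unname(coef(fit_quad)["I(p^2)"])

# F test of the nonlinear term: rejecting it is evidence against constant MTE.
p_nonlinear <- anova(fit_lin, fit_quad)[2, "Pr(>F)"]

print(c(quad_coef = quad_coef, p_nonlinear = p_nonlinear))
```

With a constant MTE the quadratic term would be zero in population, which is the null that the polynomial test in Table 4(a) examines.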

2.4 Replicate

Nothing to estimate in this Act. The Appendix map lists Table_1.R as display only.

2.5 Think

  1. Why can IV answer a different policy question than MPRTE?
  2. Table 5 marks semi-parametric ATE/TT/TUT as “Not Identified.” What does limited support of \(P\) (Figure 2) imply?

3 Act I — Sample, instruments, and the first stage

3.1 Story beat

Section II.A introduces the NLSY sample and the Card/Cameron–Taber/Kane–Rouse instrument set. Table 3 shows that AFQT, family background, and instruments predict college attendance. This is the first stage for all later MTE work.

Start replication here: verify \(N = 1747\) and wage means (Table A-3) before trusting downstream tables.

3.2 Read in the paper

  • Section II.A; Tables 2 and 3; Web Appendix Tables A-3 and A-4

3.3 Identification / econometrics

Logit choice model (flexible polynomials in \(X\) and interactions with instruments):

\[ P_i = \Pr(S_i = 1 \mid Z_i, X_i) \]

Average marginal derivatives (Table 3): for each regressor, \(\partial \Pr(S=1)/\partial z\) averaged over \(i\), with bootstrap SEs (250 draws).

Joint test on excluded instruments: strong relevance in the paper (\(p = 0.0001\)).
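
The two quantities above can be sketched on simulated data. This is a minimal illustration, assuming a plain logit without the polynomial and interaction terms the paper uses; the variable names (afqt, pub4, tuition) are borrowed from this guide, but the data and coefficients are invented, and the likelihood-ratio test stands in for whatever joint test the author code runs.

```r
# Simulated data; coefficients are illustrative assumptions, not estimates.
set.seed(42)
n   <- 5000
dat <- data.frame(afqt = rnorm(n), pub4 = rbinom(n, 1, 0.5), tuition = rnorm(n))
dat$s <- rbinom(n, 1, plogis(-0.2 + 0.8 * dat$afqt + 0.4 * dat$pub4 - 0.3 * dat$tuition))

full       <- glm(s ~ afqt + pub4 + tuition, family = binomial(link = "logit"), data = dat)
restricted <- glm(s ~ afqt, family = binomial(link = "logit"), data = dat)

# Average marginal derivative: beta_k times the logistic density at each
# person's index, averaged over the sample. For the dummy pub4 this is the
# derivative approximation, not a discrete 0-to-1 change.
index <- predict(full, type = "link")
ame   <- coef(full)[-1] * mean(dlogis(index))

# Joint relevance of the excluded instruments (pub4, tuition): LR test.
lr      <- anova(restricted, full, test = "Chisq")
p_joint <- lr[2, "Pr(>Chi)"]

print(round(ame, 3))
print(p_joint)
```

Bootstrap standard errors would wrap the estimation and averaging steps in a resampling loop; the 250 draws mentioned above refer to the paper's setting, not to this sketch.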

3.4 Replicate

| Output | Script | Pipeline | R output |
|---|---|---|---|
| Sample means | Table_A3.R | P0 merge | tab-A3 |
| Choice equation | Table_3.R | P0 + P1 bootdata/ | tab-3 |
| Men subsample | Table_A4.R | P1 | tab-A4 |
| Bootstrap files | getboot.R | P1 | replication_aer/bootdata/ |

# From repo root, after renv::restore():
# Rscript 04-topics/rep-chv2011/Rcode/getboot.R
# CHV2011_BOOT=250 Rscript 04-topics/rep-chv2011/Rcode/Table_3.R

3.5 Check your work

  • Table A-3: \(N = 1747\) (882 / 865); mean log wage \(\approx 2.21\) (\(S=0\)) and \(\approx 2.55\) (\(S=1\)).
  • Table 3: control derivatives (AFQT, mother’s education, siblings) within ~0.04 of paper.
  • Course R on bundled data: instrument rows and joint IV \(p\) do not match the paper (see replication note on tab-3).

3.6 Replication literacy

Warning: Bundled merge vs geocode merge

The AER package expects the geocode-linked newid.dta → dataset.dta merge. This course repo instead merges basicvariables.dta and localvariables.dta by row order (bundled_aligned). Core demographics match Table A-3, but the local IV variables are misaligned, as a cor(caseid, newid) check shows. First-stage IV coefficients are the main casualty, not the logit functional form.
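
The difference between a row-order merge and a key merge is easy to see on toy data. The sketch below is an assumption-laden illustration: the two data frames are invented, and only the column names caseid and newid are taken from this guide. It does not reproduce the repo's actual merge code.

```r
# Toy files: the second one is stored in a different row order.
set.seed(7)
ids   <- 1:200
basic <- data.frame(caseid = ids, afqt = rnorm(200))
local <- data.frame(newid = sample(ids))
local$tuition <- 2 + 0.01 * local$newid   # local variable tied to the person's id

# Row-order "merge": pairs row i of one file with row i of the other.
by_row <- cbind(basic, local)

# Key merge: pairs records that share the same identifier.
by_key <- merge(basic, local, by.x = "caseid", by.y = "newid")

key_cor <- cor(by_row$caseid, by_row$newid)          # near 0 when rows are shuffled
row_ok  <- mean(by_row$caseid == by_row$newid)       # share of correctly paired rows
key_ok  <- isTRUE(all.equal(by_key$tuition, 2 + 0.01 * by_key$caseid))

print(c(key_cor = key_cor, row_ok = row_ok, key_ok = key_ok))
```

A correlation between the two identifiers far below 1 is the warning sign: the person-level variables stay intact, while anything taken from the second file is attached to the wrong person.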

3.7 Think

  1. Why interact instruments with AFQT and mother’s education?
  2. If joint instrument relevance fails in your replication, should you proceed to MTE tables?

4 Act II — Selection on gains: Is MTE constant?

4.1 Story beat

The paper’s first major finding: agents select on ex post gains. Table 4(a) rejects linearity of \(E(Y \mid X, P)\) in \(P\) (polynomial test). Table 4(b) rejects equality of LATE across \(u_S\) bins. Table A-5 repeats 4(a) with double bootstrap for estimated \(P\).

This motivates everything in Acts III–V: average returns and IV are mixtures of heterogeneous MTE.

4.2 Read in the paper

  • Table 4(a)–(b) narrative; Web Appendix Table A-5
  • Heckman, Schmierer and Urzua (2010) test referenced in text

4.3 Identification / econometrics

Table 4(a): Estimate \(Y\) on \(X\), \(\hat P\), \(\hat P^2, \ldots, \hat P^d\); test joint significance of nonlinear terms. Probit first step (author bootstrap_MT.R); Murphy–Topel / bootstrap \(p\)-values; Romano–Wolf stepdown.

Table 4(b): Bin \(u_S\), average the MTE within bins, and bootstrap a test of the differences between adjacent LATEs.
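
The logic of comparing adjacent LATEs can be sketched with a simpler device than the paper's: a Wald ratio between neighbouring propensity-score bins. This is a swapped-in illustration on an invented DGP with MTE \(= 0.6 - 0.8\,u_S\), not the author's bin-averaged MTE procedure, and the four equal-width bins are an arbitrary choice.

```r
# Toy DGP (assumed): declining MTE, uniform propensity score.
set.seed(2)
n   <- 50000
z   <- rnorm(n)
p   <- pnorm(z)
u_s <- runif(n)
s   <- as.integer(u_s < p)
y   <- 1 + s * (0.6 - 0.8 * u_s) + rnorm(n, sd = 0.3)

# Mean outcome and mean enrollment within four propensity-score bins.
bins  <- cut(p, breaks = c(0, 0.25, 0.5, 0.75, 1), include.lowest = TRUE)
y_bar <- as.numeric(tapply(y, bins, mean))
s_bar <- as.numeric(tapply(s, bins, mean))

# Wald estimator between adjacent bins: change in mean Y per unit change
# in the enrollment rate. Each ratio is a LATE for a slice of u_S.
late <- diff(y_bar) / diff(s_bar)
names(late) <- c("bins 1-2", "bins 2-3", "bins 3-4")

print(round(late, 3))
```

Under this DGP the population values are 0.4, 0.2, and 0.0, so the estimated LATEs should fall from one pair of bins to the next. Equal LATEs across bins would be what a constant MTE predicts.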

4.4 Replicate

| Output | Script | Env vars | R output |
|---|---|---|---|
| Export data | export_data_with_tuition.R | | mainresults/table4a/data_with_tuition.RData |
| Table 4(a) | Table_4a.R | CHV2011_BOOT_4A=1000 | tab-4a |
| Table 4(b) | Table_4b.R | CHV2011_BOOT=250 | tab-4b |
| Table A-5 | Table_A5.R | CHV2011_BOOT_A5_OUT=500, CHV2011_BOOT_A5_IN=100 | tab-A5 |

Core logic: chv2011-table4a-core.R.

4.5 Check your work

| Test | Paper | Course R (bundled) | Verdict |
|---|---|---|---|
| 4(a) HC \(p\), deg 2 | 0.035 | ≈0.035 | Close |
| 4(a) bootstrap \(p\), deg 2 | 0.035 | ≈0.14 | Gap |
| 4(a) RW joint | Reject | Fail to reject | Gap |
| A-5 RW critical | ≈0.052 | ≈0.054 | Close |
| A-5 bootstrap \(p\) | 0.004–0.026 | 0.09–0.28 | Gap |
| 4(b) pattern | Declining LATE diff | Negative diffs, joint reject | Qualitative OK |

Note: What the gap means for students

The course pipeline implements the author’s test logic and runs paper-grade bootstrap sizes. On bundled data, asymptotic inference tracks the paper better than pivotal bootstrap \(p\)-values. Treat this as a lesson in replication literacy: identical code structure ≠ identical numbers when inputs (merge) or unpublished details (double_bootstrap.R absent from AER bundle) differ.
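
One reason asymptotic and bootstrap inference can part ways is that \(\hat P\) is itself estimated. The sketch below, on an invented DGP, re-estimates a probit first step inside every bootstrap draw and compares the resulting standard error of the quadratic coefficient with the naive OLS one that treats \(\hat P\) as data. It is a plain pairs bootstrap with an assumed \(B = 200\), not the author's bootstrap_MT.R or the double bootstrap of Table A-5.

```r
# Toy DGP (assumed): declining MTE, probit selection.
set.seed(11)
n   <- 2000
z   <- rnorm(n)
v   <- rnorm(n)
s   <- as.integer(v < z)
y   <- 1 + s * (0.6 - 0.8 * pnorm(v)) + rnorm(n, sd = 0.3)
dat <- data.frame(y, s, z)

# Both steps in one function so that each resample re-estimates P-hat.
quad_coef <- function(d) {
  first  <- glm(s ~ z, family = binomial(link = "probit"), data = d)
  d$phat <- fitted(first)
  unname(coef(lm(y ~ phat + I(phat^2), data = d))["I(phat^2)"])
}

est  <- quad_coef(dat)
B    <- 200
boot <- replicate(B, quad_coef(dat[sample.int(n, replace = TRUE), ]))
boot_se <- sd(boot)

# Naive SE: second-step OLS that ignores first-step estimation error.
dat$phat <- fitted(glm(s ~ z, family = binomial(link = "probit"), data = dat))
naive_se <- summary(lm(y ~ phat + I(phat^2), data = dat))$coefficients["I(phat^2)", "Std. Error"]

print(round(c(estimate = est, boot_se = boot_se, naive_se = naive_se), 3))
```

How far the two standard errors differ depends on the design; the point of the sketch is only that the bootstrap loop must contain the first step for the comparison to be meaningful.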

4.6 Think

  1. If MTE is declining in \(u_S\), who has the highest returns — the inframarginal enrolled or those who skip college?
  2. Why does Table 4(a) use probit while Table 3 uses logit?

5 Act III — Normal selection benchmark

5.1 Story beat

Section II.B estimates a joint normal selection model (Willis–Rosen tradition). Figure 1 shows a precisely estimated declining MTE. Table 5 column 1 converts MTE to ATE, TT, TUT, and MPRTE. The normal model rejects constant MTE (\(\sigma_{1V} - \sigma_{0V} \neq 0\) in Table A-6) — consistent with Act II.
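
The link between \(\sigma_{1V} - \sigma_{0V}\) and the slope of the MTE can be written out under an assumed normalization, which this guide does not state: \(S = 1\{\mu_S(Z) \ge V\}\), \(V \sim N(0, 1)\), \(\sigma_{jV} = \text{Cov}(U_j, V)\), and \(u_S = \Phi(V)\). The paper's sign convention may differ, so treat this as a sketch.

\[ \text{MTE}(x, u_S) = \mu_1(x) - \mu_0(x) + E\big(U_1 - U_0 \mid V = \Phi^{-1}(u_S)\big) = \mu_1(x) - \mu_0(x) + (\sigma_{1V} - \sigma_{0V})\, \Phi^{-1}(u_S) \]

Under this normalization a constant MTE requires \(\sigma_{1V} = \sigma_{0V}\), and a declining MTE corresponds to \(\sigma_{1V} - \sigma_{0V} < 0\). Because \(\Phi^{-1}(u_S)\) diverges as \(u_S \to 0\) or \(1\), the implied MTE is unbounded at the extremes.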

5.2 Read in the paper

  • Section II.B; Figure 1; Table 5 (normal column); Table A-6

5.3 Replicate

| Output | Script | R output |
|---|---|---|
| Normal MTE curve | Figure_1.R | fig-1 |
| Switching regression | Table_A6.R | tab-A6 |
| Table 5 (normal col.) | Table_5.R | tab-5 |

Course shortcut: Figure 1 and Table 5 normal column use shipped normalmodel/mtexvb.out (author ML output) — not re-estimated in R.

5.4 Check your work

  • Figure 1: downward-sloping MTE; annualized scale (~divide by 4 years).
  • Table 5 normal: MPRTE (Z+ε) ≈0.066; TT > MPRTE > TUT ordering in paper.
  • Course R: normal ATE ≈0.066 matches; TT/TUT may differ slightly on bundled data.

5.5 Think

  1. Why are ATE/TT/TUT identified in the normal model but not in the semi-parametric column?
  2. What is unattractive about normal MTE at \(u_S \to 0\) or \(1\) (footnote 17)?

6 Act IV — Semi-parametric MTE (LIV)

6.1 Story beat

Section II.C is the empirical core: estimate \(E(Y \mid X, P=p)\) semi-parametrically, differentiate to get MTE, integrate with weights for MPRTE. Figures 2–3 show limited support of \(P\) given \(X\) (why ATE/TT/TUT are not identified). Figures 4–7 plot MTE and policy weights.

The author pipeline: Stata/GAUSS superboot (250 iterations) → MATLAB mte4c.m, treatwithiv.m. The course R pipeline uses a polynomial proxy aligned with Table 4(a) — faster, transparent, but not numerically identical.

6.2 Read in the paper

  • Section II.C; Figures 2–3; Web Appendix Table A-7 (Robinson partial linear)

6.3 Identification / econometrics

Three-step logic:

  1. \(\hat P = \widehat{\Pr}(S = 1 \mid Z, X)\), fitted from the logit choice model
  2. Partially linear outcome: \(Y = X'\delta_0 + P \cdot X'(\delta_1 - \delta_0) + K(P) + \varepsilon\)
  3. MTE\((u_S) = \partial K(p) / \partial p\) evaluated at \(p = u_S\) (plus \(X\) terms)

Trim \(P \in [0.0324, 0.9775]\) (common support).
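
The steps above can be sketched end to end with a quadratic stand-in for \(K(P)\), in the spirit of the course's polynomial proxy. Everything here is an assumption for illustration: the simulated DGP with MTE \(= 0.6 - 0.8\,u_S\), the single control x, and the helper name mte(). It is not the code in chv2011-mte-core.R and not the paper's semi-parametric estimator.

```r
# Toy DGP (assumed): logit selection, declining MTE, no X-heterogeneity in gains.
set.seed(3)
n   <- 50000
x   <- rnorm(n)
z   <- rnorm(n)
u_s <- runif(n)
p   <- plogis(0.5 * x + 2 * z)
s   <- as.integer(u_s < p)
y   <- 1 + 0.3 * x + s * (0.6 - 0.8 * u_s) + rnorm(n, sd = 0.3)
dat <- data.frame(y, s, x, z)

# Step 1: propensity score from a logit.
dat$phat <- fitted(glm(s ~ x + z, family = binomial(link = "logit"), data = dat))

# Trim to the support interval quoted in this guide.
trimmed <- subset(dat, phat >= 0.0324 & phat <= 0.9775)

# Step 2: outcome equation with a quadratic in P-hat standing in for K(P).
out <- lm(y ~ x + phat + I(phat^2) + x:phat, data = trimmed)
b   <- coef(out)

# Step 3: MTE is the derivative of the fitted E(Y | X, P = p) with respect to p.
mte <- function(u, x0 = 0) {
  unname(b["phat"] + 2 * b["I(phat^2)"] * u + b["x:phat"] * x0)
}
mte_slope <- unname(2 * b["I(phat^2)"])

print(round(mte(c(0.2, 0.5, 0.8)), 3))
```

The fitted curve is only defined where \(\hat P\) has support, which is why averages that need the MTE over all of \((0, 1)\) cannot be formed from it.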

6.4 Replicate

| Output | Script | Module | R output |
|---|---|---|---|
| MTE / MPRTE / IV / OLS | Table_5.R | chv2011-mte-core.R | tab-5 |
| Adjacent LATE test | Table_4b.R | same | tab-4b |
| Partial linear | Table_A7.R | data-prep | tab-A7 |
| Figures | Figure_2.R · Figure_7.R | chv2011-figures.R | fig-2 |

# CHV2011_BOOT=250 Rscript 04-topics/rep-chv2011/Rcode/Table_5.R
# Rscript 04-topics/rep-chv2011/Rcode/run_all_figures.R

6.5 Check your work

| Quantity | Paper (semi / IV / OLS) | Course R (bundled) |
|---|---|---|
| MPRTE (Z+ε) | 0.080 | ≈0.038 |
| Linear IV | 0.095 | ≈0.119 |
| OLS | 0.084 | ≈0.077 |

Pattern to verify: IV and OLS same order of magnitude; MTE declining in \(u_S\); MPRTE below TT in normal column.

6.6 Think

  1. Why is semi-parametric MPRTE estimated but ATE labeled “Not Identified”?
  2. How does Figure 2 change your reading of Table 5?

7 Act V — Policy parameters and linear IV

7.1 Story beat

Table 5 compares policy objects: MPRTE for three policy shifts (Table 1 weights), linear IV using \(P(Z)\) as the instrument, and OLS. The conclusion is that MPRTE < TT for the marginal expansions considered: expanding college to the next marginal group yields lower returns than those earned by the inframarginal enrolled. Table A-8 reports linear IV/OLS with bootstrap SEs as a bridge to the IV literature.
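
A toy comparison shows why these objects need not coincide. The sketch below assumes an invented DGP with MTE \(= 0.6 - 0.8\,u_S\) and no selection on levels, so OLS happens to recover TT here; that is a property of the simulation, not a claim about the paper's data. The quantity iv_target is the population expression the Wald ratio converges to under this DGP.

```r
# Toy DGP (assumed): declining MTE, propensity score skewed toward enrollment.
set.seed(4)
n    <- 50000
z    <- rnorm(n, mean = 0.5)
p    <- pnorm(z)
u_s  <- runif(n)
s    <- as.integer(u_s < p)
gain <- 0.6 - 0.8 * u_s                  # individual return; unobserved in real data
y    <- 1 + s * gain + rnorm(n, sd = 0.3)

ols <- unname(coef(lm(y ~ s))["s"])
iv  <- cov(y, p) / cov(s, p)             # linear IV with P(Z) as the instrument
tt  <- mean(gain[s == 1])                # infeasible benchmark: treatment on the treated
ate <- mean(gain)                        # infeasible benchmark: average treatment effect

# Since E(Y | P) = 1 + 0.6 P - 0.4 P^2 and E(S | P) = P,
# Cov(Y, P) / Cov(S, P) = 0.6 - 0.4 * Cov(P^2, P) / Var(P).
iv_target <- 0.6 - 0.4 * cov(p^2, p) / var(p)

print(round(c(OLS = ols, IV = iv, TT = tt, ATE = ate, IV_target = iv_target), 3))
```

Linear IV weights the MTE according to how \(P(Z)\) is distributed, so it lands below TT in this design even though both are averages of the same curve.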

Paper quote (Conclusion): “Better MPRTE than LATE?” — policy questions should target the margin the policy moves.

7.2 Read in the paper

  • Table 5; Section III Summary; Table A-8

7.3 Replicate

  • Table 5: Table_5.R (the same run as in Act IV)
  • Table A-8: Table_A8.R

7.4 Check your work

  • MPRTE policies roughly 6–8% annualized (semi-parametric) in paper; IV ≈9.5%, OLS ≈8.4%.
  • Course R: IV/OLS magnitudes reasonable; semi-parametric MPRTE below paper (polynomial proxy).

7.5 Think

  1. When is linear IV a weighted average of MTE? Which margins does \(P(Z)\) weight?
  2. How does the Murray (2008) argument relate to the CHV finding of low marginal returns for some expansions?

8 Act VI — Robustness (Table 6)

8.1 Story beat

Section II.D re-estimates treatment parameters under eight specifications: sample restrictions, dropout dummies, alternative IV sets, Cameron–Taber instrument subset, no tuition, and SLS index for \(f(P \mid X)\).

Finding: Point estimates are stable across most columns; SLS index (column 8) matches baseline but is computationally prohibitive for bootstrap SEs.

8.2 Read in the paper

  • Section II.D; Table 6(a)–(b)

8.3 Replicate

  • Table_6.R · run_chv2011_spec() for each column
  • Env: CHV2011_BOOT=250 (~7 min for all eight columns on bundled data)
  • Output: tab-6a, tab-6b

8.4 Check your work

  • SLS column: matches paper (cached author coefficients; npindex not run).
  • No-dropouts TT: R ≈0.243 vs paper ≈0.261 (closest column).
  • Baseline \(\tilde{\text{ATE}}\): R ≈0.044 vs paper ≈0.082 (polynomial MTE gap).

Warning: Author vs course pipeline (Table 6)

Each author column re-runs table6/*/superboot (GAUSS cdens + MATLAB treatparnew.m). Course R reuses one polynomial MTE core with spec-varying formulas — a deliberate teaching tradeoff. Stability across R columns still illustrates robustness logic; levels vs paper require the author pipeline or geocode data.

8.5 Think

  1. Which robustness check speaks to the Card proximity instrument vs tuition?
  2. Why does the paper report SLS point estimates but not bootstrap SEs?

9 Act VII — What we can and cannot reproduce

9.1 Story beat

Step back from individual tables. The course repo delivers a complete R teaching pipeline; the AER package delivers a multi-language production pipeline. Both are “replications” — with different completion criteria.

9.2 Course R pipeline — done

| Phase | Deliverable | Status |
|---|---|---|
| P0 | Merge + descriptive tables | |
| P1 | bootdata/ (250) | |
| P2 | Table 4(a), A-5 at paper \(B\) | ≈ (bootstrap \(p\) gaps on bundled data) |
| P3 | Table 4(b), 5, Figs 1–7 at \(B=250\) | ≈ (polynomial MTE proxy) |
| P4 | Table 6 eight columns at \(B=250\) | |

9.3 Author pipeline — not run in course repo

| Component | Software | Unlocks |
|---|---|---|
| getdataset.do + geocode | Stata | Exact IV merge |
| superboot | GAUSS + MATLAB + shell | Exact MTE, Table 4(b), 5 semi |
| cdens / chv01b.prg | GAUSS | Semi-parametric MTE shape |
| table6/bootdata_nodrop | Stata | Nodrop bootstrap SEs |
| npindex SLS | R (np package, >24h) | SLS column estimation |

9.4 Reproducibility lessons (for your memo)

  1. Data linkage matters as much as econometric code (Act I literacy box).
  2. Asymptotic vs bootstrap can diverge when \(P\) is estimated (Act II).
  3. Proxy vs production estimator: polynomial MTE teaches LIV logic; GAUSS implements it (Act IV).
  4. Honest reporting: HC deg-2 matches; bootstrap joint tests may not — both are findings (Act II table).

10 Capstone — Identification and replication memo

Write 1–2 pages in English. Suggested rubric:

  1. Policy question
    What marginal expansion of college attendance is MPRTE designed to evaluate (Table 1, Z+ε policy)?

  2. Selection on gains
    Cite Table 4(a) or 4(b): what evidence rejects constant MTE / equal LATEs?

  3. Heterogeneity
    One sentence on Figure 1 or 4: who has highest vs lowest marginal returns?

  4. IV vs MPRTE vs OLS
    From Table 5, compare linear IV and one MPRTE row. Why can they differ under heterogeneity and selection?

  5. Replication audit
    Pick one issue from this guide (bundled merge, Table 4(a) bootstrap gap, polynomial MTE proxy, or Table 6 SLS cache). State what you verified in replicate-chv2011-R.qmd.

  6. Robustness
    One column from Table 6: what sensitivity question does it answer?

Deliverable (suggested). PDF memo + log showing you ran Table_A3.R and one of Table_4a.R, Table_5.R, or Table_6.R.


11 Appendix — Replication map

| Table / Fig | Paper section | Act | R script | Completion on bundled data |
|---|---|---|---|---|
| 1 | I.A | 0 | Table_1.R | Display only |
| 2 | II.A | I | Table_2.R | Display only |
| 3 | II.A | I | Table_3.R | Partial (controls ✓; IV ~) |
| 4(a) | II.A | II | Table_4a.R | Partial (HC ✓; boot ~) |
| 4(b) | II.A | II | Table_4b.R | Partial (pattern ~) |
| 5 | II.B–C | III–V | Table_5.R | Partial |
| 6(a)(b) | II.D | VI | Table_6.R | Partial (SLS cached) |
| A-3 | App. | I | Table_A3.R | Complete (core vars) |
| A-4 | App. | I | Table_A4.R | Partial |
| A-5 | App. | II | Table_A5.R | Partial (RW ✓; boot ~) |
| A-6 | App. | III | Table_A6.R | Partial |
| A-7 | App. | IV | Table_A7.R | Partial |
| A-8 | App. | V | Table_A8.R | Partial |
| Fig 1 | II.B | III | Figure_1.R | Shipped mtexvb.out |
| Figs 2–7 | II.C | IV | Figure_*.R | Polynomial proxy |

Render this page:

quarto render 04-topics/rep-chv2011/replication-chv2011-guide.qmd

Minimal merge check:

library(here)
source(here("04-topics/rep-chv2011/Rcode/chv2011-data-prep.R"))
raw <- load_chv2011_raw() |> add_chv2011_derived()
validate_chv2011_merge(raw)

12 References

Carneiro, Pedro, James J. Heckman, and Edward Vytlacil. 2011. “Estimating Marginal Returns to Education.” The American Economic Review 101 (6): 2754–81. https://doi.org/10.1257/aer.101.6.2754.