Table I: The Effect of Quarter of Birth on Various Educational Outcome Variables

| Outcome variable | Birth cohort | Mean | Quarter-of-birth effect: QTR1 (s.e.) | Quarter-of-birth effect: QTR2 (s.e.) | Quarter-of-birth effect: QTR3 (s.e.) | F-test [p-value] |
|---|---|---|---|---|---|---|
| Total years of education | 1930-1939 | 12.7922 | -0.1243 (0.0167) | -0.0860 (0.0168) | -0.0149 (0.0160) | 25.1183 [0.0000] |
| Total years of education | 1940-1949 | 13.5600 | -0.0855 (0.0125) | -0.0348 (0.0126) | -0.0188 (0.0126) | 17.3582 [0.0000] |
| High school graduate | 1930-1939 | 0.7741 | -0.0191 (0.0021) | -0.0198 (0.0021) | -0.0039 (0.0020) | 46.5981 [0.0000] |
| High school graduate | 1940-1949 | 0.8637 | -0.0145 (0.0014) | -0.0121 (0.0014) | -0.0020 (0.0014) | 51.1658 [0.0000] |
| Years of educ. for high school graduates | 1930-1939 | 14.0060 | -0.0296 (0.0143) | 0.0051 (0.0143) | 0.0165 (0.0136) | 3.7914 [0.0099] |
| Years of educ. for high school graduates | 1940-1949 | 14.2813 | -0.0093 (0.0110) | 0.0205 (0.0111) | 0.0079 (0.0111) | 2.6535 [0.0468] |
| College graduates | 1930-1939 | 0.2356 | -0.0050 (0.0022) | 0.0028 (0.0022) | 0.0019 (0.0021) | 4.9976 [0.0018] |
| College graduates | 1940-1949 | 0.2996 | -0.0028 (0.0019) | 0.0046 (0.0019) | -0.0000 (0.0019) | 5.1465 [0.0015] |
| Completed master’s degree | 1930-1939 | 0.0898 | -0.0010 (0.0015) | 0.0019 (0.0015) | -0.0009 (0.0014) | 1.7151 [0.1615] |
| Completed master’s degree | 1940-1949 | 0.1102 | 0.0001 (0.0013) | 0.0039 (0.0013) | 0.0010 (0.0013) | 3.8492 [0.0091] |
| Completed doctoral degree | 1930-1939 | 0.0350 | 0.0016 (0.0009) | 0.0025 (0.0009) | 0.0004 (0.0009) | 2.8831 [0.0343] |
| Completed doctoral degree | 1940-1949 | 0.0360 | -0.0018 (0.0008) | 0.0009 (0.0008) | -0.0005 (0.0008) | 4.3011 [0.0049] |
Replication with R
Does Compulsory School Attendance Affect Schooling and Earnings?
Introduction
Hu Huaping replicates all eight tables in (Angrist and Krueger 1991) using R and provides some learning resources and notes. The R scripts (.R) and results files (.rds) can be found here.
Thanks to Win Supanwanid for providing the data and STATA scripts <www.github.com/winsup/angrist_krueger_1991>.
R Scripts and Results files
The data set and R scripts are provided as follows:
- Data set: ak91.dta
- R code: Table_I.R
- R code: Table_II.R
- R code: Table_III.R
- R code: Table_IV.R
- R code: Table_V.R
- R code: Table_VI.R
- R code: Table_VII.R
- R code: Table_VIII.R
The results tables (table_data and gt_tbl objects) are provided as follows (you can download the files and use them in your own project):
- R results object file for Table I: Table_I.RData
- R results object file for Table II: Table_II.RData
- R results object file for Table III: Table_III.RData
- R results object file for Table IV: Table_IV.RData
- R results object file for Table V: Table_V.RData
- R results object file for Table VI: Table_VI.RData
- R results object file for Table VII: Table_VII.RData
- R results object file for Table VIII: Table_VIII.RData
Learning Resources
Econometric replication paper project for “Does Compulsory School Attendance Affect Schooling and Earnings?” (GitHub repo). Notes: provides Stata code and LaTeX PDF files, along with all tables and figures from the paper.
Causal Inference: The Mixtape, a free online book with a GitHub repo here. Chapter 7, Instrumental Variables. Notes: explains this example thoroughly but does not supply the data and code.
- (Cunningham 2021) Cunningham, S. Causal Inference: The Mixtape[M]. Yale University Press, 2021.
Useful companion slides for the textbook by Prof. Scott Cunningham: Chapter 7, Instrumental Variables. See here.
A deeper dive into advanced IV topics with Prof. Peter Hull. See here. Notes: slides and R code.
R Code for Mastering ’Metrics, a free online book. Chapter 9: Quarter of Birth and Returns to Schooling. Notes: useful figures with R code.
- (Angrist and Pischke 2015) Angrist, J. D., and J. S. Pischke. Mastering ’Metrics: The Path from Cause to Effect[M]. Princeton, Oxford: Princeton University Press, 2015.
Chapter 13, Instrumental Variables Regression, by (Muller, Winship, and Morgan 2014) Muller, C., C. Winship, and S. Morgan. Instrumental Variables Regression[A]. The SAGE Handbook of Regression Analysis and Causal Inference[C]. SAGE Publications Ltd, 2014. Notes: another useful example, which may supply R code.
- See the book (Best and Wolf 2015): Best, H., and C. Wolf. The SAGE Handbook of Regression Analysis and Causal Inference[M]. Los Angeles: Sage, 2015. More resources may be available on the book’s website, subject to subscription access.
Appendix Tables
Table I
Notes 1: Be careful when calculating the moving average (MA). First compute the mean of EDUC and the cell size n for each year-quarter (YQ) group, using the full sequence of YQ across the two cohorts (1930-1939 and 1940-1949). The lags and leads that enter the MA must then be taken over this full sequence, not within each cohort separately. See the paper for the underlying formulas, and read the R code to follow the exact calculation.
\[ \begin{aligned} M A_{c j} & =\frac{E_{-2}+E_{-1}+E_{+1}+E_{+2}}{4} & \text{(1)}\\ E_{i c j}-M A_{c j} & =\alpha+\sum_{j=1}^{3} \beta_{j} Q_{i c j}+\epsilon_{i c j} & \text{(2)} \end{aligned} \]
for \(i=1,2, \ldots, N_{c} ; c=1,2, \ldots, 10 ; j=1,2,3\)
- \(E_{i c j}\) : the educational outcome variable listed in Table I: total years of education, high school graduate, years of education for high school graduates, college graduates, completed master’s degree, completed doctoral degree.
- \(M A_{c j}\) : the moving average of the educational outcome variable for men born in the surrounding quarters. Its purpose is to detrend the series.
Notes 2: Keep in mind that Table I has two kinds of dependent variables: years of education and indicators of education level. Both are detrended before estimation. The years variable is computed as EDUC - MA(EDUC), and the mean of EDUC may be 12.05 for cohort 1930-1939. The level variables are computed from education dummies, e.g. hs_grad - MA(hs_grad), and the mean of hs_grad may be 0.25 for cohort 1930-1939.
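The detrending steps behind equations (1) and (2) can be sketched in base R as follows. This is a minimal illustration on simulated data, not the author's script: the data frame `df`, the columns `yob`, `qob`, `educ`, and the helper `shift()` are hypothetical names, and the simulated numbers are unrelated to ak91.dta.

```r
set.seed(42)

# Simulated stand-in for the census extract (hypothetical; not ak91.dta)
n  <- 40000
df <- data.frame(
  yob = sample(1930:1949, n, replace = TRUE),  # year of birth
  qob = sample(1:4, n, replace = TRUE)         # quarter of birth
)
df$educ <- 12 + 0.05 * (df$yob - 1930) + 0.03 * df$qob + rnorm(n, sd = 3)

# Step 1: mean of educ for every year-quarter cell, in chronological order,
# over the FULL 1930-1949 sequence (both cohorts together)
cells <- aggregate(educ ~ yob + qob, data = df, FUN = mean)
cells <- cells[order(cells$yob, cells$qob), ]

# Step 2: MA = (E[-2] + E[-1] + E[+1] + E[+2]) / 4, as in equation (1)
shift <- function(x, k) {
  # k > 0 returns the k-th lag, k < 0 the |k|-th lead; edges become NA
  m <- length(x)
  if (k > 0) c(rep(NA, k), x[seq_len(m - k)]) else c(x[(1 - k):m], rep(NA, -k))
}
e <- cells$educ
cells$ma <- (shift(e, 2) + shift(e, 1) + shift(e, -1) + shift(e, -2)) / 4

# Step 3: attach the cell-level MA to individuals and detrend
df <- merge(df, cells[, c("yob", "qob", "ma")], by = c("yob", "qob"))
df$educ_detr <- df$educ - df$ma

# Step 4: equation (2) for one cohort, with quarter 4 as the omitted category
df$QTR1 <- as.integer(df$qob == 1)
df$QTR2 <- as.integer(df$qob == 2)
df$QTR3 <- as.integer(df$qob == 3)
fit <- lm(educ_detr ~ QTR1 + QTR2 + QTR3, data = subset(df, yob <= 1939))
summary(fit)$fstatistic  # joint F-test of the three quarter dummies
```

Because the lags and leads run over the whole 1930-1949 sequence, only the first two and the last two quarters lack a moving average; computing them within each cohort would instead lose four quarters per cohort.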
Table II
Table II: Percent enrolled April 1, 1960¹,²

| Date of birth | Type of state law: school-leaving age 16 (1) | Type of state law: school-leaving age 17 or 18 (2) | Column (1) - (2) |
|---|---|---|---|
| Jan 1-Mar 31, 1944 | 84.8912 (0.3827) | 85.7213 (0.7705) | -0.8301 (0.8604) |
| Apr 1-Dec 31, 1944 | 85.7225 (0.2179) | 85.9915 (0.4442) | -0.2690 (0.4947) |
| Within-state diff. | -0.8314 (0.4404) | -0.2702 (0.8894) | -0.5611 (0.9925) |

¹ Standard errors are in parentheses; both estimates and standard errors are in percent.

² The results in this table do not match those in the paper because the author does not provide the data for this section. Only the available data were used to approximate the table. Results in the last column were calculated manually, e.g. \(-0.5611=(-0.8301)-(-0.2690)\).
Notes 1: The results of this table do not match those in the paper because the author does not provide the data for this section. Only the available data (appendix2_table.dta) were used to approximate the table. Results in the last column were calculated manually, e.g. \(-0.5611=(-0.8301)-(-0.2690)\).
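The manual calculation of the last column can be written out as a short R check. This is only a sketch using the rounded figures printed in Table II. The standard-error rule shown (square root of the sum of squared standard errors) assumes the four cells are independent samples, which is an assumption made here rather than something stated in the source, and the object names are hypothetical.

```r
# Estimates and standard errors as printed in Table II
est <- matrix(c(84.8912, 85.7213,
                85.7225, 85.9915),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Q1", "Q2-Q4"), c("age16", "age17_18")))
se  <- matrix(c(0.3827, 0.7705,
                0.2179, 0.4442),
              nrow = 2, byrow = TRUE, dimnames = dimnames(est))

# Column (1) - (2) for each date-of-birth row
col_diff    <- est[, "age16"] - est[, "age17_18"]
col_diff_se <- sqrt(se[, "age16"]^2 + se[, "age17_18"]^2)

# Difference of the two row differences (bottom-right cell)
did    <- unname(col_diff["Q1"] - col_diff["Q2-Q4"])
did_se <- sqrt(sum(se^2))

round(c(estimate = did, std_error = did_se), 4)
```

Under the independence assumption the computed standard errors agree with the printed ones up to rounding in the fourth decimal place, since the inputs here are already rounded.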
Table III
Table III: Wald Estimates¹

| Variable | (1) Born in 1st quarter of year | (2) Born in 2nd, 3rd, or 4th quarter of year | (3) Difference (std. error) (1)-(2) |
|---|---|---|---|
| PANEL A: WALD ESTIMATES FOR 1970 CENSUS - MEN BORN 1920-1929 | | | |
| \(ln\)(wkly. wage) | 5.1485 | 5.1574 | -0.0090 (0.0030) |
| Education | 11.3996 | 11.5252 | -0.1256 (0.0155) |
| Wald est. of return to education | | | 0.0715 (0.0256) |
| OLS return to education | | | 0.0801 (0.0004) |
| PANEL B: WALD ESTIMATES FOR 1980 CENSUS - MEN BORN 1930-1939 | | | |
| \(ln\)(wkly. wage) | 5.8916 | 5.9027 | -0.0111 (0.0027) |
| Education | 12.6881 | 12.7969 | -0.1088 (0.0132) |
| Wald est. of return to education | | | 0.1020 (0.0281) |
| OLS return to education | | | 0.0709 (0.0003) |

¹ The standard errors of the Wald estimates differ from those given by Stata's nlcom command, e.g. the R result is 0.02556816 in Panel A while Stata's result is 0.0218682.
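Notes 1: The standard errors of the Wald estimates computed in R are not consistent with those in AK1991 or with Stata's results. The likely reason lies in the covariance term of the delta-method formula: the R output quoted in the code below shows a covariance of exactly 0 between the two QTR1 coefficients, which is what systemfit returns under its default method = "OLS", whereas Stata's sureg estimates the two equations jointly and supplies a nonzero covariance.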
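The role of the covariance term can be checked numerically with the delta-method formula for a ratio. The sketch below plugs in the coefficient and variance values quoted in the comments of the R code in this section; the helper name `ratio_se()` is hypothetical, and the nonzero covariance used at the end is an arbitrary illustrative value, not an estimate from the data.

```r
# Delta-method standard error of the ratio b1 / b2
ratio_se <- function(b1, b2, v1, v2, cov12) {
  sqrt(v1 / b2^2 + b1^2 * v2 / b2^4 - 2 * b1 * cov12 / b2^3)
}

# Values quoted in the comments of the Panel A code in this section
b1 <- -0.008978875   # QTR1 coefficient, wage equation
b2 <- -0.125555298   # QTR1 coefficient, education equation
v1 <- 9.070617e-06   # Var(b1)
v2 <- 2.414637e-04   # Var(b2)

wald    <- b1 / b2                                 # Wald estimate, about 0.0715
se_zero <- ratio_se(b1, b2, v1, v2, cov12 = 0)     # about 0.0256, as in Table III

# Arbitrary positive covariance (illustration only) to show the direction of the effect
se_pos <- ratio_se(b1, b2, v1, v2, cov12 = 1e-05)

c(wald = wald, se_zero = se_zero, se_pos = se_pos)
```

With both coefficients negative, a positive covariance between them lowers the standard error, which is the direction of the gap between the R and Stata figures.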
Here is the STATA command for calculating the Wald Estimates:
sureg (eq1: LWKLYWGE QTR1 ) (eq2: EDUC QTR1 ) if COHORT==3039
nlcom ratio:[eq1]_b[QTR1]/[eq2]_b[QTR1]
And here is the R code for calculating the Wald Estimates:
library(systemfit)

# Seemingly unrelated regression for Panel A.
# systemfit() defaults to method = "OLS", which fits each equation separately and
# returned a zero cross-equation covariance; method = "SUR" is requested here on the
# assumption that this is what corresponds to Stata's sureg.
sur_a <- systemfit(list(
  wage = LWKLYWGE ~ QTR1,
  educ = EDUC ~ QTR1
), method = "SUR", data = panel_a)

# Calculate Wald estimate for Panel A
# Note: In Stata, the standard error is calculated using the nlcom command after sureg
# From coef(sur_a), we can see:
# wage_QTR1 is at index 2 (-0.008978875)
# educ_QTR1 is at index 4 (-0.125555298)
b1 <- coef(sur_a)["wage_QTR1"]  # QTR1 coefficient in the wage equation
b2 <- coef(sur_a)["educ_QTR1"]  # QTR1 coefficient in the education equation
wald_a <- b1 / b2               # ratio of QTR1 coefficients

# Extract the variance-covariance matrix of the coefficients:
# wage_QTR1 variance is at [2,2] (9.070617e-06)
# educ_QTR1 variance is at [4,4] (2.414637e-04)
# wage_QTR1-educ_QTR1 covariance is at [2,4]
# (this entry was 0.000000e+00 under the default method = "OLS")
vcov_sur_a <- vcov(sur_a)

# Standard error by the delta method, following Stata's nlcom calculation:
# SE = sqrt(Var(b1/b2)) = sqrt(
#   (Var(b1)/b2^2) + (b1^2*Var(b2)/b2^4) - (2*b1*Cov(b1,b2)/b2^3)
# )
wald_a_se <- sqrt(
  (vcov_sur_a["wage_QTR1", "wage_QTR1"] / b2^2) +
  (b1^2 * vcov_sur_a["educ_QTR1", "educ_QTR1"] / b2^4) -
  (2 * b1 * vcov_sur_a["wage_QTR1", "educ_QTR1"] / b2^3)
)
Table IV
Table IV: OLS and TSLS Estimates of the Return to Education for Men Born 1920-1929: 1970 Census¹,²

| Independent Variables | (1) OLS | (2) TSLS | (3) OLS | (4) TSLS | (5) OLS | (6) TSLS | (7) OLS | (8) TSLS |
|---|---|---|---|---|---|---|---|---|
| Years of education | 0.0802*** (0.0004) | 0.0769*** (0.0150) | 0.0802*** (0.0004) | 0.1310*** (0.0334) | 0.0701*** (0.0004) | 0.0669*** (0.0151) | 0.0701*** (0.0004) | 0.1007** (0.0334) |
| Race (1 = black) | | | | | -0.2980*** (0.0043) | -0.3055*** (0.0353) | -0.2980*** (0.0043) | -0.2271** (0.0776) |
| SMSA (1 = center city) | | | | | -0.1343*** (0.0026) | -0.1362*** (0.0092) | -0.1343*** (0.0026) | -0.1163*** (0.0198) |
| Married (1 = married) | | | | | 0.2928*** (0.0037) | 0.2941*** (0.0072) | 0.2928*** (0.0037) | 0.2804*** (0.0141) |
| Age | | | 0.1446* (0.0676) | 0.1409* (0.0704) | | | 0.1162 (0.0652) | 0.1170 (0.0662) |
| Age-squared | | | -0.0015* (0.0007) | -0.0014 (0.0008) | | | -0.0013 (0.0007) | -0.0012 (0.0007) |

¹ The Yes/No rows indicating which sets of dummy variables are included are omitted from this table.

² \(^{*}p<0.05; ^{**}p<0.01; ^{***}p<0.001\)
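The contrast between the OLS and TSLS columns can be illustrated with a small base-R sketch of two-stage least squares. Everything here is an assumption made for illustration: the data are simulated, the instruments are just three quarter-of-birth dummies (the tables use richer interaction sets plus covariates), and all object names are hypothetical. It is not the code that produced the tables.

```r
set.seed(123)
n <- 50000

# Simulated data: 'ability' is an unobserved confounder, qob shifts schooling
qob     <- sample(1:4, n, replace = TRUE)
ability <- rnorm(n)
educ    <- 12 + 0.3 * qob + ability + rnorm(n)
lwage   <- 5 + 0.08 * educ + 0.3 * ability + rnorm(n, sd = 0.5)

# OLS: biased upward here because ability is omitted
ols <- coef(lm(lwage ~ educ))["educ"]

# TSLS by matrix algebra
X <- cbind(const = 1, educ = educ)                       # regressors
Z <- cbind(const = 1, QTR1 = qob == 1,
           QTR2 = qob == 2, QTR3 = qob == 3)             # instruments
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))      # first-stage fitted values
beta  <- solve(crossprod(X_hat, X), crossprod(X_hat, lwage))

# Standard errors use residuals from the ORIGINAL regressors, not the fitted ones
u      <- lwage - X %*% beta
sigma2 <- sum(u^2) / (n - ncol(X))
se     <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))

c(ols = unname(ols), tsls = beta[2, 1], tsls_se = unname(se[2]))
```

Running the second stage with lm() on the fitted values would give the same coefficient but incorrect standard errors, which is why the residuals above are computed from the original regressors.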
Table V
Table V: OLS and TSLS Estimates of the Return to Education for Men Born 1930-1939: 1980 Census¹,²,³,⁴

| Independent Variables | (1) OLS | (2) TSLS | (3) OLS | (4) TSLS | (5) OLS | (6) TSLS | (7) OLS | (8) TSLS |
|---|---|---|---|---|---|---|---|---|
| Years of education | 0.0711*** (0.0003) | 0.0891*** (0.0161) | 0.0711*** (0.0003) | 0.0760** (0.0290) | 0.0632*** (0.0003) | 0.0806*** (0.0164) | 0.0632*** (0.0003) | 0.0600* (0.0290) |
| Race (1 = black) | | | | | -0.2575*** (0.0040) | -0.2302*** (0.0261) | -0.2575*** (0.0040) | -0.2626*** (0.0458) |
| SMSA (1 = center city) | | | | | -0.1763*** (0.0029) | -0.1581*** (0.0174) | -0.1763*** (0.0029) | -0.1797*** (0.0305) |
| Married (1 = married) | | | | | 0.2479*** (0.0032) | 0.2440*** (0.0049) | 0.2479*** (0.0032) | 0.2486*** (0.0073) |
| Age | | | -0.0772 (0.0621) | -0.0801 (0.0645) | | | -0.0760 (0.0604) | -0.0741 (0.0626) |
| Age-squared | | | 0.0008 (0.0007) | 0.0008 (0.0007) | | | 0.0008 (0.0007) | 0.0007 (0.0007) |

¹ The Yes/No rows indicating which sets of dummy variables are included are omitted from this table.

² \(^{*}p<0.05; ^{**}p<0.01; ^{***}p<0.001\)

³ The dependent variable is the log of weekly wage. The sample includes men born 1930-1939 in the 1980 Census.

⁴ The excluded instrumental variables are not shown in the table and are the same for all TSLS models: the 27 interaction terms between quarter of birth and year of birth, \(QTR_j \times YR_i\), \(i \in \{0, 1, \dots, 9\}\) and \(j \in \{1, 2, 3\}\).
Table VI
Table VI: OLS and TSLS Estimates of the Return to Education for Men Born 1940-1949: 1980 Census¹,²,³,⁴

| Independent Variables | (1) OLS | (2) TSLS | (3) OLS | (4) TSLS | (5) OLS | (6) TSLS | (7) OLS | (8) TSLS |
|---|---|---|---|---|---|---|---|---|
| Years of education | 0.0573*** (0.0003) | 0.0553*** (0.0138) | 0.0573*** (0.0003) | 0.0948*** (0.0223) | 0.0520*** (0.0003) | 0.0393** (0.0145) | 0.0521*** (0.0003) | 0.0779** (0.0239) |
| Race (1 = black) | | | | | -0.2107*** (0.0032) | -0.2266*** (0.0183) | -0.2108*** (0.0032) | -0.1786*** (0.0299) |
| SMSA (1 = center city) | | | | | -0.1418*** (0.0023) | -0.1535*** (0.0135) | -0.1419*** (0.0023) | -0.1182*** (0.0220) |
| Married (1 = married) | | | | | 0.2445*** (0.0022) | 0.2442*** (0.0022) | 0.2444*** (0.0022) | 0.2450*** (0.0023) |
| Age | | | 0.1800*** (0.0389) | 0.1325** (0.0486) | | | 0.1518*** (0.0379) | 0.1215* (0.0474) |
| Age-squared | | | -0.0023*** (0.0006) | -0.0016* (0.0007) | | | -0.0019*** (0.0005) | -0.0015* (0.0007) |

¹ The Yes/No rows indicating which sets of dummy variables are included are omitted from this table.

² \(^{*}p<0.05; ^{**}p<0.01; ^{***}p<0.001\)

³ The dependent variable is the log of weekly wage. The sample includes men born 1940-1949 in the 1980 Census.

⁴ The excluded instrumental variables are not shown in the table and are the same for all TSLS models: the 27 interaction terms between quarter of birth and year of birth, \(QTR_j \times YR_i\), \(i \in \{0, 1, \dots, 9\}\) and \(j \in \{1, 2, 3\}\).
Table VII
Table VII: OLS and TSLS Estimates of the Return to Education for Men Born 1930-1939: 1980 Census¹,²,³,⁴

| Independent Variables | (1) OLS | (2) TSLS | (3) OLS | (4) TSLS | (5) OLS | (6) TSLS | (7) OLS | (8) TSLS |
|---|---|---|---|---|---|---|---|---|
| Years of education | 0.0673*** (0.0003) | 0.0928*** (0.0093) | 0.0673*** (0.0003) | 0.0907*** (0.0107) | 0.0628*** (0.0003) | 0.0831*** (0.0095) | 0.0628*** (0.0003) | 0.0811*** (0.0109) |
| Race (1 = black) | | | | | -0.2547*** (0.0043) | -0.2333*** (0.0109) | -0.2547*** (0.0043) | -0.2354*** (0.0122) |
| SMSA (1 = center city) | | | | | -0.1705*** (0.0029) | -0.1511*** (0.0095) | -0.1705*** (0.0029) | -0.1531*** (0.0107) |
| Married (1 = married) | | | | | 0.2487*** (0.0032) | 0.2435*** (0.0040) | 0.2487*** (0.0032) | 0.2441*** (0.0042) |
| Age | | | -0.0757 (0.0617) | -0.0880 (0.0624) | | | -0.0778 (0.0603) | -0.0876 (0.0609) |
| Age-squared | | | 0.0008 (0.0007) | 0.0009 (0.0007) | | | 0.0008 (0.0007) | 0.0009 (0.0007) |

¹ The Yes/No rows indicating which sets of dummy variables are included are omitted from this table.

² \(^{*}p<0.05; ^{**}p<0.01; ^{***}p<0.001\)

³ The dependent variable is the log of weekly wage. The sample includes men born 1930-1939 in the 1980 Census.

⁴ The excluded instrumental variables for all TSLS models are not shown in the table. They include quarter of birth-year of birth (27 interactions) and quarter of birth-state of birth (150 interactions) dummies.
Table VIII
Table VIII: OLS and TSLS Estimates of the Return to Education for Black Men Born 1930-1939: 1980 Census¹,²,³,⁴

| Independent Variables | (1) OLS | (2) TSLS | (3) OLS | (4) TSLS | (5) OLS | (6) TSLS | (7) OLS | (8) TSLS |
|---|---|---|---|---|---|---|---|---|
| Years of education | 0.0672*** (0.0013) | 0.0635*** (0.0185) | 0.0671*** (0.0013) | 0.0555** (0.0199) | 0.0576*** (0.0013) | 0.0461* (0.0187) | 0.0576*** (0.0013) | 0.0391 (0.0199) |
| SMSA (1 = center city) | | | | | -0.1885*** (0.0142) | -0.2053*** (0.0308) | -0.1884*** (0.0142) | -0.2155*** (0.0324) |
| Married (1 = married) | | | | | 0.2216*** (0.0100) | 0.2272*** (0.0136) | 0.2216*** (0.0100) | 0.2307*** (0.0140) |
| Age | | | -0.3099 (0.2538) | -0.3274 (0.2560) | | | -0.2978 (0.2473) | -0.3237 (0.2497) |
| Age-squared | | | 0.0033 (0.0028) | 0.0035 (0.0028) | | | 0.0032 (0.0027) | 0.0035 (0.0028) |

¹ The Yes/No rows indicating which sets of dummy variables are included are omitted from this table.

² \(^{*}p<0.05; ^{**}p<0.01; ^{***}p<0.001\)

³ The dependent variable is the log of weekly wage. The sample includes black men born 1930-1939 in the 1980 Census.

⁴ The excluded instrumental variables for all TSLS models are not shown in the table. They include quarter of birth-year of birth (27 interactions) and quarter of birth-state of birth (150 interactions) dummies.
Notes 1: The sample data set contains only 50 states, whereas the paper reports 51. This is probably because restricting the sample to black men leaves one state with no observations. Be careful about which set of state dummies you use.
Notes 2: The Table VIII results reported by Win Supanwanid using the Stata scripts are not consistent with AK1991. The R code here reproduces the results of AK1991. The discrepancy in Win Supanwanid's Table VIII probably stems from the state dummies, as mentioned in Notes 1.
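A quick way to guard against the state-dummy problem described in these notes is to count the states actually present after filtering and to drop unused factor levels before building dummies. The toy data frame below is entirely hypothetical (three made-up states, one of which has no black men) and only illustrates the check.

```r
# Hypothetical toy data: state S3 has no black men
toy <- data.frame(
  state = factor(c("S1", "S1", "S2", "S3", "S3", "S3")),
  black = c(1, 0, 1, 0, 0, 0)
)

black_men <- subset(toy, black == 1)

# subset() keeps the original factor levels, including the now-empty S3
n_levels_before  <- nlevels(black_men$state)
n_states_present <- length(unique(as.character(black_men$state)))

# Drop unused levels so that dummy sets are built from observed states only
black_men$state <- droplevels(black_men$state)
n_levels_after  <- nlevels(black_men$state)

c(levels_before = n_levels_before,
  states_present = n_states_present,
  levels_after = n_levels_after)
```

Comparing the level count with the number of states actually observed makes it clear how many state dummies, and hence how many quarter-of-birth by state interactions, the filtered sample can support.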