作业03：R代码 – 计量经济学II

这里将展示Homework 03的R编程细节

数据准备

Show the code

require(knitr)
require(here)
require(broom)
require(haven)
require(systemfit)
require(AER)
#renv::install("ivmodel")
require(ivmodel) # for LIML fit
# load package
library(systemfit)

为了研究居民的工资收入（wage，或对数化的工资lwage）是如何决定的，我们考虑如下所示的一些变量（具体定义见表 1)）。

表 1: 变量定义及说明

变量_代码	变量_中文	定义和取值
obs	序号	序号
lwage	对数化工资	定量变量：取对数后的工资
edu	受教育年数	定量变量：受教育年数（年）
exp	工作年数	定量变量：工作年数(年)
exp2	工作年数平方	定量变量：工作年数平方/100
black	是否黑人	虚拟变量：1=黑人；0=其他
south	是否南方地区	虚拟变量：1=南方地区；0=其他
urban	是否居住市区	虚拟变量：1=居住市区；0=其他
college	附近是否有大学	虚拟变量：1=附近有大学；0=其他
public	附近是否有公立大学	虚拟变量：1=附近有公立大学；0=其他
private	附近是否有私立大学	虚拟变量：1=附近有私立大学；0=其他
age	年龄	定量变量：年龄(年)
age2	年龄平方	定量变量：年龄(年)的平方/100
momedu	母亲受教育年数	定量变量：母亲受教育年数（年）
dadedu	父亲受教育年数	定量变量：父亲受教育年数（年）

任务1：OLS回归分析

Warning: replacing previous import 'stats::filter' by 'dplyr::filter' when
loading 'xmerit'

Warning: replacing previous import 'stats::lag' by 'dplyr::lag' when loading
'xmerit'

\[\begin{align} \begin{split} lwage_i=&+\beta_{1}+\beta_{2}edu_i+\beta_{3}exp_i+\beta_{4}exp2_i\\&+\beta_{5}black_i+\beta_{6}south_i+\beta_{7}urban_i+u_i \end{split} \end{align}\]

Show the code

# model ols
mod_ls <- formula("lwage ~ edu + exp + exp2 + black + south + urban")
fit_lm <- lm(formula = mod_ls, data = tbl_reg)
smry_lm <- summary(fit_lm)


Call:
lm(formula = mod_ls, data = tbl_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.59297 -0.22315  0.01893  0.24223  1.33190 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.733664   0.067603  70.022  < 2e-16 ***
edu          0.074009   0.003505  21.113  < 2e-16 ***
exp          0.083596   0.006648  12.575  < 2e-16 ***
exp2        -0.224088   0.031784  -7.050 2.21e-12 ***
black       -0.189632   0.017627 -10.758  < 2e-16 ***
south       -0.124862   0.015118  -8.259  < 2e-16 ***
urban        0.161423   0.015573  10.365  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3742 on 3003 degrees of freedom
Multiple R-squared:  0.2905,    Adjusted R-squared:  0.2891 
F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16

任务2：一个内生变量且等量工具变量情形下的2SLS分析

edu内生，工具变量为college：

Show the code

mod_iv_cl <- formula("lwage ~ edu + exp + exp2 + black + south + urban |college + exp + exp2 + black + south + urban")
fit_iv_cl <- AER::ivreg(formula = mod_iv_cl,data = tbl_reg )
smry_iv_cl <- summary(fit_iv_cl)


Call:
AER::ivreg(formula = mod_iv_cl, data = tbl_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.82125 -0.24065  0.02368  0.25469  1.43205 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.75278    0.82934   4.525 6.27e-06 ***
edu          0.13229    0.04923   2.687  0.00725 ** 
exp          0.10750    0.02130   5.047 4.76e-07 ***
exp2        -0.22841    0.03341  -6.836 9.84e-12 ***
black       -0.13080    0.05287  -2.474  0.01342 *  
south       -0.10490    0.02307  -4.546 5.67e-06 ***
urban        0.13132    0.03013   4.359 1.35e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.391 on 3003 degrees of freedom
Multiple R-Squared: 0.2252, Adjusted R-squared: 0.2237 
Wald test: 120.8 on 6 and 3003 DF,  p-value: < 2.2e-16

任务3：两个内生变量且等量工具变量情形下的2SLS分析

edu、exp和exp2内生，工具变量为college、age和age2：

Show the code

mod_iv_ca <- formula("lwage ~ edu + exp + exp2 + black + south + urban |college + age + age2 + black + south + urban")
fit_iv_ca <- AER::ivreg(formula = mod_iv_ca,data = tbl_reg )
smry_iv_ca <- summary(fit_iv_ca)


Call:
AER::ivreg(formula = mod_iv_ca, data = tbl_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.82400 -0.25248  0.02286  0.26349  1.31561 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.06567    0.60850   6.682 2.81e-11 ***
edu          0.13295    0.05138   2.588 0.009712 ** 
exp          0.05596    0.02599   2.153 0.031412 *  
exp2        -0.07957    0.13403  -0.594 0.552796    
black       -0.10314    0.07737  -1.333 0.182623    
south       -0.09818    0.02876  -3.413 0.000651 ***
urban        0.10798    0.04974   2.171 0.030010 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4032 on 3003 degrees of freedom
Multiple R-Squared: 0.1764, Adjusted R-squared: 0.1747 
Wald test: 148.1 on 6 and 3003 DF,  p-value: < 2.2e-16

任务4：一个内生变量且富余工具变量情形下的2SLS分析

edu内生，工具变量为public和private：

Show the code

mod_tsls_pp <- formula("lwage ~ edu + exp + exp2 + black + south + urban |public +private + exp + exp2 + black + south + urban")
fit_tsls_pp <- AER::ivreg(formula = mod_tsls_pp,data = tbl_reg )
smry_tsls_pp <- summary(fit_tsls_pp)


Call:
AER::ivreg(formula = mod_tsls_pp, data = tbl_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.93985 -0.25152  0.01722  0.27365  1.48154 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.26801    0.68718   4.756 2.07e-06 ***
edu          0.16109    0.04077   3.951 7.96e-05 ***
exp          0.11931    0.01818   6.564 6.16e-11 ***
exp2        -0.23054    0.03503  -6.582 5.46e-11 ***
black       -0.10173    0.04531  -2.245   0.0248 *  
south       -0.09504    0.02165  -4.389 1.18e-05 ***
urban        0.11645    0.02705   4.305 1.73e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4108 on 3003 degrees of freedom
Multiple R-Squared: 0.1447, Adjusted R-squared: 0.143 
Wald test:   111 on 6 and 3003 DF,  p-value: < 2.2e-16

任务5：两个内生变量且富余工具变量情形下的2SLS分析

edu、exp和exp2内生，工具变量为public和private、以及age和age2：

Show the code

mod_tsls_ppa <- formula("lwage ~ edu + exp + exp2 + black + south + urban |public +private + age + age2 + black + south + urban")
fit_tsls_ppa <- AER::ivreg(formula = mod_tsls_ppa,data = tbl_reg )
smry_tsls_ppa <- summary(fit_tsls_ppa)


Call:
AER::ivreg(formula = mod_tsls_ppa, data = tbl_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.92063 -0.27218  0.02074  0.28186  1.43044 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.74815    0.48338   7.754 1.21e-14 ***
edu          0.15969    0.04090   3.904 9.65e-05 ***
exp          0.04703    0.02502   1.880 0.060213 .  
exp2        -0.03225    0.12811  -0.252 0.801255    
black       -0.06403    0.06301  -1.016 0.309610    
south       -0.08573    0.02563  -3.345 0.000834 ***
urban        0.08348    0.04125   2.024 0.043073 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4327 on 3003 degrees of freedom
Multiple R-Squared: 0.05111,    Adjusted R-squared: 0.04921 
Wald test: 129.7 on 6 and 3003 DF,  p-value: < 2.2e-16

任务6：模型综合比较

以上模型的综合汇总结果比较见下表：

说明：

如果使用两阶段最小二乘法，则只列出了第二阶段回归结果
最后一列LIML_pp采用的是有限信息极大似然估计方法（LIML）。

采用R包`systemfit`进行估计

等量工具变量情形

在节 4 中，给定情形为：edu、exp和exp2内生，工具变量为college、age和age2。作为对比，我们此处使用R包systemfit进行估计。具体结果见表 2：

Note

方程1.3需要去掉截距项，否则会报错！
hansen2021此时应用的是IV(b)方法。

表 2: 等量工具变量下的2SLS结果

Show the code

```{r}
#| label: tbl-just-systemfit-3t3
#| tbl-cap: "等量工具变量下的2SLS结果"
#| eval: false

# load package
library(systemfit)
# set two models
eq_1.1 <- edu ~  age + age2 +black +south +urban + college
eq_1.2 <- exp ~   age + age2 +black +south +urban + college
eq_1.3 <- exp2 ~  -1 + age + age2 +black +south +urban + college # important !!

eq_2 <- lwage ~ edu + exp + exp2 +black +south +urban

sys <- list(
  eq1.1 = eq_1.1,
  eq1.2 = eq_1.2, 
  eq1.3 = eq_1.3, 
  eq2 = eq_2)

# specify the instruments

instr <- ~age + age2 +black +south +urban + college

# fit models 
fit.sys <- systemfit(
  formula =  sys, 
  inst = instr, 
  method ="2SLS", 
  data = tbl_reg)

smry <- summary(fit.sys)
out <- broom::tidy(fit.sys) %>%
  separate(., col = term, 
           into = c("eq", "vars"), sep = "_") %>%
  select(-all_of(starts_with("conf")))

knitr::kable(out, 
      #caption ="等量工具变量下的2SLS结果" ,
      digits = 4)
```

富余工具变量情形

在节 6 中，给定情形为：edu、exp和exp2内生，工具变量为public和private、以及age和age2。作为对比，我们此处使用R包systemfit进行估计。具体结果见表表 3：

表 3: 富余工具变量下的2SLS结果

Show the code

```{r}
#| label: tbl-rich-systemfit
#| tbl-cap: "富余工具变量下的2SLS结果"
#| eval: false

# load package
library(systemfit)
# set two models
eq_1.1 <- edu ~  age + age2 +black +south +urban + public +private
eq_1.2 <- exp ~  age + age2 +black +south +urban + public +private
eq_1.3 <- exp2 ~  age + age2 +black +south +urban + public +private

eq_2 <- lwage ~ edu + exp + exp2 +black +south +urban

sys <- list(
  eq1.1 = eq_1.1,
  eq1.2 = eq_1.2, 
  eq1.3 = eq_1.3, 
  eq2 = eq_2)

# specify the instruments

instr <- ~age + age2 +black +south +urban + public +private


# fit models 
fit.sys <- systemfit(
  formula =  sys, 
  inst = instr, 
  method ="2SLS", 
  data = tbl_reg)

smry <- summary(fit.sys)
out <- tidy(fit.sys) %>%
  separate(., col = term, 
           into = c("eq", "vars"), sep = "_") %>%
  select(-all_of(starts_with("conf")))

knitr::kable(out, 
      #caption ="富余工具变量下的2SLS结果" ,
      digits = 4)
```

数据准备

任务1：OLS回归分析

任务2：一个内生变量且等量工具变量情形下的2SLS分析

任务3：两个内生变量且等量工具变量情形下的2SLS分析

任务4：一个内生变量且富余工具变量情形下的2SLS分析

任务5：两个内生变量且富余工具变量情形下的2SLS分析

任务6：模型综合比较

采用R包systemfit进行估计

等量工具变量情形

富余工具变量情形

采用R包`systemfit`进行估计