这里将展示Homework Week02的Stata
编程细节
配置Stata环境
在Rmarkdown
或quarto
文档下,需要配置对Stata
命令实时运算的基本环境。
Show the code
```{r}
#| code-fold: show
library(rmarkdown)
library(Statamarkdown)
```
Stata found at C:/Program Files/Stata18/StataMP-64.exe
The 'stata' engine is ready to use.
Show the code
```{r}
#| code-fold: show
# stata path and engine
stataexe <- "C:/Program Files/Stata18/StataMP-64.exe"
knitr::opts_chunk$set(engine.path=list(stata=stataexe))
```
读取数据
Show the code
```{stata first-Stata, collectcode=TRUE}
set more off
import excel "D:\github\master-SEM\04-students\lab02-dataset-cps.xlsx", sheet("Sheet 1") firstrow //导入数据
```
Show the code
```{stata}
**** 查看数据情况 *******
describe // 变量说明
summarize // 基本统计
```
Contains data
Observations: 46,943
Variables: 12
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
obs long %10.0g obs
wage double %10.0g wage
lgwg double %10.0g lgwg
edu byte %10.0g edu
exper byte %10.0g exper
exper2 double %10.0g exper2
female byte %10.0g female
union str6 %9s union
married str6 %9s married
formMarried str6 %9s formMarried
hispanic byte %10.0g hispanic
race str5 %9s race
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
obs | 46,943 23472 13551.42 1 46943
wage | 46,943 24.79247 21.15943 .0003846 266.0557
lgwg | 46,943 2.990225 .660089 -7.863267 5.583706
edu | 46,943 14.34433 2.298745 12 20
exper | 46,943 21.85787 11.48215 -4 67
-------------+---------------------------------------------------------
exper2 | 46,943 6.096033 5.442559 0 44.89
female | 46,943 .4340157 .4956322 0 1
union | 0
married | 0
formMarried | 0
-------------+---------------------------------------------------------
hispanic | 46,943 .1158852 .3200906 0 1
race | 0
任务1:处理定性变量
定性变量 gender
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 gender****
tabulate female
generate gender_female = (female==1)
generate gender_male = (female==0)
```
female | Freq. Percent Cum.
------------+-----------------------------------
0 | 26,569 56.60 56.60
1 | 20,374 43.40 100.00
------------+-----------------------------------
Total | 46,943 100.00
定性变量 married
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 married****
tabulate married
generate married_female = (married=="female")
generate married_male = (married=="male")
generate married_other = (married=="other")
```
married | Freq. Percent Cum.
------------+-----------------------------------
female | 12,155 25.89 25.89
male | 19,213 40.93 66.82
other | 15,575 33.18 100.00
------------+-----------------------------------
Total | 46,943 100.00
定性变量 union
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 union****
tabulate union
generate union_female = (union=="female")
generate union_male = (union=="male")
generate union_other = (union=="other")
```
union | Freq. Percent Cum.
------------+-----------------------------------
female | 423 0.90 0.90
male | 622 1.33 2.23
other | 45,898 97.77 100.00
------------+-----------------------------------
Total | 46,943 100.00
定性变量 formMarried
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 formMarried****
tabulate formMarried
generate formMarried_female = (formMarried=="female")
generate formMarried_male = (formMarried=="male")
generate formMarried_other = (formMarried=="other")
```
formMarried | Freq. Percent Cum.
------------+-----------------------------------
female | 4,067 8.66 8.66
male | 2,615 5.57 14.23
other | 40,261 85.77 100.00
------------+-----------------------------------
Total | 46,943 100.00
定性变量 hispanic
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 hispanic****
tabulate hispanic
generate hispanic_y= (hispanic==1)
generate hispanic_n = (hispanic==0)
```
hispanic | Freq. Percent Cum.
------------+-----------------------------------
0 | 41,503 88.41 88.41
1 | 5,440 11.59 100.00
------------+-----------------------------------
Total | 46,943 100.00
定性变量 race
:
Show the code
```{stata, collectcode=TRUE}
**** 虚拟变量 race****
tabulate hispanic
generate race_black = (race=="black")
generate race_ai = (race=="AI")
generate race_asian = (race=="asian")
generate race_mixed = (race=="mixed")
generate race_other = (race=="other")
```
hispanic | Freq. Percent Cum.
------------+-----------------------------------
0 | 41,503 88.41 88.41
1 | 5,440 11.59 100.00
------------+-----------------------------------
Total | 46,943 100.00
任务3-1:OLS快速回归分析
stata回归结果:
Show the code
```{stata, collectcode=TRUE}
regress lgwg edu exper exper2 gender_female union_female union_male married_female married_male formMarried_female formMarried_male hispanic_y race_black race_ai race_asian race_mixed
```
Source | SS df MS Number of obs = 46,943
-------------+---------------------------------- F(15, 46927) = 1137.67
Model | 5454.42352 15 363.628235 Prob > F = 0.0000
Residual | 14999.0242 46,927 .319624613 R-squared = 0.2667
-------------+---------------------------------- Adj R-squared = 0.2664
Total | 20453.4477 46,942 .435717432 Root MSE = .56535
------------------------------------------------------------------------------
lgwg | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
edu | .1166983 .0011767 99.17 0.000 .1143919 .1190047
exper | .0331565 .0008865 37.40 0.000 .031419 .034894
exper2 | -.0564307 .0018151 -31.09 0.000 -.0599882 -.0528731
gender_fem~e | -.0982573 .0120798 -8.13 0.000 -.1219339 -.0745806
union_female | .0228563 .0277993 0.82 0.411 -.0316307 .0773433
union_male | .0951874 .0229579 4.15 0.000 .0501896 .1401852
married_fe~e | .0161599 .0106713 1.51 0.130 -.004756 .0370758
married_male | .2111205 .009854 21.42 0.000 .1918064 .2304346
formMa~emale | -.0064213 .013036 -0.49 0.622 -.031972 .0191294
formMa~_male | .0828954 .0143544 5.77 0.000 .0547605 .1110303
hispanic_y | -.1081364 .0083306 -12.98 0.000 -.1244644 -.0918084
race_black | -.0955306 .0088518 -10.79 0.000 -.1128803 -.0781809
race_ai | -.1374391 .025821 -5.32 0.000 -.1880488 -.0868295
race_asian | -.0384218 .0119141 -3.22 0.001 -.0617735 -.01507
race_mixed | -.04128 .0196722 -2.10 0.036 -.0798378 -.0027223
_cons | .9085073 .0201129 45.17 0.000 .8690857 .9479289
------------------------------------------------------------------------------
获取斜率系数\(\boldsymbol{\hat{\beta}}\) ,转换为列向量:
Show the code
```{stata}
matrix b_quick = (e(b))' // coefficients
matrix list b_quick
```
b_quick[16,1]
y1
edu .1166983
exper .03315647
exper2 -.05643068
gender_fem~e -.09825725
union_female .02285629
union_male .09518739
married_fe~e .01615988
married_male .21112049
formMa~emale -.00642132
formMa~_male .08289541
hispanic_y -.1081364
race_black -.0955306
race_ai -.13743914
race_asian -.03842176
race_mixed -.04128004
_cons .90850729
估计系数的标准差\(\boldsymbol{S_\hat{\beta}}\) :
Show the code
```{stata}
scalar RSS_quick = e(rss) // 获取残差平方和
matrix covvar_b_quick = e(V) // 获取估计系数的方差协方差矩阵
mata: st_matrix("s_b_quick", sqrt(diagonal(st_matrix("e(V)")))) // 估计系数的标准差, 这里使用了mata语法
matrix list s_b_quick
```
s_b_quick[16,1]
c1
r1 .00117674
r2 .00088648
r3 .00181506
r4 .01207982
r5 .02779927
r6 .0229579
r7 .01067131
r8 .00985405
r9 .01303598
r10 .01435442
r11 .00833056
r12 .00885184
r13 .02582104
r14 .01191406
r15 .01967218
r16 .02011292
任务4-1:矩阵分析
Show the code
```{stata, collectcode=TRUE}
generate const = (female > -0.1) // 常数列,全部取值为1
// 以下采用mata语法体系
mata: st_view(y=., ., "lgwg") // 构造y矩阵
mata: st_view(X=., ., ("edu", "exper", "exper2", "gender_female", "union_female", "union_male", "married_female", "married_male", "formMarried_female", "formMarried_male", "hispanic_y", "race_black", "race_ai", "race_asian", "race_mixed", "const")) // 构造X矩阵
mata: b = (cholinv(cross(X,X)))*(cross(X,y)) // 计算斜率系数
mata: b
mata: N = rows(y) // 样本数
mata: k = cols(X) // 变量数
mata: printf("rows = %f k = %f", N, k)
```
1
+----------------+
1 | .1166983029 |
2 | .0331564661 |
3 | -.0564306759 |
4 | -.0982572522 |
5 | .0228562922 |
6 | .0951873853 |
7 | .0161598823 |
8 | .2111204886 |
9 | -.00642132 |
10 | .0828954056 |
11 | -.1081364005 |
12 | -.0955306037 |
13 | -.1374391407 |
14 | -.0384217612 |
15 | -.0412800403 |
16 | .9085072854 |
+----------------+
rows = 46943 k = 16
任务4-2:方差协方差矩阵
Show the code
```{stata, collectcode=TRUE}
mata: e = y - X*b // 残差向量
mata: RSS = e'e // 残差平方和
mata: sigma2 = RSS/(N-k) // 回归误差方差 或 残差均方和MSS_RSS
mata: printf("RSS = %f sigma2 = %f", RSS, sigma2)
mata: covvar_b = cholinv(cross(X,X))*sigma2 // 估计系数的方差协方差矩阵
mata: s_b = sqrt(diagonal(covvar_b)) // 估计系数的标准差
mata: s_b
```
RSS = 14999.0242 sigma2 = .319624613
1
+---------------+
1 | .001176742 |
2 | .0008864807 |
3 | .0018150624 |
4 | .0120798163 |
5 | .0277992726 |
6 | .0229579012 |
7 | .0106713148 |
8 | .0098540481 |
9 | .0130359761 |
10 | .0143544223 |
11 | .0083305603 |
12 | .0088518353 |
13 | .0258210409 |
14 | .0119140569 |
15 | .0196721817 |
16 | .0201129195 |
+---------------+
任务5-1:异方差一致性标准误差
stata快速命令法, HC1
使用'vce(robust)'
命令; HC2
使用' vce(hc2)'
命令;HC3使用'vce(hc3)'
命令。此外,HC0
往往在实证中不使用,因为它没有进行自由度调整
以下进行HC1
稳健性方差估计结果:
Show the code
```{stata, collectcode=TRUE}
regress lgwg edu exper exper2 gender_female union_female union_male married_female married_male formMarried_female formMarried_male hispanic_y race_black race_ai race_asian race_mixed, vce(robust)
```
Linear regression Number of obs = 46,943
F(15, 46927) = 969.84
Prob > F = 0.0000
R-squared = 0.2667
Root MSE = .56535
------------------------------------------------------------------------------
| Robust
lgwg | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
edu | .1166983 .0012822 91.01 0.000 .1141851 .1192115
exper | .0331565 .000952 34.83 0.000 .0312906 .0350223
exper2 | -.0564307 .0020755 -27.19 0.000 -.0604988 -.0523626
gender_fem~e | -.0982573 .0110319 -8.91 0.000 -.1198799 -.0766346
union_female | .0228563 .0196079 1.17 0.244 -.0155755 .0612881
union_male | .0951874 .0203004 4.69 0.000 .0553983 .1349765
married_fe~e | .0161599 .0095411 1.69 0.090 -.0025409 .0348606
married_male | .2111205 .0097006 21.76 0.000 .1921072 .2301338
formMa~emale | -.0064213 .0118607 -0.54 0.588 -.0296685 .0168259
formMa~_male | .0828954 .0145607 5.69 0.000 .0543562 .1114346
hispanic_y | -.1081364 .0081209 -13.32 0.000 -.1240535 -.0922193
race_black | -.0955306 .0083367 -11.46 0.000 -.1118706 -.0791906
race_ai | -.1374391 .0263727 -5.21 0.000 -.1891301 -.0857482
race_asian | -.0384218 .0133569 -2.88 0.004 -.0646014 -.0122421
race_mixed | -.04128 .0208531 -1.98 0.048 -.0821523 -.0004077
_cons | .9085073 .0212559 42.74 0.000 .8668455 .9501691
------------------------------------------------------------------------------
提取出估计系数的稳健标准差, 这里使用了mata
语法
Show the code
```{stata, collectcode=TRUE}
mata: st_matrix("s_b_quick_robust", sqrt(diagonal(st_matrix("e(V)"))))
matrix list s_b_quick_robust
```
s_b_quick_robust[16,1]
c1
r1 .00128222
r2 .00095195
r3 .00207555
r4 .01103191
r5 .01960791
r6 .02030042
r7 .00954113
r8 .00970058
r9 .01186074
r10 .0145607
r11 .0081209
r12 .00833668
r13 .02637275
r14 .01335685
r15 .02085306
r16 .02125586
手动编程计算法:
Show the code
```{stata, collectcode=TRUE}
mata: Xe = X:*e // element product
mata: XXe2 = cross(Xe, Xe) // (Xe)'*(Xe)
mata: hc0 = cholinv(cross(X,X)) * XXe2 * cholinv(cross(X,X)) // the HC0 formula
mata: hc1 = hc0 *N/(N-k)
mata: s_b_hc1 = sqrt(diagonal(hc1)) // 估计系数的HC1稳健标准差
mata: s_b_hc1
```
1
+---------------+
1 | .0012822244 |
2 | .000951951 |
3 | .002075546 |
4 | .0110319057 |
5 | .0196079143 |
6 | .0203004152 |
7 | .0095411323 |
8 | .0097005811 |
9 | .0118607367 |
10 | .0145607043 |
11 | .0081208969 |
12 | .0083366799 |
13 | .0263727477 |
14 | .0133568523 |
15 | .0208530587 |
16 | .0212558573 |
+---------------+
附录: do文件
以下是do文件的全部代码行:
Show the code
```{stata, eval=FALSE}
set more off
import excel "D:\github\master-SEM\04-students\lab02-dataset-cps.xlsx", sheet("Sheet 1") firstrow //导入数据
**** 查看数据情况 *******
describe
summarize
******任务1:处理定性变量******
**** 虚拟变量 gender****
tabulate female
generate gender_female = (female==1)
generate gender_male = (female==0)
**** 虚拟变量 married****
tabulate married
generate married_female = (married=="female")
generate married_male = (married=="male")
generate married_other = (married=="other")
**** 虚拟变量 union****
tabulate union
generate union_female = (union=="female")
generate union_male = (union=="male")
generate union_other = (union=="other")
**** 虚拟变量 formMarried****
tabulate formMarried
generate formMarried_female = (formMarried=="female")
generate formMarried_male = (formMarried=="male")
generate formMarried_other = (formMarried=="other")
**** 虚拟变量 hispanic****
tabulate hispanic
generate hispanic_y= (hispanic==1)
generate hispanic_n = (hispanic==0)
**** 虚拟变量 race****
tabulate hispanic
generate race_black = (race=="black")
generate race_ai = (race=="AI")
generate race_asian = (race=="asian")
generate race_mixed = (race=="mixed")
generate race_other = (race=="other")
******任务3-1:OLS快速回归分析******
regress lgwg edu exper exper2 gender_female union_female union_male married_female married_male formMarried_female formMarried_male hispanic_y race_black race_ai race_asian race_mixed
ereturn list
matrix b_quick = (e(b))' // 获取斜率系数,转换为列向量
matrix list b_quick
scalar RSS_quick = e(rss) // 获取残差平方和
matrix covvar_b_quick = e(V) // 获取估计系数的方差协方差矩阵
matrix list covvar_b_quick
mata: st_matrix("s_b_quick", sqrt(diagonal(st_matrix("e(V)")))) // 估计系数的标准差, 这里使用了mata语法
matrix list s_b_quick
******任务4-1:矩阵分析******
generate const = (female > -0.1) // 常数列,全部取值为1
// 以下采用mata语法体系
mata: st_view(y=., ., "lgwg") // 构造y矩阵
mata: st_view(X=., ., ("edu", "exper", "exper2", "gender_female", "union_female", "union_male", "married_female", "married_male", "formMarried_female", "formMarried_male", "hispanic_y", "race_black", "race_ai", "race_asian", "race_mixed", "const")) // 构造X矩阵
mata: b = (cholinv(cross(X,X)))*(cross(X,y)) // 计算斜率系数
mata: b
mata: N = rows(y) // 样本数
mata: k = cols(X) // 变量数
mata: printf("rows = %f k = %f", N, k)
******任务4-2:方差协方差矩阵******
mata: e = y - X*b // 残差向量
mata: RSS = e'e // 残差平方和
mata: sigma2 = RSS/(N-k) // 回归误差方差 或 残差均方和MSS_RSS
mata: printf("RSS = %f sigma2 = %f", RSS, sigma2)
mata: covvar_b = cholinv(cross(X,X))*sigma2 // 估计系数的方差协方差矩阵
mata: s_b = sqrt(diagonal(covvar_b)) // 估计系数的标准差
mata: s_b
******任务5-1:异方差一致性标准误差******
// stata快速命令法, HC1使用', vce(robust)'命令; HC2使用', vce(hc2)'命令;HC3使用', vce(hc3)'命令。此外,HC0往往在实证中不使用,因为它没有进行自由度调整
regress lgwg edu exper exper2 gender_female union_female union_male married_female married_male formMarried_female formMarried_male hispanic_y race_black race_ai race_asian race_mixed, vce(robust)
mata: st_matrix("s_b_quick_robust", sqrt(diagonal(st_matrix("e(V)")))) // 估计系数的稳健标准差, 这里使用了mata语法
matrix list s_b_quick_robust
// 手动编程计算法
mata: Xe = X:*e // element product
mata: XXe2 = cross(Xe, Xe) // (Xe)'*(Xe)
mata: hc0 = cholinv(cross(X,X)) * XXe2 * cholinv(cross(X,X)) // the HC0 formula
mata: hc1 = hc0 *N/(N-k)
mata: s_b_hc1 = sqrt(diagonal(hc1)) // 估计系数的HC1稳健标准差
mata: s_b_hc1
```