cgyyy

2022/09/28

# HW01

0. Initialization

``````
global path "D:\Stata17_working\HW01"  // project root directory
global D0   "$path\data0"
global D1   "$path\data1"
global Out  "$path\out"                // results: figures and tables
cd "$D1"
sysuse nlsw88.dta, clear
``````
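The `cd` call above stops with an error if the target folder does not exist yet. A minimal sketch of a one-time setup step, assuming the globals `D0`, `D1`, and `Out` are defined as above and that the project root folder itself already exists; it would need to run before `cd`:

```stata
* Create the working folders if they are missing.
* capture suppresses the error mkdir raises when a folder already exists.
* Assumption: the parent folder stored in $path already exists,
* because mkdir creates only one directory level at a time.
capture mkdir "$D0"
capture mkdir "$D1"
capture mkdir "$Out"
```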

1. Descriptive statistics for nlsw88.dta, using sum2docx

``````
sum2docx age grade wage hours ttl_exp tenure using $Out\Table01.docx, ///
    replace stats(N mean(%9.2f) sd min(%9.0g) median(%9.0g) max(%9.0g))
shellout $Out\Table01.docx , replace
``````

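2. Generating variables

``````
gen age2 = age^2              // age squared
gen ln_wage = ln(wage)        // log wage
gen wage_hour = wage/hours
egen meanwage = mean(wage)    // sample mean of wage, repeated on every row
gen dum = (wage>meanwage)     // 1 if wage is above the sample mean, else 0
``````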
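The generated variables can be sanity-checked before moving on. A minimal sketch, assuming nlsw88.dta is in memory and the variables above have just been created; the tolerance `1e-6` is an arbitrary choice that allows for `meanwage` being stored as a float:

```stata
* Compare the egen result with the mean reported by summarize.
summarize wage, meanonly
assert reldif(meanwage, r(mean)) < 1e-6

* age is an integer, so its square is stored exactly.
assert age2 == age^2

* The mean of a 0/1 indicator is the share of women earning above the mean.
summarize dum
```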

3. Histogram and kernel density plot of `ttl_exp`

``````
histogram ttl_exp
graph export $Out\His_ttl.png, replace
kdensity ttl_exp
graph export $Out\Kendi_ttl.png, replace
``````
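The two plots can also be combined in one figure, which makes it easier to see how well the kernel density follows the histogram. A minimal sketch, assuming nlsw88.dta is in memory and `$Out` is defined as in step 0; the output file name `His_kden_ttl.png` and the axis titles are made up for illustration:

```stata
* histogram's kdensity option overlays a kernel density estimate on the bars.
histogram ttl_exp, kdensity ///
    xtitle("Total work experience (years)") ytitle("Density")
graph export "$Out/His_kden_ttl.png", replace
```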

4. Statistics by industry

4.1 Number of observations in each industry

``````
tabulate industry, sum(age)
// Output
            |   Summary of Age in current year
   Industry |        Mean   Std. dev.       Freq.
------------+------------------------------------
  Ag/Forest |   39.941176   3.1517969          17
     Mining |       37.25   3.8622101           4
  Construct |    38.62069   2.8587434          29
  Manufactu |   38.989101     3.10383         367
  Transport |   39.277778   3.3688186          90
  Wholesale |   39.288288   3.0999049         333
  Finance/I |   38.828125   3.1384527         192
  Personal  |   39.237113   3.0508646          97
  Entertain |   40.117647   3.2187411          17
  Professio |   39.239078   3.0356132         824
  Public ad |   39.159091   2.8400658         176
------------+------------------------------------
      Total |   39.146057   3.0614786       2,232
``````
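If only the counts are needed, the `sum(age)` summary is not required. A minimal sketch of two alternatives, assuming nlsw88.dta is in memory; `ind_n` is a hypothetical variable name:

```stata
* Plain frequency table; the missing option also counts women
* whose industry is not recorded.
tabulate industry, missing

* Store the group size on every row, e.g. for later use as a denominator.
bysort industry: gen ind_n = _N
```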

4.2 Mean wage, hours, and age of women in each industry

``````
tabstat wage hours age, by(industry) stat(mean)
// Output
        industry |      wage     hours       age
-----------------+------------------------------
Ag/Forestry/Fish |  5.621121  34.47059  39.94118
          Mining |  15.34959        40     37.25
    Construction |  7.564934  35.65517  38.62069
   Manufacturing |  7.501578  40.89373   38.9891
Transport/Comm/U |  11.44335  39.85556  39.27778
Wholesale/Retail |  6.125897  35.24699  39.28829
Finance/Ins/Real |  9.843174  38.51563  38.82813
Personal service |  4.401093  32.09375  39.23711
Entertainment/Re |  6.724409  34.35294  40.11765
Professional ser |  7.871186  36.71655  39.23908
Public administr |  9.148407  38.54545  39.15909
-----------------+------------------------------
           Total |  7.783463  37.23205  39.14606
------------------------------------------------
``````
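When the industry means are needed as data rather than as a printed table, `collapse` is one option. A minimal sketch, assuming nlsw88.dta is in memory; `n` is a hypothetical name for the per-industry count, and `preserve`/`restore` keeps the original data intact:

```stata
preserve
* One row per industry: means of wage, hours, age and the number of
* non-missing wage observations.
collapse (mean) wage hours age (count) n = wage, by(industry)
gsort -wage                               // highest-paying industry first
list industry wage hours age n, noobs clean
restore
```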

4.3 Share of white, black, and other-race women in each industry

``````
bysort industry : egen indnum = count(industry)  // number of women in each industry

bysort industry : egen indwhite = count(industry) if race==1 // number of white women in the industry
replace indwhite = indwhite/indnum // share of white women

bysort industry : egen indblack = count(industry) if race==2 // number of black women in the industry
replace indblack = indblack/indnum // share of black women

bysort industry : egen indother = count(industry) if race==3 // number of other-race women in the industry
replace indother = indother/indnum // share of other-race women

tabstat indwhite indblack indother, by(industry) stat(mean) f(%9.4f)
// Output
        industry |  indwhite  indblack  indother
-----------------+------------------------------
Ag/Forestry/Fish |    0.7647    0.2353         .
          Mining |    1.0000         .         .
    Construction |    0.8276    0.1379    0.0345
   Manufacturing |    0.6240    0.3651    0.0109
Transport/Comm/U |    0.6889    0.3000    0.0111
Wholesale/Retail |    0.8018    0.1982         .
Finance/Ins/Real |    0.8594    0.1302    0.0104
Business/Repair  |    0.7442    0.2326    0.0233
Personal service |    0.5258    0.4639    0.0103
Entertainment/Re |    0.8235    0.1765         .
Professional ser |    0.7476    0.2391    0.0133
Public administr |    0.6705    0.3068    0.0227
-----------------+------------------------------
``````
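The same shares can be cross-checked with a two-way table, which avoids creating helper variables. A minimal sketch, assuming nlsw88.dta is in memory; note that it reports row percentages (0 to 100) rather than proportions, and shows an empty cell as 0.00 instead of a missing value:

```stata
* Row percentages: within each industry, the share of each race.
* nofreq hides the raw counts so only the percentages are printed.
tabulate industry race, row nofreq
```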

5. Use the `label define` and `label values` commands to label the values of the `race` variable

``````
label define race 1 "White" 2 "Black" 3 "Other"
label value race race
``````

6. Converting a continuous variable into a categorical variable

``````
gen G_age=(age<=37)
replace G_age=2 if age>37 & age<=42
replace G_age=3 if age>42
label define G_age 1 "37 or younger" 2 "38 to 42" 3 "43 or older"
label values G_age G_age
``````
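The same grouping can be written in a single `recode` call, which also attaches the value labels. A minimal sketch, assuming nlsw88.dta is in memory, that `age` takes integer values, and that `G_age` was created as above; `G_age_rc` is a hypothetical variable name:

```stata
* min and max stand for the smallest and largest non-missing values of age.
recode age (min/37 = 1 "37 or younger") ///
           (38/42  = 2 "38 to 42")      ///
           (43/max = 3 "43 or older"), generate(G_age_rc)

* Both constructions should agree on every observation.
assert G_age == G_age_rc
```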

7. Wage distribution

7.1 Using kdensity

``````
set scheme white_tableau
twoway (kdensity wage if race == 1, color(gs10)) ///
       (kdensity wage if race == 2, color(emerald)), ///
       legend(order(1 "White" 2 "Black")) ///
       xtitle(wage) ytitle(density)
graph export $Out\kdensity.png, replace
``````

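① In terms of the mean, white women earn a higher average wage.

② In terms of the distribution, black women's wages are more concentrated, with the density peaking at around 5 USD per hour.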
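Both readings of the plot can be checked against summary statistics. A minimal sketch, assuming nlsw88.dta is in memory; a higher mean for white women and a smaller standard deviation or interquartile range for black women would be consistent with the two observations above:

```stata
* Location and spread of wage for white (race==1) and black (race==2) women.
tabstat wage if inlist(race, 1, 2), by(race) ///
    stat(n mean median sd iqr) format(%9.2f)
```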

7.2 Use a bar chart to show how the wages (`wage`) of white (`race==1`) and black (`race==2`) women are distributed across industries (`industry`).

``````
gen wage_1 = (wage<5) // categorical wage-band variable: 1 if wage < 5
replace wage_1 = 2 if wage>=5 & wage<10
replace wage_1 = 3 if wage>=10 & wage<20
replace wage_1 = 4 if wage>=20
label define wage_1 1 "<5\$/h" 2 "5-10\$/h" 3 "10-20\$/h" 4 ">=20\$/h"
label values wage_1 wage_1

capture drop indnum   // indnum already exists if step 4.3 was run
bysort industry : egen indnum = count(industry)  // number of women in each industry
bysort industry wage_1: egen wagepct = total(1/indnum) // share of each wage band within the industry

global  pct `" 0 "0%" .25 "25%" .5 "50%" .75 "75%" 1 "100%" "'

betterbar wagepct, over(wage_1) by(industry) xlabel(${pct}) bar pct
graph export $Out\betterbar.png, replace
``````
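The chart above groups by wage band and industry but does not split by race. One simple way to bring race into the comparison with built-in commands is sketched below, assuming nlsw88.dta is in memory and `$Out` is defined as in step 0. It compares mean wages only, not the full distribution, and the file name `hbar_wage_race.png` is made up for illustration:

```stata
* Mean wage by race within each industry, white and black women only.
* asyvars gives each race its own bar color and a legend entry.
graph hbar (mean) wage if inlist(race, 1, 2), ///
    over(race) over(industry, label(labsize(small))) ///
    asyvars ytitle("Mean hourly wage (USD)")
graph export "$Out/hbar_wage_race.png", replace
```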
