```r
library(tidyverse)
library(magrittr)
#library(openxlsx)
#library(here)
library(janitor)
library(Hmisc)
library(scales)
```
胡华平
April 27, 2024
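Consider the following rescaling of the data.
Given a variable \(X_i\) with weights (frequencies) \(f_i\),
choose scale factors \(w > 0\) and \(v > 0\) and set \(X^\ast_i = w X_i\) and \(f_i^\ast = v f_i\).
We now examine how the weighted mean and variance computed after the rescaling relate to those computed before it.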
(1) Before rescaling:
\[ \begin{aligned} \overline{X} &= \frac{\sum{(X_i\cdot f_i)}}{\sum{f_i}} \\ S^2_{X} &= \frac{\sum{\left((X_i - \overline{X})^2f_i\right)}}{(\sum{f_i}) - 1} \end{aligned} \]
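The two "before" formulas can be sketched in base R as shown below. This is a minimal sketch, not the author's original code. The helper names `wmean()` and `wvar_sample()` are hypothetical. The example data are the five values and frequencies from the "before" rows of the worked table in this note.

```r
# Hypothetical helpers: weighted mean and weighted sample variance with
# frequency weights f, following the "before" formulas (denominator sum(f) - 1).
wmean <- function(x, f) sum(x * f) / sum(f)

wvar_sample <- function(x, f) {
  xbar <- wmean(x, f)
  sum((x - xbar)^2 * f) / (sum(f) - 1)
}

# Example data: values and frequencies of the "before" rows in the table
x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)

xbar <- wmean(x, f)        # 1530 / 100 = 15.3
s2   <- wvar_sample(x, f)  # 1621 / 99
print(c(mean = xbar, var = s2))
```

`wmean()` gives the same result as base R's `weighted.mean(x, f)`. It is written out here only to mirror the formula term by term.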
(2) After rescaling:
\[ \begin{aligned} \overline{X}^{\ast} &= \frac{\sum{(X^{\ast}_i\cdot f^{\ast}_i)}}{\sum{f^{\ast}_i}} = \frac{\sum{(wX_i\cdot vf_i)}}{\sum{vf_i}} = w\cdot \frac{\sum{(X_i\cdot f_i)}}{\sum{f_i}} = w \overline{X}\\ S^2_{X^{\ast}} &= \frac{\sum{\left((X^{\ast}_i - \overline{X}^{\ast})^2f^{\ast}_i\right)}}{(\sum{f^{\ast}_i}) - 1} = \frac{\sum{\left((wX_i - w\overline{X})^2vf_i\right)}}{(\sum{vf_i}) - 1} \\ &= \frac{w^2v \cdot \sum{\left((X_i - \overline{X})^2f_i\right)}}{v(\sum{f_i}) - 1} \\ & = \frac{w^2 \cdot \sum{\left((X_i - \overline{X})^2f_i\right)}}{(\sum{f_i}) - 1/v} \\ & \approx w^2 \cdot S^2_X \quad \text{(exact only if } v = 1\text{; close when } \textstyle\sum{f_i} \gg \max(1, 1/v)\text{)} \end{aligned} \]
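The derivation can be checked numerically with a short base R sketch. It uses the table's data with \(w = 10\) and \(v = 0.1\); the helper names are hypothetical. The check confirms two things: the mean scales exactly by \(w\), and the rescaled variance matches the closed form with denominator \(\sum f_i - 1/v\) rather than \(w^2 S^2_X\).

```r
# Hypothetical helpers (same formulas as the text, denominator sum(f) - 1)
wmean <- function(x, f) sum(x * f) / sum(f)
wvar_sample <- function(x, f) sum((x - wmean(x, f))^2 * f) / (sum(f) - 1)

x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)
w <- 10
v <- 0.1

x_star <- w * x
f_star <- v * f

# The mean scales exactly by w, whatever v is
mean_ratio <- wmean(x_star, f_star) / wmean(x, f)

# Closed form from the derivation: w^2 * SS / (sum(f) - 1/v)
ss <- sum((x - wmean(x, f))^2 * f)
closed_form <- w^2 * ss / (sum(f) - 1 / v)

# Ratio of the rescaled variance to w^2 * S^2_X; should be (sum(f) - 1) / (sum(f) - 1/v)
var_ratio <- wvar_sample(x_star, f_star) / (w^2 * wvar_sample(x, f))

print(c(mean_ratio  = mean_ratio,
        closed_form = closed_form,
        direct      = wvar_sample(x_star, f_star),
        var_ratio   = var_ratio))
```

With these numbers the ratio is \(99/90 = 1.1\). This is a 10% gap, so it comes from the formula itself and not from floating-point rounding.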
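Under rescaling, the variance computed this way is therefore generally not exactly \(w^2\) times the original. The \(-1\) in the denominator does not scale with \(v\), so the discrepancy is systematic rather than a floating-point effect. Two forms of the weighted variance formula are in common use: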
\[ \begin{aligned} S^2_{X} &= \frac{\sum{\left((X_i - \overline{X})^2f_i\right)}}{(\sum{f_i}) - 1} && \text{(calculation)} \end{aligned} \tag{1}\]
\[ \begin{aligned} S^2_{X} &= \frac{\sum{\left((X_i - \overline{X})^2f_i\right)}}{\sum{f_i}} && \text{(theory)} \end{aligned} \tag{2}\]
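Repeating the derivation with Eq. (2) in place of Eq. (1) shows why the two forms behave differently. This short worked step uses only the definitions above. Because \(\sum f^\ast_i = v\sum f_i\) is the entire denominator, the factor \(v\) cancels exactly:

\[ \begin{aligned} S^2_{X^{\ast}} &= \frac{\sum{\left((wX_i - w\overline{X})^2vf_i\right)}}{\sum{vf_i}} = \frac{w^2v \cdot \sum{\left((X_i - \overline{X})^2f_i\right)}}{v\sum{f_i}} = w^2 \cdot S^2_X \end{aligned} \]

The two forms differ only by a constant factor. Eq. (1) equals Eq. (2) multiplied by \(\frac{\sum f_i}{\sum f_i - 1}\). In the worked example this factor is \(100/99\) before rescaling and \(10/9\) after, which is why only Eq. (2) keeps the exact \(w^2\) relationship.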
Worked example: five values with frequencies, before rescaling (\(w = 1, v = 1\)) and after (\(w = 10, v = 0.1\)). The columns are:

- xf \(= x f\)
- x_dm \(= x - \overline{X}\), the deviation from the weighted mean (\(\overline{X} = 15.3\) before and \(153\) after)
- x_dm_sqr \(= (x - \overline{X})^2\)
- x_dm_sqr_f \(= (x - \overline{X})^2 f\)

The Total rows are plain column sums.

trans | w | v | x | f | xf | x_dm | x_dm_sqr | x_dm_sqr_f |
---|---|---|---|---|---|---|---|---|
before | 1 | 1.0 | 10 | 30 | 300 | -5.3 | 28.09 | 842.7 |
before | 1 | 1.0 | 19 | 20 | 380 | 3.7 | 13.69 | 273.8 |
before | 1 | 1.0 | 13 | 10 | 130 | -2.3 | 5.29 | 52.9 |
before | 1 | 1.0 | 20 | 20 | 400 | 4.7 | 22.09 | 441.8 |
before | 1 | 1.0 | 16 | 20 | 320 | 0.7 | 0.49 | 9.8 |
before | 1 | 1.0 | Total | 100 | 1530 | 1.5 | 69.65 | 1621.0 |
after | 10 | 0.1 | 100 | 3 | 300 | -53.0 | 2809.00 | 8427.0 |
after | 10 | 0.1 | 190 | 2 | 380 | 37.0 | 1369.00 | 2738.0 |
after | 10 | 0.1 | 130 | 1 | 130 | -23.0 | 529.00 | 529.0 |
after | 10 | 0.1 | 200 | 2 | 400 | 47.0 | 2209.00 | 4418.0 |
after | 10 | 0.1 | 160 | 2 | 320 | 7.0 | 49.00 | 98.0 |
after | 10 | 0.1 | Total | 10 | 1530 | 15.0 | 6965.00 | 16210.0 |
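The table's columns can be rebuilt in base R with a small sketch like the one below. The helper `make_table()` is hypothetical, and the original note may have used tidyverse and janitor instead. Column names follow the table.

```r
# Hypothetical helper: rebuild the table's columns for one (w, v) pair
make_table <- function(x, f, w, v) {
  x_s <- w * x
  f_s <- v * f
  xbar <- sum(x_s * f_s) / sum(f_s)   # weighted mean after rescaling
  data.frame(
    w = w, v = v, x = x_s, f = f_s,
    xf         = x_s * f_s,
    x_dm       = x_s - xbar,          # deviation from the weighted mean
    x_dm_sqr   = (x_s - xbar)^2,
    x_dm_sqr_f = (x_s - xbar)^2 * f_s
  )
}

x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)

before <- make_table(x, f, w = 1,  v = 1)
after  <- make_table(x, f, w = 10, v = 0.1)

# Column totals, i.e. the "Total" rows of the table
cols <- c("f", "xf", "x_dm", "x_dm_sqr", "x_dm_sqr_f")
totals <- rbind(before = colSums(before[, cols]),
                after  = colSums(after[, cols]))
print(round(totals, 2))
```

The x_dm totals (1.5 and 15.0) are unweighted column sums. The weighted sum \(\sum (x - \overline{X}) f\) is zero in both cases, because \(\sum x f = \overline{X} \sum f\).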
Let us verify the results of the two variance formulas:
(1) Using the classical sample-variance formula (Eq. 1)
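```
$w
[1] 10

$var1
[1] 16.37374

$var2
[1] 1801.111

[1] FALSE
[1] "1 801.111111111"
[1] "1 637.373737374"
```

Here var1 and var2 are the Eq. 1 variances before and after rescaling. var2 = 1801.111 differs from \(w^2 \cdot\) var1 = 1637.374, so the equality check returns FALSE.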
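The output above can be reproduced with the base R sketch below. The helper name `var_eq1()` is hypothetical. The comparison uses `all.equal()`, which is an assumption: the original may have used `==` or `identical()`, but with a gap this large every one of them returns FALSE.

```r
# Hypothetical helper: weighted variance with Eq. 1 (denominator sum(f) - 1)
var_eq1 <- function(x, f) {
  xbar <- sum(x * f) / sum(f)
  sum((x - xbar)^2 * f) / (sum(f) - 1)
}

x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)
w <- 10
v <- 0.1

var1 <- var_eq1(x, f)          # before rescaling
var2 <- var_eq1(w * x, v * f)  # after rescaling

print(list(w = w, var1 = var1, var2 = var2))

# Does var2 equal w^2 * var1? all.equal() tolerates floating-point noise
same <- isTRUE(all.equal(var2, w^2 * var1))
print(same)
print(c(var2 = var2, w2_var1 = w^2 * var1))
```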
(2) Using the alternative theoretical variance formula (Eq. 2)
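```
$w
[1] 10

$var1_alt
[1] 16.21

$var2_alt
[1] 1621

[1] TRUE
[1] "1 621.000000000"
[1] "1 621.000000000"
```

With Eq. 2, var2_alt = 1621 equals \(w^2 \cdot\) var1_alt exactly, so the check returns TRUE.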
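The same check under Eq. 2 is sketched below in base R, with the hypothetical helper `var_eq2()`. The sketch also confirms the constant factor linking the two forms, \(\sum f_i / (\sum f_i - 1)\).

```r
# Hypothetical helpers: Eq. 2 (denominator sum(f)) and Eq. 1 (sum(f) - 1)
var_eq2 <- function(x, f) {
  xbar <- sum(x * f) / sum(f)
  sum((x - xbar)^2 * f) / sum(f)
}
var_eq1 <- function(x, f) var_eq2(x, f) * sum(f) / (sum(f) - 1)

x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)
w <- 10
v <- 0.1

var1_alt <- var_eq2(x, f)          # before rescaling
var2_alt <- var_eq2(w * x, v * f)  # after rescaling

print(list(w = w, var1_alt = var1_alt, var2_alt = var2_alt))

# Under Eq. 2 the w^2 relationship holds (up to floating-point tolerance)
same_alt <- isTRUE(all.equal(var2_alt, w^2 * var1_alt))
print(same_alt)
```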
In practice, statistical software uses the classical formula (Eq. 1) by default. R's var(), for example, divides by \(n - 1\).
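When the frequencies are whole numbers, this default can be seen directly. The sketch below expands each value by its frequency with `rep()` and calls `stats::var()`, and the result reproduces Eq. 1 both before and after rescaling. The helper `pop_var()` is hypothetical. It converts `var()` to the Eq. 2 form by multiplying by \((n - 1)/n\).

```r
x <- c(10, 19, 13, 20, 16)
f <- c(30, 20, 10, 20, 20)
w <- 10
v <- 0.1

# Expand frequency data into raw observations
raw_before <- rep(x, times = f)
# rep() expects whole-number counts; round() guards against v * f
# landing just below an integer because of floating-point error
raw_after <- rep(w * x, times = round(v * f))

# var() uses the n - 1 denominator, i.e. Eq. 1
print(c(n_before = length(raw_before), var_before = var(raw_before),
        n_after  = length(raw_after),  var_after  = var(raw_after)))

# Hypothetical helper: Eq. 2 (population-style) variance from var()
pop_var <- function(z) var(z) * (length(z) - 1) / length(z)
print(c(pop_before = pop_var(raw_before), pop_after = pop_var(raw_after)))
```

This expansion only works for integer frequencies. With non-integer weights, the weighted formulas have to be applied directly.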