# Statistics 01: Central Limit Theorem, Normal Distribution, z-score

## Central Limit Theorem

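As a rule of thumb, for independent samples of size 30 or more drawn from a population with finite variance, the sample mean is approximately normally distributed with mean μ and standard deviation σ/√n, whatever the shape of the population distribution.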
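The rule of thumb can be checked with a small simulation. This is a minimal sketch, not a proof: the exponential population, the seed, and the number of repetitions (`n_samples`) are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)

n = 30                   # sample size from the rule of thumb
n_samples = 10_000       # number of repeated samples (assumed for illustration)

# Exponential(1) is strongly right-skewed, with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / math.sqrt(n)   # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this share is about 95%.
within = sum(abs(m - 1.0) <= 1.96 * expected_sd for m in sample_means) / n_samples

print(f"mean of sample means: {mean_of_means:.4f}")
print(f"sd of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {expected_sd:.4f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Even though the population is far from normal, the sample means should cluster around 1 with a spread close to σ/√n.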

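## Normal Distribution

The Z distribution is the standard normal distribution (mean 0, standard deviation 1). An observation x is standardized as z = (x − μ)/σ, and the probability associated with a given z value can be looked up in a z-table.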
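A z-table lookup can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the printed table; the values 1.0 and 0.975 are just example inputs.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Table lookup: P(Z <= 1.0), the area to the left of z = 1.0.
p_below = std_normal.cdf(1.0)

# Reverse lookup: the z value that has 97.5% of the area to its left.
z_for_975 = std_normal.inv_cdf(0.975)

print(f"P(Z <= 1.0) = {p_below:.4f}")
print(f"z with 97.5% of the area below it = {z_for_975:.3f}")
```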

Standardizing with Z scores

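Two raw scores from tests with different scales cannot be compared directly. Here one test has mean 1500 and standard deviation 300, and the other has mean 21 and standard deviation 5. Instead, we compare how many standard deviations above the mean each observation lies.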

• Pam's score is (1800 - 1500) / 300 = 1 standard deviation above the mean.
• Jim's score is (24 - 21) / 5 = 0.6 standard deviations above the mean.

Pam's z-score is higher (1 > 0.6), so Pam did better relative to the other people who took her test.
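The comparison above can be reproduced in a few lines. The helper name `z_score` is hypothetical, and the percentile step additionally assumes that both sets of scores are normally distributed, which the text does not state.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

pam_z = z_score(1800, mu=1500, sigma=300)
jim_z = z_score(24, mu=21, sigma=5)

# Assumption: scores are normally distributed, so a z-score maps to a percentile.
pam_pct = NormalDist().cdf(pam_z)
jim_pct = NormalDist().cdf(jim_z)

print(f"Pam: z = {pam_z:.2f}, percentile = {pam_pct:.1%}")
print(f"Jim: z = {jim_z:.2f}, percentile = {jim_pct:.1%}")
print("Higher relative score:", "Pam" if pam_z > jim_z else "Jim")
```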

## z-score

A z-score is the signed number of standard deviations by which an observation lies from the mean (positive above the mean, negative below it): z = (x − μ)/σ

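• Python

With `axis=0`, z-scores are computed column by column.

`ddof=1` (delta degrees of freedom) means the standard deviation is computed with n − 1 in the denominator; the default is `ddof=0`, which uses n, where n is the number of observations. With `axis=0`, n = 5 for the matrix below (five rows).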
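To make the effect of `ddof` concrete, here is a minimal pure-Python sketch. The function `std_with_ddof` is a hypothetical helper written for illustration, not part of NumPy or SciPy, and the data are the first column of the example matrix used in this section.

```python
import math
import statistics

def std_with_ddof(values, ddof=0):
    """Standard deviation with n - ddof in the denominator."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_sq / (n - ddof))

# First column of the example matrix, so n = 5.
column = [0.3148, 0.7149, 0.6341, 0.5918, 0.0921]

population_sd = std_with_ddof(column, ddof=0)   # denominator n
sample_sd = std_with_ddof(column, ddof=1)       # denominator n - 1

print(f"ddof=0: {population_sd:.4f}")
print(f"ddof=1: {sample_sd:.4f}")
```

The `ddof=1` value is always the larger of the two, by a factor of √(n/(n − 1)), so z-scores computed with `ddof=1` are slightly smaller in magnitude.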

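```python
import numpy as np
from scipy.stats import zscore

m = np.array([[0.3148, 0.0478, 0.6243, 0.4608],
              [0.7149, 0.0775, 0.6072, 0.9656],
              [0.6341, 0.1403, 0.9759, 0.4064],
              [0.5918, 0.6948, 0.9040, 0.3721],
              [0.0921, 0.2481, 0.1188, 0.1366]])

# axis=0 standardizes each column; ddof=1 uses the sample standard deviation
# (n - 1), which matches the column-wise Julia and R examples.
zscore(m, axis=0, ddof=1)
```
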
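• Julia

```julia
using Statistics, StatsBase

m = [0.3148 0.0478 0.6243 0.4608
     0.7149 0.0775 0.6072 0.9656
     0.6341 0.1403 0.9759 0.4064
     0.5918 0.6948 0.9040 0.3721
     0.0921 0.2481 0.1188 0.1366]

# Column means and sample standard deviations (n - 1 in the denominator)
μ = mean.(eachrow(m'))
σ = std.(eachrow(m'))

# Standardize the transposed matrix, then transpose back to the 5×4 layout
z = zscore(m', μ, σ)
z'
```
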
• R

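```r
m = matrix(c(0.3148, 0.0478, 0.6243, 0.4608,
             0.7149, 0.0775, 0.6072, 0.9656,
             0.6341, 0.1403, 0.9759, 0.4064,
             0.5918, 0.6948, 0.9040, 0.3721,
             0.0921, 0.2481, 0.1188, 0.1366),
           ncol = 4, byrow = TRUE)

# scale() centers each column and divides it by its sample standard deviation (n - 1)
scale(m)
```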


