Fundamental statistics theory notes (6)

Central Limit Theorem (CLT)

The CLT holds for an i.i.d. sample from “most” distributions.

Let $x_1, x_2, \dots, x_n$ be i.i.d. from any distribution for which the MGF $M_x(t)$ exists. Denote $E(x_i) = \mu$ and $Var(x_i) = \sigma^2$. Let $\bar x_n = \frac{1}{n}\sum_{i=1}^{n}x_i$ and $U = \frac{\bar x_n - \mu}{\sigma/\sqrt{n}}$.
Then $U \Rightarrow N(0,1)$ as $n \rightarrow \infty$. This means the distribution of $U$ approaches the standard normal distribution when $n$ is sufficiently large.
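The convergence can be checked empirically by simulation. A minimal sketch, using an Exponential(1) distribution (so $\mu = \sigma = 1$); the distribution choice, sample size, and repetition count are illustrative, not from the notes:

```python
import random
import math

def standardized_mean(n, rng):
    """Draw n i.i.d. Exponential(1) values and return U = (xbar - mu)/(sigma/sqrt(n))."""
    mu, sigma = 1.0, 1.0
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(0)
n, reps = 200, 20000
u = [standardized_mean(n, rng) for _ in range(reps)]

# If U is approximately N(0,1), its sample mean and variance should be
# near 0 and 1, and the fraction below 0 near Phi(0) = 0.5.
mean_u = sum(u) / reps
var_u = sum((v - mean_u) ** 2 for v in u) / reps
prop_below_zero = sum(v <= 0 for v in u) / reps
print(round(mean_u, 2), round(var_u, 2), round(prop_below_zero, 2))
```

Even though the exponential distribution is heavily skewed, the standardized mean of $n = 200$ draws is already close to standard normal.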

Proof:
Let $y_i = \frac{x_i-\mu}{\sigma}$; then $E(y_i) = 0$, $Var(y_i) = E(y_i^2) = 1$, and $U = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}y_i$.
$\Rightarrow$ The MGF of $U$ is $M_U(t) = E(e^{tU}) = E[e^{ty_1/\sqrt{n}+ty_2/\sqrt{n}+\dots+ty_n/\sqrt{n}}]$
$=E[e^{ty_1/\sqrt{n}}]E[e^{ty_2/\sqrt{n}}]\cdots E[e^{ty_n/\sqrt{n}}]$ (by independence)
$=M_{y_1}(t/\sqrt{n})M_{y_2}(t/\sqrt{n})\cdots M_{y_n}(t/\sqrt{n})$
$=[M_y(t/\sqrt{n})]^n$ (identically distributed)

Taylor expanding $M_y$ around $0$:
$M_y(t/\sqrt{n}) = M_y(0)\frac{(t/\sqrt{n})^0}{0!}+M_y'(0)\frac{(t/\sqrt{n})^1}{1!}+M_y''(0)\frac{(t/\sqrt{n})^2}{2!}+R$
$= 1 +E(y_i)\frac{t}{\sqrt{n}}+E(y_i^2)\frac{t^2}{2n}+R$
$= 1 + 0 + \frac{t^2}{2n}+R$
$\Rightarrow M_U(t) = [1+\frac{t^2}{2}\cdot \frac{1}{n}+R]^n$

Recall
$$\lim\limits_{n \rightarrow \infty}\left(1+\frac{x}{n}\right)^n = e^x$$
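This limit can be checked numerically; the choice $x = 0.5$ (i.e. $t^2/2$ with $t = 1$) and the values of $n$ below are arbitrary:

```python
import math

# Numeric check of the limit (1 + x/n)^n -> e^x used in the proof.
x = 0.5
for n in (10, 1000, 100000):
    print(n, (1 + x / n) ** n)

approx = (1 + x / 100000) ** 100000
print("limit target e^x =", math.exp(x))
```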

$\Rightarrow \lim\limits_{n \rightarrow \infty}M_U(t) = \lim\limits_{n \rightarrow \infty}[1+\frac{t^2}{2}\frac{1}{n}+R]^n = e^{t^2/2}$ (the remainder $R$ vanishes faster than $1/n$, so it does not affect the limit)
$\Rightarrow$ this is the MGF of $N(0,1)$.

$\Rightarrow$ The MGF of $U$ converges to the MGF of $N(0,1)$; since the MGF uniquely determines the distribution, $U \Rightarrow N(0,1)$.
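The key identity $M_U(t) \rightarrow e^{t^2/2}$ can also be checked by Monte Carlo, estimating $E[e^{tU}]$ directly. A sketch using Uniform(0,1) draws (so $\mu = 1/2$, $\sigma = 1/\sqrt{12}$); the distribution and sample sizes are illustrative:

```python
import random
import math

# Monte Carlo estimate of M_U(t) = E[e^{tU}] for the standardized mean
# of Uniform(0,1) draws, compared with the N(0,1) MGF e^{t^2/2}.
rng = random.Random(1)
n, reps = 100, 50000
mu, sigma = 0.5, 1 / math.sqrt(12)

def draw_u():
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

t = 1.0
mgf_estimate = sum(math.exp(t * draw_u()) for _ in range(reps)) / reps
print(mgf_estimate, math.exp(t * t / 2))  # both should be near e^{1/2}
```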

Ex. Let $x_1,x_2,\dots,x_n \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)$, which is the same as $\text{Binomial}(1,p)$.
$E(x_i) = p$, $Var(x_i)=p(1-p)$
By the CLT, when $n$ is large,
$\Rightarrow \frac{\bar x - p}{\sqrt{p(1-p)}/\sqrt{n}} \overset{\text{approx}}{\sim} N(0,1)$
or $\bar x \overset{\text{approx}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$
or $\sum_{i=1}^{n}x_i \overset{\text{approx}}{\sim} N(np, np(1-p))$,
while $\sum_{i=1}^{n}x_i \sim \text{Binomial}(n,p)$ is the exact distribution.
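A quick sketch comparing the exact binomial probability with its normal approximation; the values $n = 100$, $p = 0.3$, $k = 35$ are illustrative, and a continuity correction (not discussed in the notes) is applied to sharpen the approximation:

```python
import math

def binom_cdf(k, n, p):
    """Exact P(S <= k) for S ~ Binomial(n, p), summing the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.3, 35
exact = binom_cdf(k, n, p)
# CLT approximation: S ~ N(np, np(1-p)), with continuity correction k -> k + 0.5.
approx = phi((k + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(round(exact, 4), round(approx, 4))
```

The two values agree to a few decimal places, which is why the normal curve is a standard approximation to binomial probabilities for large $n$.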