Definition of Gaussian distribution
A Gaussian (or normal) distribution with mean \(\mu \in \mathbb{R}\) and variance \(\sigma^2 > 0\) is the distribution with density
\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\]
We denote \(\mathcal{N}(\mu, \sigma^2)\) for this distribution. When \(\mu = 0\) and \(\sigma^2 = 1\), we call it the standard normal distribution.
Properties
I generate \(X \sim \mathcal{N}(0,1)\).
We observe \(0.37\). Could this come from a \(\mathcal{N}(0,1)\)?
We observe \(3.82\). Could this come from a \(\mathcal{N}(0,1)\)?
We observe \(-0.91\). And this??
\(H_0\): \(X \sim \mathcal N(0,1)\) VS \(H_1\): \(X \sim \mathcal N(\mu, 1)\), \(\mu \neq 0\)
quantile(Normal(0,1), 0.975) = 1.96
2 * (1 - cdf(Normal(0,1), abs(3.82))) ≈ 0.0001
We observe \((X_1, \dots, X_n)\) iid real-valued random variables.
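The quantile/cdf calls above follow Julia's Distributions.jl syntax; as an illustrative sketch (not part of the original slides), the same two-sided check in Python with scipy.stats:

```python
from scipy.stats import norm

# Two-sided test of H0: X ~ N(0,1) against H1: X ~ N(mu,1), mu != 0.
# Reject H0 at level 5% when |x| exceeds the 97.5% quantile.
threshold = norm.ppf(0.975)        # ~ 1.96

def two_sided_p_value(x):
    """p-value of the two-sided Gaussian test for an observation x."""
    return 2 * (1 - norm.cdf(abs(x)))

print(threshold)
print(two_sided_p_value(0.37))     # ~ 0.71: keep H0
print(two_sided_p_value(3.82))     # ~ 0.0001: reject H0
```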
CLT
Let \(S_n = \sum_{i=1}^n X_i\) with \((X_1, \dots, X_n)\) iid and square-integrable (\(L^2\)); then \[ \frac{S_n - \mathbb E[S_n]}{\sqrt{\mathrm{Var}(S_n)}} \xrightarrow{\mathcal{L}} \mathcal N(0,1) \text{ as } n \to \infty \]
Rule of thumb: \(n \geq 30\) (!!! be careful about this rule)
This is an exact equality (for every \(n\)) when the \(X_i\)’s are Gaussian \(\mathcal N(\mu, \sigma^2)\)
Fix \(p \in (0,1)\). Then, \(\frac{\mathrm{Bin}(n,p) - np}{\sqrt{np(1-p)}} \approx \mathcal N(0,1)\) when \(n \to \infty\)
\(n\) should be \(\gg \frac{1}{p}\) (not \(30\)!!!)
Good Approx for (\(n=100\), \(p=0.2\))
Bad Approx for (\(n=100\), \(p=0.01\))
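One way to quantify "good" vs "bad" here (an illustrative sketch with scipy.stats, not part of the slides): compare the Binomial cdf with its normal approximation \(\mathcal N(np, np(1-p))\) over all \(k\).

```python
from scipy.stats import binom, norm

def max_cdf_gap(n, p):
    """Largest gap between the Binomial(n,p) cdf and its normal
    approximation N(np, np(1-p)), scanned over k = 0..n."""
    mu, sd = n * p, (n * p * (1 - p)) ** 0.5
    return max(abs(binom.cdf(k, n, p) - norm.cdf(k, mu, sd))
               for k in range(n + 1))

print(max_cdf_gap(100, 0.2))   # small gap: good approximation
print(max_cdf_gap(100, 0.01))  # much larger gap: n is not >> 1/p
```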
Definition of Chi-square distribution
A chi-squared distribution with \(k\) degrees of freedom is the distribution of
\[X = \sum_{i=1}^k Z_i^2\]
where the \((Z_1, \dots, Z_k)\) are iid \(\mathcal N(0,1)\). We denote \(\chi^2(k)\) for this distribution.
Properties
\(X = \sum_{i=1}^k Z_i^2\)
\[\mathbb{E}[X] = \sum_{i=1}^k \mathbb{E}[Z_i^2] = \sum_{i=1}^k 1 = k\]
\[\mathbb{V}[X] = \sum_{i=1}^k \mathbb{V}[Z_i^2] = k \left(\mathbb{E}[Z_1^4] - \mathbb{E}[Z_1^2]^2\right) = k (3 - 1) = 2k\]
since \(\mathbb{E}[Z_1^4] = 3\) for a standard Gaussian.
The \(Z_i^2\) are iid with mean \(\mu = 1\) and variance \(\sigma^2 = 2\). By the CLT:
\[\frac{X - \mathbb E[X]}{\sqrt{\mathbb V[X]}} = \frac{X - k}{\sqrt{2k}} \xrightarrow{\mathcal{L}} \mathcal{N}(0,1) \quad \text{as } k \to +\infty\]
Rearranging:
\[X \approx k + \sqrt{2k}\,\mathcal{N}(0,1) \qquad \blacksquare\]
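A quick sanity check of this approximation (illustrative sketch in Python with scipy.stats): compare exact \(\chi^2(k)\) quantiles with \(k + \sqrt{2k}\, z_{0.975}\) for growing \(k\).

```python
from scipy.stats import chi2, norm

def normal_approx_quantile(q, k):
    """Approximate chi2(k) quantile via X ≈ k + sqrt(2k) * N(0,1)."""
    return k + (2 * k) ** 0.5 * norm.ppf(q)

# Relative error shrinks as k grows.
for k in (5, 53, 500):
    print(k, chi2.ppf(0.975, df=k), normal_approx_quantile(0.975, k))
```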
I generate \(X \sim \chi^2(53)\).
We observe \(112.7\). Could this come from a \(\chi^2(53)\)?
We observe \(50.1\). Could this come from a \(\chi^2(53)\)?
We observe \(15.4\). And this??
\(H_0\): \(X \sim \chi^2(53)\) VS \(H_1\): \(X \not\sim \chi^2(53)\)
quantile(Chisq(53), 0.025) = 34.78
quantile(Chisq(53), 0.975) = 74.98
2 * min(cdf(Chisq(53), 112.7), 1 - cdf(Chisq(53), 112.7)) ≈ 0
Definition of Student distribution
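The equivalent two-sided chi-squared check in Python with scipy.stats (an illustrative sketch mirroring the Julia-style calls above):

```python
from scipy.stats import chi2

# Two-sided test of H0: X ~ chi2(53); reject outside the central 95% range.
lo, hi = chi2.ppf(0.025, df=53), chi2.ppf(0.975, df=53)

def two_sided_p_value(x, k=53):
    c = chi2.cdf(x, df=k)
    return 2 * min(c, 1 - c)

print(lo, hi)                    # ~ 34.78, 74.98
print(two_sided_p_value(112.7))  # ~ 0: reject H0
print(two_sided_p_value(50.1))   # large: keep H0
print(two_sided_p_value(15.4))   # ~ 0: reject H0 (too small!)
```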
A Student distribution with \(k\) degrees of freedom is the distribution of
\[T = \frac{Z}{\sqrt{U/k}}\]
where \(Z\) and \(U\) are independent, with \(Z \sim \mathcal N(0,1)\) and \(U \sim \chi^2(k)\). We denote \(\mathcal T(k)\) for this distribution.
Properties
Mean: For \(k > 1\), \(\mathbb{E}\!\left[(U/k)^{-1/2}\right]\) is finite, and since \(Z\) and \(U\) are independent, \[\mathbb{E}[T] = \mathbb{E}[Z] \cdot \mathbb{E}\!\left[\frac{1}{\sqrt{U/k}}\right] = 0\] because \(\mathbb{E}[Z] = 0\). (For \(k = 1\) the mean does not exist.)
Asymptotic normality: Write \(T = \frac{Z}{\sqrt{U/k}}\). By the law of large numbers, \(U/k = \frac{1}{k}\sum_{i=1}^k Z_i^2 \xrightarrow{\text{a.s.}} \mathbb{E}[Z_1^2] = 1\) as \(k \to \infty\). Since \(Z\) is independent of \(U\), Slutsky’s theorem gives \(T \xrightarrow{\mathcal{L}} \mathcal{N}(0,1)\).
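This convergence is visible in the quantiles (illustrative sketch with scipy.stats): the \(97.5\%\) quantile of \(\mathcal T(k)\) approaches the Gaussian \(1.96\) as \(k\) grows.

```python
from scipy.stats import t, norm

# T(k) quantiles decrease toward the N(0,1) quantile as k grows.
for k in (5, 10, 100, 1000):
    print(k, t.ppf(0.975, df=k))
print("N(0,1):", norm.ppf(0.975))  # ~ 1.96
```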
I generate \(T \sim \mathcal{T}(10)\).
We observe \(-5.2\). Could this be unusually small for a \(\mathcal{T}(10)\)?
We observe \(3.45\). Could this be unusually small for a \(\mathcal{T}(10)\)?
We observe \(-0.15\). And this??
\(H_0\): \(T \sim \mathcal T(10)\) VS \(H_1\): \(T \sim \mathcal T(10) + \mu\) with \(\mu < 0\)
quantile(TDist(10), 0.05) = -1.81
cdf(TDist(10), -3.45) = 0.003
Definition of Fisher distribution
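The equivalent one-sided (left-tail) check in Python with scipy.stats (illustrative sketch of the Julia-style calls above):

```python
from scipy.stats import t

# One-sided test of H0: T ~ T(10) against a leftward shift (mu < 0):
# reject when T falls below the 5% quantile.
threshold = t.ppf(0.05, df=10)    # ~ -1.81

def left_p_value(x, k=10):
    """Left-tail p-value for an observation x under T(k)."""
    return t.cdf(x, df=k)

print(threshold)
print(left_p_value(-5.2))   # tiny: reject H0
print(left_p_value(-0.15))  # ~ 0.44: keep H0
```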
A Fisher distribution with degrees of freedom \((k_1, k_2)\) is the distribution of
\(F = \frac{U_1/k_1}{U_2/k_2}\)
where \(U_1\) and \(U_2\) are independent, with \(U_1 \sim \chi^2(k_1)\) and \(U_2 \sim \chi^2(k_2)\). We denote \(\mathcal F(k_1, k_2)\) for this distribution.
Properties
Using the CLT approximation \(U_1 \approx k_1 + \sqrt{2k_1}\, Z_1\) and \(U_2 \approx k_2 + \sqrt{2k_2}\, Z_2\) with \(Z_1, Z_2\) iid \(\mathcal N(0,1)\):
\[F = \frac{U_1/k_1}{U_2/k_2} = \frac{1 + \sqrt{\frac{2}{k_1}}\,Z_1}{1 + \sqrt{\frac{2}{k_2}}\,Z_2} \approx 1 + \sqrt{\frac{2}{k_1}}\,Z_1 - \sqrt{\frac{2}{k_2}}\,Z_2\]
Since \(U_1\) and \(U_2\) are independent, the variance of the right-hand side is \(\frac{2}{k_1} + \frac{2}{k_2}\), so:
\[F \approx 1 + \sqrt{\frac{2}{k_1} + \frac{2}{k_2}}\,\mathcal{N}(0,1)\]
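A sanity check of this approximation (illustrative sketch with scipy.stats): compare exact \(\mathcal F(k_1,k_2)\) quantiles with \(1 + \sqrt{2/k_1 + 2/k_2}\, z_{0.975}\).

```python
from scipy.stats import f, norm

def normal_approx_quantile(q, k1, k2):
    """Approximate F(k1,k2) quantile via F ≈ 1 + sqrt(2/k1 + 2/k2) N(0,1)."""
    return 1 + (2 / k1 + 2 / k2) ** 0.5 * norm.ppf(q)

# The gap shrinks as both degrees of freedom grow.
for k1, k2 in ((5, 20), (50, 200), (500, 2000)):
    print((k1, k2), f.ppf(0.975, k1, k2), normal_approx_quantile(0.975, k1, k2))
```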
I generate \(F \sim \mathcal{F}(5, 20)\).
We observe \(1.12\). Could this be unusually large for a \(\mathcal{F}(5,20)\)?
We observe \(4.87\). Could this be unusually large for a \(\mathcal{F}(5,20)\)?
We observe \(0.95\). And this??
\(H_0\): \(F \sim \mathcal F(5,20)\) VS \(H_1\): \(F\) is stochastically larger than \(\mathcal F(5,20)\)
quantile(FDist(5,20), 0.95) = 2.71
1 - cdf(FDist(5,20), 4.87) = 0.004
Note
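The equivalent one-sided (right-tail) check in Python with scipy.stats (illustrative sketch of the Julia-style calls above):

```python
from scipy.stats import f

# One-sided test of H0: F ~ F(5,20) against "stochastically larger":
# reject when F exceeds the 95% quantile.
threshold = f.ppf(0.95, 5, 20)   # ~ 2.71

def right_p_value(x):
    """Right-tail p-value for an observation x under F(5,20)."""
    return 1 - f.cdf(x, 5, 20)

print(threshold)
print(right_p_value(4.87))  # ~ 0.004: reject H0
print(right_p_value(1.12))  # large: keep H0
```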
Let \(Y \sim \mathcal{N}(0, I_n)\). Let \(E\) and \(F\) be two orthogonal subspaces of \(\mathbb{R}^n\), i.e., \(E \perp F\), with dimensions \(\dim(E) = p\) and \(\dim(F) = q\). Denote by \(\Pi_E\) and \(\Pi_F\) the orthogonal projections onto \(E\) and \(F\) respectively. Then:
Independence: \(\Pi_E Y\) and \(\Pi_F Y\) are independent Gaussian vectors.
Chi-squared distributions: \(\|\Pi_E Y\|^2 \sim \chi^2(p)\) and \(\|\Pi_F Y\|^2 \sim \chi^2(q)\).
Pythagorean decomposition: If \(\mathbb{R}^n = E \oplus F\) (i.e., \(p + q = n\)), then \(\|Y\|^2 = \|\Pi_E Y\|^2 + \|\Pi_F Y\|^2\)
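These facts can be checked numerically (illustrative sketch with numpy; the sizes \(n = 6\), \(p = 2\) are arbitrary choices, and \(E\), \(F\) are taken as coordinate subspaces so the projections just zero out complementary coordinates):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2                      # hypothetical sizes: R^n = E ⊕ F, dim(E) = p
Y = rng.standard_normal(n)       # Y ~ N(0, I_n)

# E = span of the first p coordinates, F = span of the remaining n - p.
PiE_Y = np.concatenate([Y[:p], np.zeros(n - p)])
PiF_Y = np.concatenate([np.zeros(p), Y[p:]])

# Pythagorean decomposition: ||Y||^2 = ||PiE Y||^2 + ||PiF Y||^2.
lhs = np.sum(Y**2)
rhs = np.sum(PiE_Y**2) + np.sum(PiF_Y**2)
print(lhs, rhs)                  # equal up to float rounding

# Monte Carlo check that ||PiE Y||^2 has the chi2(p) mean, i.e. p.
samples = rng.standard_normal((100_000, n))
mc_mean = np.mean(np.sum(samples[:, :p]**2, axis=1))
print(mc_mean)                   # ~ p = 2
```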