We observe \((X_1, \dots, X_n)\), iid real-valued random variables.
CLT
Test problems
\[ \begin{aligned} H_0: \mu = \mu_0 ~~~~ &\text{ or } ~~~ H_1: \mu > \mu_0 ~~~ \text{(right-tailed)}\\ H_0: \mu = \mu_0 ~~~ &\text{ or } ~~~ H_1: \mu < \mu_0 ~~~ \text{(left-tailed)}\\ H_0: \mu = \mu_0 ~~~ &\text{ or } ~~~ H_1: \mu \neq \mu_0 ~~~ \text{(two-tailed)}\\ \end{aligned} \]
Tests
\[ \begin{aligned} \frac{\sqrt{n}(\overline X-\mu_0)}{\sigma} > t_{1-\alpha} ~~~ \text{(right-tailed)}\\ \frac{\sqrt{n}(\overline X-\mu_0)}{\sigma} < t_{\alpha} ~~~ \text{(left-tailed)}\\ \left|\frac{\sqrt{n}(\overline X-\mu_0)}{\sigma}\right| > t_{1-\tfrac{\alpha}{2}}~~~ \text{(two-tailed)}\\ \end{aligned} \]
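The three rejection rules above can be sketched in Python with the standard library only. This is a minimal illustration, assuming \(\sigma\) is known and \(t_q\) denotes the \(q\)-quantile of \(\mathcal N(0,1)\); the function name `z_test`, its `tail` argument and the sample values are hypothetical and not part of the original material.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05, tail="two"):
    """Return (statistic, reject) for H0: mu = mu0 when sigma is known."""
    n = len(sample)
    stat = sqrt(n) * (mean(sample) - mu0) / sigma
    std_normal = NormalDist()  # N(0, 1)
    if tail == "right":
        reject = stat > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = stat < std_normal.inv_cdf(alpha)
    elif tail == "two":
        reject = abs(stat) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return stat, reject


# Hypothetical data, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.4, 5.7]
stat, reject = z_test(sample, mu0=5.0, sigma=0.5, alpha=0.05, tail="two")

# Two-tailed threshold at alpha = 0.05, i.e. the 0.975-quantile of N(0, 1).
two_sided_threshold = NormalDist().inv_cdf(0.975)
```

With \(\alpha = 0.05\) the two-tailed threshold is close to 1.96, so the same statistic can be rejected by one rule and not by another depending on the alternative chosen.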
Fisher’s Quote
The value for which \(p=0.05\), or 1 in 20, is 1.96 or nearly 2 ; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.
Multiple VS multiple test problem: \[ H_0: \{\mu_0,\sigma > 0\} \text{ or } H_1: \{\mu \neq \mu_0,\sigma > 0\} \;. \]
\(\psi(X) = \frac{\sqrt{n}(\overline X-\mu_0)}{\sigma}\) is no longer a test statistic: it depends on the unknown \(\sigma\).
Idea: replace \(\sigma\) by its estimator \[ \hat \sigma(X) = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i - \overline X)^2} \; .\]
This gives \[ \psi(X) = \frac{\sqrt{n}(\overline X-\mu_0)}{\hat \sigma} \; . \]
Is \(\psi(X)\) pivotal under \(H_0\)? What is its distribution?
Chi-squared distribution \(\chi^2(k)\)
Student distribution \(\mathcal T(k)\)
Theorem
Assume \(X_i\) are iid \(\mathcal N(\mu_0, \sigma^2)\).
Multiple VS multiple test problem \(X=(X_1, \dots, X_n)\): \[ H_0: \{\mu_0,\sigma > 0\} \text{ or } H_1: \{\mu \neq \mu_0,\sigma > 0\} \;. \]
(Student) T-test statistic: \[\psi(X) = \frac{\sqrt{n}(\overline X-\mu_0)}{\hat \sigma(X)} \sim \mathcal T(n-1)\]
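A short sketch of why the statistic follows \(\mathcal T(n-1)\), taking \(\hat\sigma\) to be the empirical standard deviation centered at \(\overline X\) and using the standard fact that \(\overline X\) and \(\hat\sigma\) are independent for Gaussian samples. The names \(Z\) and \(V\) are introduced here for illustration only. Under \(H_0\):
\[ \begin{aligned} Z &= \frac{\sqrt{n}(\overline X-\mu_0)}{\sigma} \sim \mathcal N(0,1), \qquad V = \frac{(n-1)\hat \sigma^2}{\sigma^2} \sim \chi^2(n-1), \\ \psi(X) &= \frac{Z}{\sqrt{V/(n-1)}} = \frac{\sqrt{n}(\overline X-\mu_0)}{\hat \sigma} \; . \end{aligned} \]
The unknown \(\sigma\) cancels in the ratio, so the distribution of \(\psi(X)\) does not depend on \(\sigma\). A standard Gaussian divided by the square root of an independent \(\chi^2(n-1)\) variable over its degrees of freedom is, by definition, \(\mathcal T(n-1)\).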
Note
quantile(Chisq(n-1), 1-alpha)
1-cdf(Chisq(n-1), xobs)
quantile(Chisq(n-1), alpha)
cdf(Chisq(n-1), xobs)
We observe \((X_1, \dots, X_{n_1})\) iid \(\mathcal N(\mu_1, \sigma_1^2)\) and \((Y_1, \dots, Y_{n_2})\) iid \(\mathcal N(\mu_2, \sigma_2^2)\).
\(\sigma_1\), \(\sigma_2\) are known, \(\mu_1\), \(\mu_2\) are unknown
Test problem: \(H_0: \mu_1 = \mu_2 ~~~\text{or} ~~~H_1: \mu_1 \neq \mu_2\)
Idea: normalize \(\overline X - \overline Y\): \[ \psi(X,Y)=\frac{\overline X - \overline Y}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
Two-tailed test for testing means: \[ T(X,Y)=\left|\frac{\overline X - \overline Y}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right| \geq t_{1-\alpha/2} \; , \]
where \(t_{1-\alpha/2}\) is the \((1-\alpha/2)\)-quantile of the standard Gaussian distribution \(\mathcal N(0,1)\).
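The two-tailed rule above can be sketched as follows, using only the Python standard library. The function name `two_sample_z`, the data and the values of \(\sigma_1\), \(\sigma_2\) are hypothetical; the p-value line is an addition that uses the usual two-sided convention.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z(x, y, sigma1, sigma2, alpha=0.05):
    """Two-tailed test of H0: mu1 = mu2 with known sigma1 and sigma2.

    Returns (psi, reject, p_value).
    """
    std_normal = NormalDist()  # N(0, 1)
    se = sqrt(sigma1**2 / len(x) + sigma2**2 / len(y))
    psi = (mean(x) - mean(y)) / se
    threshold = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(psi)))
    return psi, abs(psi) >= threshold, p_value


# Hypothetical measurements, for illustration only.
x = [10.2, 9.8, 10.5, 10.1]
y = [9.9, 10.0, 10.3, 9.8]
psi, reject, p_value = two_sample_z(x, y, sigma1=0.3, sigma2=0.3)
```

On these made-up numbers the statistic stays well below the threshold, so \(H_0\) is not rejected at level 0.05.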
Objective. Test whether a new medication is effective at lowering cholesterol levels.
Experiment.
Test Problem.
Test Statistic. \(\psi(X,Y)=\frac{\overline X - \overline Y}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}}\)
Data. \(\overline X = 24.5\) mg/dL and \(\overline Y = 21.3\) mg/dL. Hence \(\psi(X,Y)= 5.5\).
Conclusion. Do not reject \(H_0\), and do not use this medication!
Fisher distribution \(\mathcal F(k_1,k_2)\)
Proposition
Student T-Test for two populations with equal variance
\(\hat \sigma^2 = \frac{1}{n_1 + n_2 - 2}\left(\sum_{i=1}^{n_1}(X_i - \overline X)^2 + \sum_{i=1}^{n_2}(Y_i - \overline Y)^2 \right)\)
Normalize \(\overline X - \overline Y\): \[\psi(X,Y) = \frac{\overline X - \overline Y}{\sqrt{\hat \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim \mathcal T(n_1+n_2 - 2) \; .\]
Under \(H_0\), \(\psi(X,Y)\) is pivotal because \(\sigma_1 = \sigma_2\): the common unknown variance cancels between numerator and denominator.
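A minimal sketch of the pooled statistic, written with the Python standard library. The function name `pooled_t_statistic` and the data are hypothetical. The standard library has no Student quantile function, so the sketch stops at the statistic and its degrees of freedom; the threshold would come from a table or a statistics package.

```python
from math import sqrt
from statistics import mean


def pooled_t_statistic(x, y):
    """Return (psi, df) for the equal-variance two-sample Student statistic."""
    n1, n2 = len(x), len(y)
    xbar, ybar = mean(x), mean(y)
    # Sums of squared deviations around each sample mean.
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    ss_y = sum((yi - ybar) ** 2 for yi in y)
    df = n1 + n2 - 2
    pooled_var = (ss_x + ss_y) / df
    psi = (xbar - ybar) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return psi, df


# Hypothetical samples, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
psi, df = pooled_t_statistic(x, y)
```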
Student Welch test statistic
\[\psi(X, Y) = \frac{\overline X - \overline Y}{\sqrt{\frac{\hat \sigma_1^2}{n_1} + \frac{\hat \sigma_2^2}{n_2}}}\]
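The Welch statistic can be sketched as below, assuming \(\hat\sigma_1^2\) and \(\hat\sigma_2^2\) are the unbiased sample variances of each group. The degrees of freedom computed here use the Welch-Satterthwaite approximation, which is not stated in the text above and is included as an assumption about how the statistic is usually calibrated. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_statistic(x, y):
    """Return (psi, df) for the unequal-variance (Welch) statistic.

    df is the Welch-Satterthwaite approximation, an assumption of this sketch.
    """
    n1, n2 = len(x), len(y)
    # statistics.variance divides by n - 1 (unbiased sample variance).
    v1, v2 = variance(x) / n1, variance(y) / n2
    psi = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return psi, df


# Hypothetical samples, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0]
psi, df = welch_statistic(x, y)
```

The approximate degrees of freedom are generally not an integer and never exceed \(n_1 + n_2 - 2\).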
CLT
Example: binomials
Good approximation for (\(n=100\), \(p=0.2\))
Bad approximation for (\(n=100\), \(p=0.01\))
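One way to quantify the two cases above is to compare the exact binomial CDF with the CDF of the Gaussian that has the same mean and variance. This is a sketch with the Python standard library; the function name `max_cdf_gap` and the choice of the largest gap over integer points as the quality measure are assumptions, not part of the original material.

```python
from math import comb, sqrt
from statistics import NormalDist


def max_cdf_gap(n, p):
    """Largest gap, over k = 0..n, between the Binomial(n, p) CDF and the
    CDF of the Gaussian with the same mean and variance."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    exact_cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        exact_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        gap = max(gap, abs(exact_cdf - approx.cdf(k)))
    return gap


gap_good = max_cdf_gap(100, 0.2)   # n*p = 20: fairly symmetric
gap_bad = max_cdf_gap(100, 0.01)   # n*p = 1: strongly skewed
```

The gap is several times larger for \(p=0.01\) than for \(p=0.2\), which matches the qualitative statement that the Gaussian approximation degrades when \(np\) is small.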
Test Statistic
\[ \psi(X,Y) = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p ( 1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \; .\]
| | Non-Smokers | Smokers | Total |
|---|---|---|---|
| YES | 351 | 41 | 392 |
| NO | 254 | 154 | 408 |
| Total | 605 | 195 | 800 |
1-cdf(Normal(0,1), 8.99)
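The value 8.99 can be reproduced from the counts with the two-proportion statistic. This is a sketch under two assumptions: \(\hat p\) is the pooled proportion of YES over both groups, and the group sizes are 605 non-smokers and 195 smokers, the margins that sum to the stated total of 800. The function names are hypothetical. The upper tail is computed with `erfc` because `1 - cdf` rounds to zero in floating point this far into the tail.

```python
from math import erfc, sqrt


def two_proportion_z(k1, n1, k2, n2):
    """Statistic for H0: p1 = p2, with a pooled proportion estimate."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    return (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


def upper_tail(z):
    """P(N(0, 1) > z), computed via erfc for accuracy in the far tail."""
    return 0.5 * erfc(z / sqrt(2))


# Assumed counts: 351 YES among 605 non-smokers, 41 YES among 195 smokers.
psi = two_proportion_z(351, 605, 41, 195)
p_value = upper_tail(psi)
```

The resulting p-value is far below any usual level \(\alpha\), so the hypothesis of equal proportions is rejected.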