p-value = probability of observing a result as extreme (or more) assuming \(H_0\) is true
Example: drug trial shows 8 mmHg blood pressure drop
| p-value | Interpretation |
|---|---|
| p < 0.05 | Reject \(H_0\) → drug likely works |
| p ≥ 0.05 | Do not reject \(H_0\) → not enough evidence |
⚠️ A small p-value does not prove \(H_1\) is true; it only says that data this extreme would be unlikely if \(H_0\) were true.
| Decision | \(H_0\) True | \(H_1\) True |
|---|---|---|
| \(T=0\) | True Negative (TN) | False Negative (FN) |
| \(T=1\) | False Positive (FP) | True Positive (TP) |
⚠️ Same data, two conclusions
\(H_0: p_6 = 1/6\) vs \(H_1: p_6 > 1/6\)
Here: do not reject \(H_0\) (\(H_0\) is “likely”)
\(H_0:\) the die is fair vs \(H_1: \exists k: p_k > 1/6\)
Here: reject \(H_0\) (\(H_0\) is “unlikely”)
Consider a probability measure \(P\) on \(\mathbb R\) with \(X \sim P\).
If \(P\) has density \(p\), the \(\alpha\)-quantile \(q_{\alpha}\) satisfies \(\int_{-\infty}^{q_{\alpha}} p(x)dx = \alpha\)
\(\Leftrightarrow \mathbb P(X \leq q_{\alpha}) = \alpha\)
If \(P\) is discrete, the \(\alpha\)-quantile is \(q_{\alpha} = \inf\{q \in \mathbb R:~\sum_{x_i \leq q}p(x_i) \geq \alpha\}\)
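The numerical examples later in this section use Julia's Distributions.jl (e.g. `quantile(Poisson(12), 0.95)`); a minimal sketch of both cases:

```julia
using Distributions

# Continuous case: q_α satisfies P(X ≤ q_α) = α exactly
q = quantile(Normal(0, 1), 0.95)   # ≈ 1.645
cdf(Normal(0, 1), q)               # ≈ 0.95

# Discrete case: q_α is the smallest value with P(X ≤ q_α) ≥ α
quantile(Poisson(12), 0.95)        # smallest integer q with cdf(Poisson(12), q) ≥ 0.95
```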
Gaussian \(\mathcal N(\mu,\sigma)\): \(p(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
Approximation of sum of iid RV (CLT)
Binomial \(\mathrm{Bin}(n,q)\): \(p(x)= \binom{n}{x}q^x (1-q)^{n-x}\)
Number of successes among \(n\) Bernoulli \(q\)
Exponential \(\mathcal E(\lambda)\): \(p(x) = \lambda e^{-\lambda x}\)
Waiting time for an atomic clock of rate \(\lambda\)
Geometric \(\mathcal{G}(q)\): \(p(x)= q(1-q)^{x-1}\)
Index of first success for iid Bernoulli \(q\)
Gamma \(\Gamma(k, \lambda)\): \(p(x) = \frac{\lambda^k x^{k-1}e^{-\lambda x}}{(k-1)!}\)
Waiting time for \(k\) atomic clocks of rate \(\lambda\)
Poisson \(\mathcal{P}(\lambda)\): \(p(x)=\frac{\lambda^x}{x!}e^{-\lambda}\)
Number of ticks before time \(1\) of clock \(\lambda\)
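All of these distributions are available in Distributions.jl, which the Poisson example below relies on; a quick sketch (parameter values are illustrative, and note that Julia's parameterizations sometimes differ from the notation above):

```julia
using Distributions

pdf(Normal(0, 1), 0.5)       # Gaussian N(μ=0, σ=1) density at x = 0.5
pdf(Binomial(10, 0.3), 4)    # P(X = 4) for Bin(n=10, q=0.3)
pdf(Exponential(1 / 2), 1.0) # E(λ=2); Julia parameterizes by the scale 1/λ
pdf(Geometric(0.3), 2)       # caution: counts failures before the first success
pdf(Gamma(3, 1 / 2), 1.0)    # Γ(k=3, λ=2), again via the scale 1/λ
pdf(Poisson(4.0), 2)         # P(X = 2) for P(λ=4)

rand(Normal(0, 1), 5)        # five iid samples
```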
We observe some data \(X\) in a measurable space \((\mathcal X, \mathcal A)\).
Example: \(\mathcal X = \mathbb R^n\), \(X= (X_1, \dots, X_n)\).
Goal: estimate a function of \(P_{\theta}\), e.g. \(\int x \, dP_{\theta}\) or \(\int x^2 \, dP_{\theta}\)
Goal: decide between \(H_0: \theta \in \Theta_0\) or \(H_1: \theta \in \Theta_1\)
Example (Multiple VS Multiple Parametric): \(X \sim \mathcal N(\theta,1)^{\otimes n}\) with \(H_0: \theta \leq 0\) vs \(H_1: \theta > 0\)
A Decision Rule or Test \(T\) is a measurable function: \[T : \mathcal X \to \{0,1\}\]
A Test Statistic \(\psi\) is a measurable function: \[\psi : \mathcal X \to \mathbb R\]
For a test \(T\) with statistic \(\psi\), the rejection region \(\mathcal R \subset \mathbb R\) is: \[\mathcal R = \{\psi(x):~ x \in \mathcal X,~ T(x)=1\}\]
→ Set of values of the statistic that lead to rejecting \(H_0\)
For a test \(T\), the critical region \(\mathcal C \subset \mathcal X\) is: \[\mathcal C = \{x \in \mathcal X:~ T(x)=1\}\]
→ Set of observations that lead to rejecting \(H_0\)
⚠️ These terms are sometimes used interchangeably, but we distinguish them in this course.
\[ \begin{aligned} T(x) &= \mathbf{1}\{\psi(x) > t\}:~~~~~~\mathcal R = (t,+\infty)\\ T(x) &= \mathbf{1}\{\psi(x) < t\}:~~~~~~\mathcal R = (-\infty,t)\\ T(x) &= \mathbf{1}\{|\psi(x)| > t\}:~~~~~~\mathcal R = (-\infty,-t)\cup (t, +\infty)\\ T(x) &= \mathbf{1}\{\psi(x) \not \in [t_1, t_2]\}:~~~~~~\mathcal R = (-\infty,t_1)\cup (t_2, +\infty)\; \end{aligned} \]
We observe \(X \in \mathcal X=\mathbb R^n\). We fix two known distributions \(P\) and \(Q\).
The problem:
\(H_0: X \sim P\) or \(H_1: X \sim Q\)
Warning
We know \(P\) and \(Q\) but not whether \(X \sim P\) or \(X \sim Q\)
Level and Power
Consider a simple VS simple problem. The Level of a test \(T\) is defined as
\[\alpha = P(T(X)=1) = P(X \in \mathcal C) \quad \text{(type-1 error)}\]
Its power is defined as
\[\beta = Q(T(X)=1) = Q(X \in \mathcal C)\]
\(1 - \beta\) is the type-2 error
| Decision | \(H_0: X \sim P\) | \(H_1: X \sim Q\) |
|---|---|---|
| \(T=0\) | ✓ \(1-\alpha\) | ✗ \(1-\beta\) |
| \(T=1\) | ✗ \(\alpha\) | ✓ \(\beta\) |
Unbiased: \(\beta \geq \alpha\)
\(\alpha = 0\) for trivial test \(T(x)=0\)… but \(\beta = 0\) too!
Consider the simple VS simple problem \(H_0\): \(X \sim P\) VS \(H_1\): \(X \sim Q\).
Question: which test \(T\) maximizes \(\beta\) at fixed \(\alpha\)?
Likelihood ratio statistic
\[\psi(x)=\frac{dQ}{dP}(x) = \frac{q(x)}{p(x)}\]
Likelihood ratio test
\[T^*(x)=\mathbf 1\left\{\frac{q(x)}{p(x)} > t_{\alpha}\right\}\]
where \(t_{\alpha}\) is the \((1-\alpha)\)-quantile of the statistic under \(P\): \[\mathbb P_{X \sim P}\left(\frac{q(X)}{p(X)} > t_{\alpha}\right) = \alpha\]
Neyman-Pearson’s Theorem
The Likelihood Ratio Test of level \(\alpha\) maximizes the power among all tests of level \(\alpha\).
Recall: \(T^*(x)=\mathbf 1\left\{\frac{q(x)}{p(x)} > t_{\alpha}\right\}\) where \(P(T^*(X)=1) = \alpha\)
Equivalent to Log-Likelihood Ratio Test:
\[T^*(x)=\mathbf 1\left\{\log\left(\frac{q(x)}{p(x)}\right) > \log(t_{\alpha})\right\}\]
Let \(T\) be any test with level \(\alpha\) and power \(\beta\), and let \(\beta^*\) denote the power of \(T^*\). We want to show \(\beta^* \geq \beta\).
\[\beta^* - \beta = Q(T^*=1) - Q(T=1) = \int (T^* - T)\, dQ = \int (T^* - T)\, \frac{q}{p}\, dP\]
On \(\{T^*=1\}\): \(\frac{q}{p} > t_\alpha\) and \((T^* - T) \geq 0\)
On \(\{T^*=0\}\): \(\frac{q}{p} \leq t_\alpha\) and \((T^* - T) \leq 0\)
\[\Rightarrow \beta^* - \beta \geq t_\alpha \int (T^* - T) dP = t_\alpha (\alpha - \alpha) = 0 ~~~ \square\]
Example: \(X \sim \mathcal N(\theta, 1)\) with \(H_0: \theta=\theta_0\) and \(H_1: \theta=\theta_1\)
Log-likelihood ratio: \[\log\frac{q(x)}{p(x)} = (\theta_1 - \theta_0)x + \frac{\theta_0^2 -\theta_1^2}{2}\]
If \(\theta_1 > \theta_0\), the optimal test is: \[T(x) = \mathbf 1\{ x > t \}\]
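As a sanity check, the threshold and power of this test can be computed with Distributions.jl; a minimal sketch, with \(\theta_0 = 0\), \(\theta_1 = 1\), \(\alpha = 0.05\) chosen purely for illustration:

```julia
using Distributions

θ0, θ1, α = 0.0, 1.0, 0.05
t = quantile(Normal(θ0, 1), 1 - α)  # calibrate: P_{θ0}(X > t) = α
β = ccdf(Normal(θ1, 1), t)          # power: P_{θ1}(X > t)
```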
Let \(P_{\theta} = \mathcal N(\theta,1)\). Observe \(n\) iid data \(X = (X_1, \dots, X_n)\).
Density of \(P^{\otimes n}_{\theta}\): \[\frac{d P^{\otimes n}_{\theta}}{dx} = \frac{1}{\sqrt{2\pi}^n}\exp\left(-\frac{\|x\|^2}{2} + n\theta \overline x - \frac{n\theta^2}{2}\right)\]
Log-Likelihood Ratio Test:
\(T(x) = \mathbf 1\{\overline x > t_{\alpha}\}\) if \(\theta_1 > \theta_0\)
\(T(x) = \mathbf 1\{\overline x < t_{\alpha}\}\) otherwise
Definition
A set of distributions \(\{P_{\theta}\}\) is an exponential family if there exist real-valued functions \(a,b,c,d\) such that: \[p_{\theta}(x) = a(\theta)b(x) \exp(c(\theta)d(x))\]
Likelihood Ratio Test for Exponential Families
Consider the testing problem \(H_0: X \sim P_{\theta_0}^{\otimes n}\) vs \(H_1: X \sim P_{\theta_1}^{\otimes n}\)
Then, the likelihood ratio test is
\[T(X) = \mathbf 1\left\{\frac{1}{n}\sum_{i=1}^n d(X_i) > t\right\}\]
We observe \(X = (X_1, \dots, X_n)\). Consider the following testing problem:
\[H_0: X \sim P_{\theta_0}^{\otimes n} \quad \text{or} \quad H_1: X \sim P_{\theta_1}^{\otimes n}.\]
\[\frac{dP_{\theta_1}^{\otimes n}}{dP_{\theta_0}^{\otimes n}} = \left(\frac{a(\theta_1)}{a(\theta_0)}\right)^n \exp\left((c(\theta_1) - c(\theta_0)) \sum_{i=1}^n d(x_i)\right).\]
\[T(X) = \mathbf{1}\left\{\frac{1}{n}\sum_{i=1}^n d(X_i) > t\right\}. \quad (\text{calibrate } t)\]
Likelihood Ratio Test: \[T(X) = \mathbf 1\left\{\frac{1}{n}\sum_{i=1}^n d(X_i) > t\right\}\] (calibrate \(t\) to achieve level \(\alpha\))
Poisson distribution: \(p_\lambda(x) = \frac{\lambda^x}{x!}e^{-\lambda}\)
Rewrite:
\[p_\lambda(x) = \underbrace{e^{-\lambda}}_{a(\lambda)} \cdot \underbrace{\frac{1}{x!}}_{b(x)} \cdot \exp\left(\underbrace{\log\lambda}_{c(\lambda)} \cdot \underbrace{x}_{d(x)}\right)\]
→ Poisson is an exponential family with \(d(x) = x\)
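This factorization is easy to verify numerically; a small sketch (the values of λ and x are arbitrary):

```julia
using Distributions

λ, x = 3.0, 4
pdf(Poisson(λ), x)                              # built-in pmf
exp(-λ) * (1 / factorial(x)) * exp(log(λ) * x)  # a(λ) · b(x) · exp(c(λ)d(x)), same value
```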
Binomial distribution: \(p_q(x) = \binom{n}{x}q^x(1-q)^{n-x}\)
Rewrite:
\[p_q(x) = \underbrace{(1-q)^n}_{a(q)} \cdot \underbrace{\binom{n}{x}}_{b(x)} \cdot \exp\left(\underbrace{\log\frac{q}{1-q}}_{c(q)} \cdot \underbrace{x}_{d(x)}\right)\]
→ Binomial is an exponential family with \(d(x) = x\)
\(H_0\): \(N \sim \mathcal P(12)\) vs \(H_1\): \(N \sim \mathcal P(16)\)
\[T(N)=\mathbf 1\left\{N > t_{\alpha}\right\}\]
Computing \(t_{0.05}\):
`quantile(Poisson(12), 0.95)` → \(18\)
`1 - cdf(Poisson(12), 17)` → \(0.063\)
`1 - cdf(Poisson(12), 18)` → \(0.038\)
→ Reject \(H_0\) if \(N \geq 19\)
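With \(t\) calibrated under \(H_0\), the power under \(H_1: N \sim \mathcal P(16)\) follows from the same cdf call; a short sketch:

```julia
using Distributions

# Reject H0 when N ≥ 19, i.e. N > 18
level = 1 - cdf(Poisson(12), 18)  # ≈ 0.038, as computed above
power = 1 - cdf(Poisson(16), 18)  # P(N ≥ 19) under H1
```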
\(H_0 = \mathcal P_0=\{P_{\theta}, \theta \in \Theta_0 \}\) is not a singleton
\(H_1 = \mathcal P_1=\{P_{\theta}, \theta \in \Theta_1 \}\) is not a singleton
Warning
No meaning of \(\mathbb P_{H_0}(X \in A)\) or \(\mathbb P_{H_1}(X \in A)\)
Note
The Level of a test \(T\) is defined as
\[\alpha = \sup_{\theta \in \Theta_0}P_{\theta}(T(X)=1)\]
Its power function is defined as
\[\beta: \Theta_1 \to [0,1]:~ \beta(\theta) = P_{\theta}(T(X)=1)\]
\(T\) is unbiased if \(\beta(\theta) \geq \alpha\) for all \(\theta \in \Theta_1\)
If \(T_1\), \(T_2\) are two tests of level \(\alpha_1\), \(\alpha_2\):
\(T_2\) is uniformly more powerful (UMP) than \(T_1\) if \(\alpha_2 \leq \alpha_1\) and \(\beta_2(\theta) \geq \beta_1(\theta)\) for all \(\theta \in \Theta_1\)
\(T^*\) is UMP\(_{\alpha}\) if it is uniformly more powerful than any other test of level \(\alpha\)
Assumption: \(\Theta_0 \cup \Theta_1 \subset \mathbb R\)
Right-tailed: \(H_0: \theta \leq \theta_0\) vs \(H_1: \theta > \theta_0\)
Left-tailed: \(H_0: \theta \geq \theta_0\) vs \(H_1: \theta < \theta_0\)
Simple/Multiple: \(H_0: \theta = \theta_0\) vs \(H_1: \theta \neq \theta_0\)
Multiple/Multiple: \(H_0: \theta \in [\theta_1, \theta_2]\) vs \(H_1: \theta \not\in [\theta_1, \theta_2]\)
Theorem
Assume \(p_{\theta}(x) = a(\theta)b(x)\exp(c(\theta)d(x))\) with \(c\) non-decreasing.
For a one-tailed test, there exists a UMP\(_\alpha\) test. For the right-tailed problem (\(H_1: \theta > \theta_0\)), it is of the form:
\[T = \mathbf 1\left\{\sum d(X_i) > t \right\}\]
For the left-tailed problem (\(H_1: \theta < \theta_0\)), we just reverse the inequality.
If \(c\) is non-increasing instead, the same holds with the inequalities reversed.
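Since \(d(x) = x\) for the Binomial family (see the factorization above), the theorem yields a UMP test based on \(\sum_i X_i\); a hedged sketch for \(H_0: q \leq 1/2\) vs \(H_1: q > 1/2\) from \(n\) Bernoulli observations, with \(n = 20\) and \(\alpha = 0.05\) as illustrative values:

```julia
using Distributions

n, q0, α = 20, 0.5, 0.05
# Reject H0 when ΣXᵢ > t, calibrating t at the boundary q = q0
# (the level over H0: q ≤ q0 is attained at q0)
t = quantile(Binomial(n, q0), 1 - α)  # smallest t with P(ΣXᵢ ≤ t) ≥ 1 - α
level = ccdf(Binomial(n, q0), t)      # P(ΣXᵢ > t) ≤ α by construction
```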
Here, \(\Theta_0\) is not necessarily a singleton. \(\mathbb P_{H_0}(X \in A)\) has no meaning without any further assumption.
Pivotal Test Statistic
\(\psi: \mathcal X \to \mathbb R\) is pivotal if the distribution of \(\psi(X)\) under \(H_0\) does not depend on \(\theta \in \Theta_0\):
for any \(\theta, \theta' \in \Theta_0\), and any event \(A\), \[ \mathbb P_{\theta}(\psi(X) \in A) = \mathbb P_{\theta'}(\psi(X) \in A) \; .\]
Since \(\psi\) is pivotal, we may write \(\mathbb{P}_{H_0}(\cdot)\) unambiguously for probabilities involving \(\psi(X)\) under \(H_0\).
If \(X=(X_1, \dots, X_n)\) are iid \(\mathcal N(0, \sigma)\), the distribution of \[ \psi(X) = \frac{\sum_{i=1}^n X_i}{\sqrt{\sum_{i=1}^n X_i^2}}\] does not depend on \(\sigma\).
Indeed, writing \(X_i = \sigma Z_i\) with \(Z_i \overset{\text{iid}}{\sim} \mathcal N(0,1)\), \[ \psi(X) = \frac{\sum_{i=1}^n \sigma Z_i}{\sqrt{\sum_{i=1}^n \sigma^2 Z_i^2}} = \frac{\sigma \sum_{i=1}^n Z_i}{\sigma\sqrt{\sum_{i=1}^n Z_i^2}} = \frac{\sum_{i=1}^n Z_i}{\sqrt{\sum_{i=1}^n Z_i^2}} \; .\]
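A quick simulation illustrates this invariance; a sketch where the two values of \(\sigma\), the sample size and the number of repetitions are arbitrary:

```julia
using Distributions, Statistics

ψ(x) = sum(x) / sqrt(sum(abs2, x))

n, m = 10, 100_000
for σ in (1.0, 5.0)
    samples = [ψ(rand(Normal(0, σ), n)) for _ in 1:m]
    println("σ = $σ: mean ≈ $(round(mean(samples), digits=3)), std ≈ $(round(std(samples), digits=3))")
end
```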
P-value: definition
We define \(p_{value}(x_{\mathrm{obs}}) =\mathbb P_{H_0}(\psi(X) \geq \psi(x_{\mathrm{obs}}))\) for a right-tailed test.
For a two-tailed test, \(p_{value}(x_{\mathrm{obs}}) =2\min(\mathbb P_{H_0}(\psi(X) \geq \psi(x_{\mathrm{obs}})),\,\mathbb P_{H_0}(\psi(X) \leq \psi(x_{\mathrm{obs}})))\)
We consider a test that rejects \(H_0\) for large values of \(\psi\). At level \(\alpha \in (0,1)\), the rejection region is
\[ \mathcal{R}_\alpha = \bigl\{ x \in \mathcal{X} : \psi(x) > c_\alpha \bigr\}, \] where \(c_\alpha\) is the critical value satisfying \(\mathbb{P}_{H_0}(\psi(X) > c_\alpha) = \alpha\).
Property
The p-value is the smallest level \(\alpha\) at which we reject \(H_0\): \[ p(x) = \inf\bigl\{\alpha \in (0,1) : x \in \mathcal{R}_\alpha\bigr\}. \]
\(X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal N(\mu, \sigma_0)\) with \(\sigma_0\) known. Test \(H_0: \mu = 0\) vs \(H_1: \mu \neq 0\).
The pivotal statistic is \[ \psi(X) = \frac{\sqrt{n}\,\overline{X}}{\sigma_0} \sim \mathcal N(0,1) \quad \text{under } H_0 \; .\]
\(n=25\), \(\sigma_0 = 2\), \(\overline{x}_{\mathrm{obs}} = 0.9\).
\[\psi(x_{\mathrm{obs}}) = \frac{\sqrt{25} \times 0.9}{2} = 2.25\]
Two-tailed p-value: \(p_{value} = 2\,\mathbb P(Z \geq 2.25) = 2 \times 0.0122 = 0.0244\).
Since \(p_{value} = 0.0244 \leq \alpha = 0.05\), we reject \(H_0\).
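The same p-value in one line of Distributions.jl:

```julia
using Distributions

2 * ccdf(Normal(0, 1), 2.25)  # two-tailed p-value, ≈ 0.0244
```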
\(X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal N(\mu, \sigma)\) with \(\sigma\) unknown. Test \(H_0: \mu = 0\) vs \(H_1: \mu > 0\).
We use the following test statistic: \[ \psi(X) = \frac{\sqrt{n}\,\overline{X}}{S} \sim t_{n-1} \quad \text{under } H_0\] where \(S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2\). The parameter \(\sigma\) cancels out (as in the previous slide), so \(\psi\) is pivotal over \(\Theta_0 = \{(\mu, \sigma): \mu = 0,\, \sigma > 0\}\).
Numerical example: \(n=10\), \(\overline{x}_{\mathrm{obs}} = 1.5\), \(s = 2.1\).
\[\psi(x_{\mathrm{obs}}) = \frac{\sqrt{10} \times 1.5}{2.1} = 2.26\]
Right-tailed p-value: \(p_{value} = \mathbb P(T_9 \geq 2.26) = 0.025\).
Since \(p_{value} = 0.025 \leq \alpha = 0.05\), we reject \(H_0\).
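Likewise with Distributions.jl, where `TDist(9)` is the Student \(t_9\) distribution:

```julia
using Distributions

ψobs = sqrt(10) * 1.5 / 2.1  # ≈ 2.26
ccdf(TDist(9), ψobs)         # right-tailed p-value, ≈ 0.025
```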
P-value under \(H_0\)
Under \(H_0\), for a left- or right-tailed test with a pivotal test statistic \(\psi\) whose cdf is continuous, \(p_{value}(X)\) has a uniform distribution \(\mathcal U([0,1])\).
Hence, if the data really follow the null hypothesis and we test at level \(5\%\),
running \(1000\) experiments will lead to rejecting \(H_0\) on average \(50\) times
Let \(F\) denote the cdf of \(\psi(X)\) under \(H_0\). Assume for simplicity it is strictly increasing. Then \[p_{value}(X) = 1 - F(\psi(X)) \; .\]
For any \(t \in [0,1]\),
\[\mathbb P_{H_0}(p_{value}(X) \leq t) = \mathbb P_{H_0}(1 - F(\psi(X)) \leq t) = \mathbb P_{H_0}(\psi(X) \geq F^{-1}(1-t)) \; .\]
Hence,
\[\mathbb P_{H_0}(p_{value}(X) \leq t) = 1- F(F^{-1}(1-t))= t\]
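This can be checked by simulation, e.g. with the Gaussian mean test from the earlier slide; a sketch where \(n\), \(\sigma_0\) and the number of experiments are illustrative:

```julia
using Distributions, Statistics

n, σ0, α, m = 25, 2.0, 0.05, 1000

# One experiment under H0 (μ = 0), returning the right-tailed p-value
pval() = ccdf(Normal(0, 1), sqrt(n) * mean(rand(Normal(0, σ0), n)) / σ0)

pvals = [pval() for _ in 1:m]
println("rejections: ", count(p -> p <= α, pvals), " / $m (expected ≈ $(m * α))")
```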