Goodness of Fit Tests
Multinomials
- \frac{n!}{k_1!\dots k_m!} = \binom{n}{k_1,\dots, k_m} is a multinomial coefficient
- In this course: n \gg m.
- m \gg n corresponds to a high-dimensional setting.
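As a small illustration of the coefficient above, here is a minimal sketch using only the Python standard library. The function names `multinomial_coefficient` and `multinomial_pmf` are hypothetical helpers, and the pmf formula used (coefficient times the product of q_i^{k_i}) is the standard multinomial pmf rather than something stated in these notes.

```python
from math import factorial


def multinomial_coefficient(counts):
    """Return n! / (k_1! ... k_m!) where n = k_1 + ... + k_m."""
    result = factorial(sum(counts))
    for k in counts:
        # Each intermediate quotient is an integer, so // is exact here.
        result //= factorial(k)
    return result


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under Mult(n, probs)."""
    weight = 1.0
    for k, q in zip(counts, probs):
        weight *= q ** k
    return multinomial_coefficient(counts) * weight


# 4 draws split as (2, 1, 1): 4! / (2! 1! 1!) arrangements
print(multinomial_coefficient([2, 1, 1]))
print(multinomial_pmf([1, 1], [0.5, 0.5]))
```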
\chi^2 Goodness of Fit
\chi^2 Goodness of Fit Test
- We observe (X_1, \dots, X_m) \sim \mathrm{Mult}(n, q): X_i counts how many of the n draws fall in category i, so X_1 + \dots + X_m = n
- q = (q_1, \dots, q_m) corresponds to probabilities of getting color 1, \dots, m
- Let p = (p_1, \dots, p_m) be a known vector s.t. p_1 + \dots + p_m = 1.
- H_0:~ q = p ~~~\text{versus}~~~ H_1: q \neq p \; .
Example: Bag of Sweets
Color | Observed Counts | Expected Counts |
---|---|---|
Red | X_1=50 | n_1=40 |
Green | X_2=30 | n_2=35 |
Yellow | X_3=20 | n_3=25 |
- \psi(X) = \sum_{i=1}^m\frac{(X_i-n_i)^2}{n_i} \approx 2.5 +0.71+1 \approx 4.21
- \mathrm{cdf}(\chi^2(2), 4.21) \approx 0.878 (p_{value} \approx 1 - 0.878 = 0.122)
- Conclusion: do not reject H_0 at the 5% level, since 0.122 > 0.05
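The bag-of-sweets computation can be checked with a few lines of Python. This is a minimal sketch using only the standard library: it relies on the fact that the survival function of \chi^2(2) has the closed form exp(-x/2), so it does not generalize to other degrees of freedom. The helper names `pearson_statistic` and `chi2_sf_2dof` are hypothetical. If SciPy is available, `scipy.stats.chisquare` is the more usual route.

```python
from math import exp


def pearson_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


def chi2_sf_2dof(x):
    """P(chi2(2) > x); closed form valid only for 2 degrees of freedom."""
    return exp(-x / 2)


observed = [50, 30, 20]  # red, green, yellow
expected = [40, 35, 25]  # n_i from the table

stat = pearson_statistic(observed, expected)
p_value = chi2_sf_2dof(stat)  # m - 1 = 2 degrees of freedom

print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```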
Comparison to a Theoretical Distribution
Histograms
Normalization
A histogram can be normalized by counts (default), by frequency (bar heights sum to 1), or by density (area under the curve = 1)
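The three normalizations differ only in how the raw bin counts are rescaled. The sketch below makes this explicit in plain Python; the function `histogram`, the sample `data`, and the bin edges are made up for illustration. Plotting libraries usually expose this as an option (for instance, `numpy.histogram` has a `density` flag).

```python
def histogram(data, bin_edges):
    """Return (counts, frequencies, densities) for half-open bins [a, b)."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = sum(counts)  # points outside the bins are ignored
    widths = [bin_edges[i + 1] - bin_edges[i] for i in range(len(counts))]
    frequencies = [c / n for c in counts]                     # heights sum to 1
    densities = [c / (n * w) for c, w in zip(counts, widths)]  # area sums to 1
    return counts, frequencies, densities


# Hypothetical sample with unequal bin widths
data = [0.1, 0.4, 0.5, 1.2, 1.7, 1.9, 2.5, 3.0]
edges = [0, 1, 2, 4]
counts, frequencies, densities = histogram(data, edges)
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities))
print(counts, frequencies, densities, area)
```

With unequal bin widths, frequency and density give different bar heights, which is why the density version is the one to compare against a probability density function.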
\chi^2 Goodness of Fit to a given distribution
Example: Goodness of Fit to a Poisson distribution
H_0: X_i \overset{\text{iid}}{\sim} \mathcal P(2)
X = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 4, 3, 0, 1, 1, 2, 3, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 2, 0, 0]
Value | 0 | 1 | 2 | \geq 3 | Total |
---|---|---|---|---|---|
Counts | 16 | 8 | 3 | 3 | 30 |
Theoretical Counts | 4.06 | 8.1 | 8.1 | 9.7 | 30 |
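The table can be rebuilt from the sample, and the same \chi^2 statistic as in the bag-of-sweets example can then be computed on the four cells. The sketch below uses only the standard library. It assumes 4 - 1 = 3 degrees of freedom, because the Poisson parameter 2 is fixed by H_0 rather than estimated from the data. The closed-form survival function in `chi2_sf_3dof` (a hypothetical helper name) is specific to 3 degrees of freedom.

```python
from math import erfc, exp, factorial, pi, sqrt

data = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 4, 3, 0, 1, 1,
        2, 3, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 2, 0, 0]
lam = 2.0  # Poisson parameter under H_0
n = len(data)

# Observed counts for the cells 0, 1, 2 and >= 3
observed = [sum(1 for x in data if x == k) for k in range(3)]
observed.append(sum(1 for x in data if x >= 3))

# Cell probabilities under P(2); the last cell takes the remaining mass
probs = [exp(-lam) * lam ** k / factorial(k) for k in range(3)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_3dof(x):
    """P(chi2(3) > x); closed form valid only for 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_sf_3dof(stat)
print(observed, [round(e, 2) for e in expected])
print(f"statistic = {stat:.2f}, p-value = {p_value:.2e}")
```

On this sample the statistic is dominated by the excess of zeros (16 observed against about 4 expected), and it lies far above the usual \chi^2(3) critical values, which points toward rejecting H_0.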