TD3: Goodness of Fit

Exercise 0: \(2\sigma\)-Game

For each distribution \(P\), compute an approximation of the \(2\sigma\) interval \([\mu - 2\sigma,\, \mu + 2\sigma]\) and indicate whether \(x_\mathrm{obs}\) falls inside or outside. For \(\chi^2\), \(t\), and \(F\), use the Gaussian approximation.

  • \(P = \mathcal{N}(3,\, 2^2)\), \(x_\mathrm{obs} = 8\)
  • \(P = \mathcal{N}(-5,\, 3^2)\), \(x_\mathrm{obs} = 0.5\)
  • \(P = \mathcal{P}(9)\), \(x_\mathrm{obs} = 16\)
  • \(P = \mathcal{P}(25)\), \(x_\mathrm{obs} = 18\)
  • \(P = \mathcal{E}(2)\), \(x_\mathrm{obs} = 0.6\)
  • \(P = \mathcal{E}(0.5)\), \(x_\mathrm{obs} = 3.5\)
  • \(P = \chi^2(10)\), \(x_\mathrm{obs} = 19\)
  • \(P = F(10,\, 30)\), \(x_\mathrm{obs} = 1.9\)
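
A hand computation can be checked numerically with a short sketch such as the one below. It relies only on the standard mean and variance formulas of each family. The helper names are illustrative, and one convention is an assumption rather than something stated in this sheet: \(\mathcal{E}(\lambda)\) is read as an exponential distribution with rate \(\lambda\) (mean \(1/\lambda\)). If the course uses the scale convention instead, the exponential helper must be changed accordingly.

```python
import math

def two_sigma_interval(mean, variance):
    """Return (mean - 2*sd, mean + 2*sd) for a distribution with the given moments."""
    sd = math.sqrt(variance)
    return mean - 2 * sd, mean + 2 * sd

def is_inside(x_obs, mean, variance):
    """True if x_obs lies in the 2-sigma interval."""
    low, high = two_sigma_interval(mean, variance)
    return low <= x_obs <= high

# Each helper returns (mean, variance) for one family.
def normal_moments(mu, sigma):
    return mu, sigma ** 2

def poisson_moments(lam):
    return lam, lam

def exponential_moments(rate):
    # ASSUMPTION: the parameter is a rate, so mean = 1/rate and variance = 1/rate^2.
    return 1 / rate, 1 / rate ** 2

def chi2_moments(d):
    return d, 2 * d

def f_moments(d1, d2):
    # The variance of F(d1, d2) is finite only for d2 > 4.
    mean = d2 / (d2 - 2)
    variance = 2 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
    return mean, variance

# Usage on the first item of the list: N(3, 2^2) with x_obs = 8.
print(two_sigma_interval(*normal_moments(3, 2)))  # (-1.0, 7.0)
print(is_inside(8, *normal_moments(3, 2)))        # False
```

The remaining items can be checked the same way by passing the matching `*_moments(...)` result to `is_inside`.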

Exercise 1

We want to test whether a die is biased. It is rolled \(1000\) times, and the number of occurrences of each face is recorded. The data are as follows:

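\[ \begin{array}{|c|c|c|c|c|c|c|} \hline \text{Face} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Counts} & 159 & 168 & 167 & 160 & 175 & 171 \\ \hline \end{array} \]
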
  1. Formulate the hypothesis testing problem.
  2. Compute the expected counts under \(H_0\), give the number of degrees of freedom \(d\) of the chi-squared test statistic, and give the approximate p-value using the CDF of \(\chi^2(d)\).
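
The computation in question 2 can be sketched as follows, assuming that \(H_0\) states the die is fair (each face has probability \(1/6\)). The function `chi2_sf` is a hypothetical helper written for this sketch: it evaluates the \(\chi^2(d)\) survival function through the series expansion of the regularized incomplete gamma function, so that only the standard library is needed.

```python
import math

def chi2_sf(x, d, terms=500):
    """P(chi2(d) > x), using the power series of the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = d / 2.0, x / 2.0
    # gamma(a, z) = z^a * exp(-z) * sum_{n>=0} z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    cdf = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - cdf

counts = [159, 168, 167, 160, 175, 171]
n = sum(counts)                        # total number of rolls
expected = [n / 6] * 6                 # expected counts under a fair die (assumed H0)
stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
d = len(counts) - 1                    # 6 categories, no estimated parameter
p_value = chi2_sf(stat, d)

print(f"statistic = {stat:.3f}, d = {d}, p-value = {p_value:.3f}")
```

A large p-value here means the observed counts are compatible with a fair die; it does not prove that the die is fair.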

Exercise 2

In a survey of \(825\) families with \(3\) children, the number of boys was recorded:

\[ \begin{array}{|c|c|c|c|c|c|} \hline \text{Number of Boys} & 0 & 1 & 2 & 3 & \text{Total} \\ \hline \text{Number of Families} & 71 & 297 & 336 & 121 & 825 \\ \hline \end{array} \]

We assume under \(H_0\) that the genders of children in successive births within a family are independent categorical variables and that the probability \(p\) of having a boy remains constant.

  1. Determine the distribution of the number of boys in a family with 3 children as a function of \(p\).
  2. Estimate \(p\) using a maximum likelihood estimator.
  3. Test the goodness of fit to the distribution obtained in question 1.
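
A possible numerical check of the three questions is sketched below. It assumes the model of question 1 is the binomial distribution \(\mathrm{Bin}(3, p)\), for which the maximum likelihood estimator is the overall proportion of boys. Since one parameter is estimated from the data, the chi-squared statistic is compared with \(\chi^2(4 - 1 - 1) = \chi^2(2)\), whose survival function is \(e^{-x/2}\). Variable names are illustrative.

```python
import math

families = {0: 71, 1: 297, 2: 336, 3: 121}   # number of boys -> number of families
n_families = sum(families.values())
n_children = 3

# MLE of p under Bin(3, p): total boys divided by total children.
p_hat = sum(k * c for k, c in families.items()) / (n_children * n_families)

# Expected number of families with k boys under Bin(3, p_hat).
expected = {
    k: n_families * math.comb(n_children, k) * p_hat ** k * (1 - p_hat) ** (n_children - k)
    for k in families
}

stat = sum((families[k] - expected[k]) ** 2 / expected[k] for k in families)
d = len(families) - 1 - 1        # categories - 1 - number of estimated parameters
p_value = math.exp(-stat / 2)    # closed form, valid only because d == 2

print(f"p_hat = {p_hat:.4f}")
print({k: round(e, 1) for k, e in expected.items()})
print(f"statistic = {stat:.3f}, d = {d}, p-value = {p_value:.3f}")
```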

Exercise 3

We observe

X = [0, 1, 0, 0, 0, 0, 0, 0.5, 1, 1, 1, 0.7, 0.9, 1, 1, 1, 1, 0, 0.1, 0, 1]

We assume that the entries of \(X\) are i.i.d. with distribution \(P\).

We consider the following hypothesis testing problem:

\(H_0\): \(P = \mathcal{B}(0.5)\) (Bernoulli) \(\quad\) vs. \(\quad\) \(H_1\): \(P \neq \mathcal{B}(0.5)\).

  1. Examine the data carefully. What can you say about the observations, the iid assumption, \(H_0\) and \(H_1\)?
  2. Draw on the same graph the CDF of a Bernoulli \(\mathcal{B}(0.5)\) and the empirical CDF of the observed data \(X\).
  3. Apply the Kolmogorov-Smirnov test at level \(0.1\). To do so, use a table of critical values of the Kolmogorov-Smirnov statistic.
  4. Comment on the result.
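
The Kolmogorov-Smirnov statistic \(D = \sup_x |F_n(x) - F(x)|\) of question 3 can be computed with the sketch below. Both the empirical CDF \(F_n\) and the Bernoulli CDF \(F\) are right-continuous step functions, so the supremum is reached at a jump point of one of them. The critical value used here is the asymptotic approximation \(\sqrt{-\ln(\alpha/2)/2}\,/\sqrt{n}\), which is an assumption of this sketch: a table for \(n = 21\) may give a slightly different value, and the usual KS calibration is derived for a continuous \(F\).

```python
import math

X = [0, 1, 0, 0, 0, 0, 0, 0.5, 1, 1, 1, 0.7, 0.9, 1, 1, 1, 1, 0, 0.1, 0, 1]
n = len(X)

def bernoulli_cdf(x, p=0.5):
    """CDF of a Bernoulli(p) variable: jumps at 0 and at 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0

def ecdf(x):
    """Empirical CDF of the sample X evaluated at x."""
    return sum(v <= x for v in X) / n

# Candidate points: jumps of the empirical CDF and of the Bernoulli CDF.
points = sorted(set(X) | {0.0, 1.0})
D = max(abs(ecdf(x) - bernoulli_cdf(x)) for x in points)

alpha = 0.1
# ASSUMPTION: asymptotic critical value, used in place of an exact table.
critical = math.sqrt(-math.log(alpha / 2) / 2) / math.sqrt(n)

print(f"n = {n}, D = {D:.4f}, approximate critical value = {critical:.4f}")
print("reject H0" if D > critical else "do not reject H0")
```

Comparing the outcome with the observations made in question 1 is a natural starting point for question 4.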