Density, Likelihood and Radon Nikodym

Notation

In this note, \((\Omega,\mathbb P)\) denote a common probability measured space for all the random variables introduced in this note. We also write \(\mathbb E\) for the expectation. Let \(X\) be a random variable with values in a measurable space \((\mathcal X, \mathcal A)\). Let say for simplicity that \(\mathcal X = \mathbb R^n\).

We say that \(X\) has distribution \(P\) if \(\mathbb P(X \in A)=P(A)\) for any measurable set \(A\). For clarity, we sometimes write \(\mathbb P_{X \sim P}(X \in A)\), which means “the probability of \(X\) being in \(A\) if \(X\) follows distribution \(P\)”. Sometimes, we do the slight abuse of notation by writing that \(P(A) = P(X \in A)\).

\(P\) can be seen as the “pushforward” measure of the common probability measure \(\mathbb P\) by random variable \(X\), since by definition, \(\mathbb P(X \in A) = \mathbb P(\{\omega \in \Omega, X(\omega) \in A\})= \mathbb P(X^{-1}(A))\).

Continuous Densities

A measure \(P\) has density \(p\) with respect to the Lebesgue measure if for any event \(A\) (which is simply a measurable set of \(\mathbb R^n\)), \[P(A)= \int_{x \in \mathbb R^n}\mathbf 1\{x \in A\}p(x)dx \; .\] \(p(x)\) is sometimes called the likelihood of a random variables that has distribution \(P\) at point \(x\). An equivalent condition is that for any “kind” real valued function \(f\) (e.g. continuous with bounded support), \[\mathbb E_{X \sim P}[f(X)] \stackrel{\mathrm{def}}{=} \int_{x \in \mathbb R^n}f(x)dP =\int_{x \in \mathbb R^n}f(x)p(x)dx \; .\] We write that \[dP(x) = p(x)dx ~~~ \text{or}~~~~ \tfrac{dP}{dx}(x) = p(x) \; ,\] and \(\tfrac{dP}{dx}\) is called the Radon-Nikodym derivative of \(P\) with respect to the Lebesgue measure. The intuition of this abstract notation is the following. If \(x \in \mathbb R^n\) and \(h\) is a small quantity that goes to \(0\), \(dP\) represents the measure of the interval \([x, x+h]\), with respect to the measure \(P\). Then, \(d P(x) = P([x, x+h]) = \int_x^{x+h}p(x)dx \sim p(x)h\).

The Counting Measure for Discrete Random Variables

Random variables in \(\mathbb{R}\) that take on a finite number of values are referred to as discrete random variables, and they do not have a density with respect to the Lebesgue measure. However, this case is much simpler and is handled within measure theory using the counting measure. As its name indicates, the counting measure \(\mu\) on \(\mathcal X=\mathbb R^n\) counts the elements of a given set \(A\): \[ \mu(A) = |A| \enspace .\] In particular, \(\mu(A)\) is infinite if \(A\) is an infinite set.

Let \(X\) be a discrete random variable that takes values in \(\{x_1, \dots, x_N\}\), e.g. a Bernoulli, Binomial or Poisson random variable, and let \(P\) be its probability distribution. Let \(p(x_i)\) be the probability that \(X = x_i\), that is \(p(x_i) = P(\{x_i\}) = P(X = x_i) \in [0,1]\). In this discrete case, the probability \(p(x_i)\) represents the likelihood of the value \(x_i\) for the random variable \(X\). While \(X\) has not a density with respect to the Lebesgue measure, it has density \(p\) with respect to the counting measure \(\mu\), that is \[\mathbb E_{X \sim P}[f(X)] =\int_{x \in \mathbb R^n}f(x)p(x)d\mu, ~~~~~~~ \frac{dP}{d\mu}(x) = p(x) \; .\] \[ \] ### The General Radon Nikodym Theorem

The Radon Nikodym Theorem tells us that any probability \(P\) admits a density with respect to a given measure \(\nu\) if it is absolutely continuous with respect to \(\nu\), that is \[ \nu(A) = 0 \implies P(A) = 0 \; .\] In this case, the density is the Radon Nikodym derivative \(\frac{dP}{d\nu}\) of \(P\) with respect to \(\nu\) and satisfies

\[ \mathbb E_{X \sim P}[f(X)] =\int_{x \in \mathbb R^n}f(x)\frac{dP}{d\nu}(x)d\nu \; .\] Informally, the \(d\nu\) simplify so that \(\frac{dP}{d\nu}d\nu\) = \(dP\).

Generalized Likelihood Ratio

If \(P\) and \(Q\) are two probability measures such that \(P\) is absolutely continuous with respect to \(Q\), then the Radon Nikodym derivative \(\frac{dP}{d\nu}\) of \(P\) with respect to \(Q\) is a generalized likelihood ratio.

If \(P\) and \(Q\) are both absolutely continuous with respect to another measure \(\nu\) (for example the Lebesgue measure), then the generalized likelihood ratio can be written \[ \frac{dP}{dQ} = \frac{\frac{dP}{d\nu}}{\frac{dQ}{d\nu}} \; . \]

In particular, if \(P\) and \(Q\) have positive densities \(p\) and \(q\) with respect to the Lebesgue measure, that is \(\frac{dP}{dx} = p(x)\) and \(\frac{dQ}{dx} = q(x)\), then

\[ \frac{dP}{dQ}(x) = \frac{p(x)}{q(x)} \; . \]

In particular, the likelihood ratio does not depend on the reference measure (here Lebesgue).

Change of Measure

If \(\mathbb E_{P}\) (resp. \(\mathbb E_{Q}\)) denotes the expectation when the random variable \(X\) follows distribution \(P\) (resp. \(Q\)), then for any measurable and bounded function \(f\),

\[\mathbb E_{P}[f(X)] = \mathbb E_{Q}\left[f(X) \frac{dP}{dQ}(X)\right] \; .\] In other words, we simply replace the real random variable \(f(X)\) by the random variable \(f(X) \frac{dP}{dQ}(X)\) when we observe \(X\) under \(Q\) instead of \(P\). This results directly follows from Radon-Nikodym: \[ \int_{x \in R^n} f(x) dP(x) = \int_{x \in R^n} f(x) \frac{dP}{dQ}(x) dQ(x) \; . \]