\(\newcommand{\VS}{\quad \mathrm{VS} \quad}\) \(\newcommand{\and}{\quad \mathrm{and} \quad}\) \(\newcommand{\E}{\mathbb E}\) \(\newcommand{\P}{\mathbb P}\) \(\newcommand{\Var}{\mathbb V}\) \(\newcommand{\Cov}{\mathrm{Cov}}\) \(\newcommand{\1}{\mathbf 1}\)
We observe \(Y = (Y_1, \dots, Y_n)\) and \(X = (X^{(1)}, \dots, X^{(p)}) \in \mathbb R^{n \times p}\).
In the linear model, we assume that for some unknown \(\beta\)
\[Y = X\beta + \varepsilon\]
Assuming \(\E[\varepsilon|X] = 0\), the hypothesis can be written in the form \(\E[Y|X] = X\beta\)
The OLS estimator of \(\beta\) is \(\hat \beta = (X^TX)^{-1}X^TY\).
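As an illustration, here is a minimal NumPy sketch (simulated data; names like `beta_true` are hypothetical) computing \(\hat\beta\) via the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # first column = intercept
beta_true = np.array([1.0, 2.0, -0.5])                          # hypothetical true coefficients
Y = X @ beta_true + rng.normal(size=n)                          # Y = X beta + noise

# OLS: solve (X^T X) beta_hat = X^T Y rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to beta_true
```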
The hypothesis \(\E(Y|X) = X^T\beta\) in linear regression models implies that \(\E(Y|X)\) can take any real value.
This is not a restriction when \(Y\) is quantitative and can take any real value.
The linear assumption is inappropriate for certain variables \(Y\), particularly when \(Y\) is qualitative or discrete.
Binary outcomes (\(Y = 0\) or \(1\)): e.g., presence or absence of a disease
Categorical outcomes (\(Y \in \{A_1, \ldots, A_k\}\)): e.g., a choice among \(k\) categories
Count data (\(Y \in \mathbb{N}\)): e.g., a number of events observed over a period
In all examples, the objective remains to link \(Y\) to \(X = (X^{(1)}, \ldots, X^{(p)})\) through modeling \(\E(Y|X)\).
However, \(\E(Y|X)\) has different interpretations depending on the situation: a probability \(P(Y = 1|X)\) in the binary case, a vector of class probabilities in the categorical case, and a nonnegative mean in the count case.
In all these cases, the linear model \(\E(Y|X) = X^T\beta\) is inappropriate.
We model \(\E(Y|X)\) differently using generalized linear models.
As in linear regression, we focus on estimating \(\beta\), on inference (tests, confidence intervals), and on prediction.
We detail the modeling challenges for \(\E(Y|X)\) in three fundamental cases:
Case 1: Binary: \(Y\) is binary (takes values 0 or 1)
Case 2: Categorical: \(Y \in \{A_1, \ldots, A_k\}\) (general qualitative variable)
Case 3: Count: \(Y \in \mathbb{N}\) (count variable)
Without loss of generality, \(Y \in \{0, 1\}\)
If \(Y\) models membership in a category \(A\), this is equivalent to studying the variable \(Y = \mathbf{1}_A\)
The distribution of \(Y\) given \(X = x\) is entirely determined by \(p(x) = P(Y = 1|X = x)\)
We deduce \(P(Y = 0|X = x) = 1 - p(x)\)
\(Y|X = x\) follows a Bernoulli distribution with parameter \(p(x)\)
\(\E(Y|X = x) = p(x)\)
Key constraint: \(p(x) \in [0, 1]\)
\[\E(Y|X = x) = P(Y = 1|X = x) = p(x) \in [0, 1]\]
What NOT to do: \(p(x) = x^T\beta\) (for some \(\beta \in \mathbb{R}^p\) to be estimated), since \(x^T\beta\) is not constrained to lie in \([0, 1]\)
Proposed approach: We can propose a model of the type:
\[p(x) = f(x^T\beta)\]
where \(f\) is a function from \(\mathbb{R}\) to \([0, 1]\)
Benefits: A coherent model (\(p(x) \in [0, 1]\) by construction) that depends only on \(\beta\)
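As a minimal sketch (taking \(f\) to be the logistic function, the standard choice introduced later), this shows how \(f\) maps any value of \(x^T\beta\) into a valid probability:

```python
import numpy as np

def f(t):
    """Logistic function: maps any real t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
x = rng.normal(size=5)      # hypothetical covariate vector
beta = rng.normal(size=5)   # hypothetical coefficients
print(x @ beta)             # can be any real number
print(f(x @ beta))          # always in (0, 1): a valid probability
```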
If \(Y\) represents membership in \(k\) different classes \(A_1, \ldots, A_k\), its distribution is determined by the probabilities:
\[p_j(x) = P(Y \in A_j|X = x), \quad \text{for } j = 1, \ldots, k\]
Constraint: \(\sum_{j=1}^{k} p_j(x) = 1\) (If \(k = 2\), this reduces to the previous case)
\(Y = (\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_k})\) follows a multinomial distribution and:
\[\E(Y|X = x) = \begin{pmatrix} p_1(x) \\ \vdots \\ p_k(x) \end{pmatrix}\]
To model \(\E(Y|X = x)\), it suffices to model \(p_1(x), \ldots, p_{k-1}(x)\) since \(p_k(x) = 1 - \sum_{j=1}^{k-1} p_j(x)\)
Proposed model: As in the binary case, we can propose:
\[p_j(x) = f(x^T\beta_j), \quad j = 1, \ldots, k-1\]
where \(f: \mathbb{R} \to [0,1]\)
Parameters: There will be \(k-1\) unknown parameters to estimate, each in \(\mathbb{R}^p\)
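One standard way to make this construction concrete is the multinomial logit (a sketch, with class \(k\) as the reference), which produces \(k\) probabilities summing to 1 from \(k-1\) parameter vectors:

```python
import numpy as np

def class_probs(x, betas):
    """Multinomial logit: betas has shape (k-1, p); class k is the reference.

    p_j(x) = exp(x^T beta_j) / (1 + sum_l exp(x^T beta_l)), j = 1, ..., k-1
    p_k(x) = 1 / (1 + sum_l exp(x^T beta_l))
    """
    scores = betas @ x                  # (k-1,) vector of x^T beta_j
    exps = np.exp(scores)
    denom = 1.0 + exps.sum()
    return np.append(exps / denom, 1.0 / denom)

rng = np.random.default_rng(0)
k, p = 4, 3
betas = rng.normal(size=(k - 1, p))     # hypothetical parameters
x = rng.normal(size=p)
probs = class_probs(x, betas)
print(probs, probs.sum())               # k probabilities summing to 1
```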
If \(Y\) takes values in \(\mathbb{N}\), we have for all \(x\), \(\E(Y|X = x) \geq 0\)
Coherent choice: A coherent approach is:
\[\E(Y|X = x) = f(x^T\beta)\]
where \(f\) is a function from \(\mathbb{R}\) to \([0, +\infty)\)
Example of a possible choice for \(f\): the exponential function \(f(t) = e^t\)
Let \(g\) be a strictly monotonic function, called the link function
A generalized linear model (GLM) establishes a relationship of the type:
\[g(\E(Y|X = x)) = x^T\beta\]
Equivalently,
\[\E(Y|X = x) = g^{-1}(x^T\beta)\]
In a GLM model, the goal is to estimate \(\beta \in \mathbb{R}^p\)
Using \(n\) independent observations of \((Y, X)\), we estimate \(\beta\) by maximum likelihood (the distribution of \(Y|X\) being known up to \(\beta\))
The link function \(g\) is not to be estimated: we choose it according to the nature of the data.
Inference and diagnostic tools are available (as in linear regression)
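A quick numerical sketch of the link/inverse-link relation, using SciPy's `logit` and `expit` as an example pair \((g, g^{-1})\):

```python
import numpy as np
from scipy.special import expit, logit  # expit plays g^{-1}, logit plays g

t = np.linspace(-3, 3, 7)          # values playing the role of x^T beta
mu = expit(t)                      # E(Y|X=x) = g^{-1}(x^T beta), always in (0, 1)
print(np.allclose(logit(mu), t))   # g(E(Y|X=x)) recovers x^T beta: True
```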
Among the explanatory variables \(X^{(1)}, \ldots, X^{(p)}\), we often assume that \(X^{(1)}=\1\) to account for the presence of a constant. Thus: \[X\beta = \beta_1 X^{(1)} + \cdots + \beta_p X^{(p)} = \beta_1 + \beta_2 X^{(2)} + \cdots + \beta_p X^{(p)}\]
Alternative notation: The coefficients are sometimes indexed from 0, writing \(\beta_0 + \beta_1 X^{(1)} + \cdots + \beta_p X^{(p)}\)
Link function: We recover linear regression by taking the identity link function \(g(t) = t\)
Expected value: Then:
\[\E(Y|X = x) = g^{-1}(x^T\beta) = x^T\beta\]
In the Gaussian linear model:
\[Y|X \sim \mathcal{N}(X\beta, \sigma^2 I_n)\]
Linear regression is therefore a special case of GLM models!
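To illustrate (a sketch with `statsmodels` on simulated data), fitting a Gaussian GLM with its default identity link recovers the OLS coefficients exactly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept + 2 covariates
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

ols = sm.OLS(Y, X).fit()
glm = sm.GLM(Y, X, family=sm.families.Gaussian()).fit()  # identity link by default
print(np.allclose(ols.params, glm.params))               # True: same estimator
```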
Link function requirement: The link function \(g\) must satisfy:
\[\E(Y|X = x) = g^{-1}(x^T\beta) \in [0, 1]\]
Since \(Y\in \{0,1\}\), \(Y|X\) follows a Bernoulli distribution
\[Y|X \sim \mathcal{B}(g^{-1}(X^T\beta))\]
Possible choices for \(g^{-1}\): the CDF of a continuous distribution on \(\mathbb{R}\) (e.g., the Gaussian CDF \(\Phi\), which gives the probit model)
Standard choice for \(g^{-1}\): The CDF of a logistic distribution:
\[g^{-1}(t) = \frac{e^t}{1 + e^t} \quad \text{i.e.} \quad g(t) = \ln\left(\frac{t}{1-t}\right) = \text{logit}(t)\]
This leads to the logistic model, the most important model in this chapter
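A minimal sketch of fitting the logistic model by maximum likelihood with `statsmodels` (simulated data, hypothetical coefficients):

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
beta_true = np.array([-0.5, 1.0, 2.0])     # hypothetical true coefficients
p = expit(X @ beta_true)                   # p(x) = g^{-1}(x^T beta), logit link
Y = rng.binomial(1, p)                     # Y|X ~ Bernoulli(p(x))

# The Binomial family uses the logit link by default
fit = sm.GLM(Y, X, family=sm.families.Binomial()).fit()
print(fit.params)                          # maximum likelihood estimate of beta
```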
Link function: For count data, \(g(t) = \ln(t)\), with \(g^{-1}(t) = e^t\), gives:
\[\E(Y|X = x) = g^{-1}(x^T\beta) = e^{x^T\beta}\]
For the distribution of \(Y|X\), supported on \(\mathbb{N}\), we often assume a Poisson distribution (a member of the exponential family)
In this context:
\[Y|X \sim \mathcal{P}(e^{X^T\beta})\]
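An analogous sketch for Poisson regression (simulated data; the Poisson family uses the log link by default):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
beta_true = np.array([0.5, 0.8, -0.3])     # hypothetical true coefficients
mu = np.exp(X @ beta_true)                 # E(Y|X=x) = exp(x^T beta) >= 0
Y = rng.poisson(mu)                        # Y|X ~ Poisson(exp(x^T beta))

fit = sm.GLM(Y, X, family=sm.families.Poisson()).fit()  # log link by default
print(fit.params)                          # maximum likelihood estimate of beta
```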
There are two choices to make when setting up a GLM: the distribution of \(Y|X\) and the link function \(g\)
Key insight: The second choice is linked to the first: each distribution comes with a natural default link
Binary (\(Y \in \{0, 1\}\)):
→ \(Y|X\): Bernoulli distribution
→ By default \(g = \text{logit}\) (see above)
Multi-category (\(Y \in \{A_1, \ldots, A_k\}\)):
→ \(Y|X\): multinomial distribution
→ By default \(g = \text{logit}\)
Count (\(Y \in \mathbb{N}\)):
→ \(Y|X\): Poisson (often) or negative binomial
→ Choice of \(g\): by default \(g = \ln\)
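These defaults can be checked directly against the `statsmodels` GLM families (a small sketch; the multinomial case is handled by separate tools such as `sm.MNLogit`):

```python
import statsmodels.api as sm

# Each GLM family carries a default link matching the summary above
for family in (sm.families.Binomial(),          # binary -> logit
               sm.families.Poisson(),           # count  -> log
               sm.families.NegativeBinomial()): # count  -> log
    print(type(family).__name__, "->", type(family.link).__name__)
```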