We observe \(n\) individuals, and variables \(Y \in \mathbb R^n\) and \((X^{(1)}, \dots, X^{(p)}) \in \mathbb R^{n \times p}\).
In other words, we observe
\[
\begin{aligned}
Y &= (Y_1, \dots, Y_n) \in \mathbb R^n \\
X^{(1)} &= (X^{(1)}_1, \dots, X^{(1)}_n) \in \mathbb R^n \\
X^{(2)} &= (X^{(2)}_1, \dots, X^{(2)}_n) \in \mathbb R^n \\
&\ \ \vdots \\
X^{(p)} &= (X^{(p)}_1, \dots, X^{(p)}_n) \in \mathbb R^n
\end{aligned}
\]
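To fix ideas, here is a minimal sketch of this setup with simulated data (numpy and the specific numbers are assumptions made for illustration; the model says nothing about how the data are produced):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3  # hypothetical sample size and number of covariates

X_cols = [rng.normal(size=n) for _ in range(p)]  # the vectors X^(1), ..., X^(p)
Y = rng.normal(size=n)                           # the response, one entry per individual

print(Y.shape)          # (100,)
print(X_cols[0].shape)  # (100,)
```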
We assume that
\[Y_i = F(X^{(1)}_i, X^{(2)}_i, \dots, X^{(p)}_i, \varepsilon_i)\]
where \(\varepsilon = (\varepsilon_1, \dots, \varepsilon_n) \in \mathbb R^n\) is a vector of iid random noise. The noise \(\varepsilon\) is not observed, and the function \(F\) is unknown.
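For concreteness, a sketch of how data could arise from such a model, with a made-up nonlinear \(F\) (everything below except the shapes is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3

X = rng.normal(size=(n, p))
eps = rng.normal(size=n)  # iid noise; in practice we never see it

# A hypothetical F: at this level of generality, any function of the
# covariates and the noise is allowed.
def F(x1, x2, x3, e):
    return np.sin(x1) * np.exp(x2) + x3**2 + e

Y = F(X[:, 0], X[:, 1], X[:, 2], eps)  # the statistician only sees X and Y
```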
→ Too ambitious: estimating a fully general \(F\) carries a high risk of overfitting.
We therefore restrict to a linear model: we assume that
\[Y = \beta_1 X^{(1)}+ \beta_2 X^{(2)}+ \dots+ \beta_p X^{(p)}+ \varepsilon\]
That is, we assume that \(F\) has the form \(F(x_1, \dots, x_p, \varepsilon) = \beta_1 x_1+ \beta_2 x_2+ \dots+ \beta_p x_p+ \varepsilon\).
Here \(Y\) and the \(X^{(k)}\)’s are vectors in \(\mathbb R^n\), so the model is an identity between vectors.
For all \(i\),
\[Y_i = \beta_1 X^{(1)}_i+ \beta_2 X^{(2)}_i+ \dots+ \beta_p X^{(p)}_i+ \varepsilon_i\]
If we set \(X^{(1)}= (1, \dots, 1)\), then the model can be rewritten with an intercept:
\[Y_i = \beta_1+ \beta_2 X^{(2)}_i+ \dots+ \beta_p X^{(p)}_i+ \varepsilon_i\]
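A sketch of this intercept trick with simulated data (\(p = 3\) and the coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Prepend a constant column to encode the intercept.
Z = rng.normal(size=(n, 2))        # the non-constant covariates X^(2), X^(3)
ones = np.ones(n)                  # X^(1) = (1, ..., 1)

beta = np.array([2.0, -1.0, 0.5])  # hypothetical coefficients
eps = 0.1 * rng.normal(size=n)

# Y_i = beta_1 + beta_2 X^(2)_i + beta_3 X^(3)_i + eps_i
Y = beta[0] * ones + beta[1] * Z[:, 0] + beta[2] * Z[:, 1] + eps
```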
We write \(Y = (Y_1, \dots, Y_n)\) and \(X^{(k)} = (X^{(k)}_1, \dots, X^{(k)}_n)\) as columns:
\(\newcommand{\VS}{\quad \mathrm{VS} \quad}\) \(\newcommand{\and}{\quad \mathrm{and} \quad}\) \(\newcommand{\E}{\mathbb E}\) \(\newcommand{\P}{\mathbb P}\) \(\newcommand{\Var}{\mathbb V}\)
\[Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \and X^{(k)}=\begin{pmatrix} X^{(k)}_1 \\ \vdots \\ X^{(k)}_n \end{pmatrix}\]
To get a matrix form, we write \(X_{ik} = X^{(k)}_i\). Then:
\[Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \and X^{(k)}=\begin{pmatrix} X_{1,k} \\ \vdots \\ X_{n,k} \end{pmatrix}\]
We then write \(X\) for the matrix \((X_{ik}) \in \mathbb R^{n \times p}\), whose columns are the \(X^{(k)}\): \(X = (X^{(1)}, \dots, X^{(p)})\). In full:
\[Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \and X=\begin{pmatrix} X_{1,1} & \dots & X_{1,p} \\ \vdots & & \vdots \\ X_{n,1} & \dots & X_{n,p} \end{pmatrix}\]
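In code, assembling \(X\) from its columns is a single stacking operation (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3

X_cols = [rng.normal(size=n) for _ in range(p)]  # the columns X^(1), ..., X^(p)
X = np.column_stack(X_cols)                      # X[i, k] = X^(k+1)_i (0-based indexing)

print(X.shape)  # (100, 3)
```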
Let \(\beta = (\beta_1, \dots, \beta_p) \in \mathbb R^p\) be unknown parameters, and \(\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)\) be iid noise.
In column notation:
\[\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} \and \varepsilon=\begin{pmatrix} \varepsilon_{1} \\ \vdots \\ \varepsilon_{n} \end{pmatrix}\]
We observe \(Y = (Y_1, \dots, Y_n) \in \mathbb R^n\) and \(X \in \mathbb R^{n \times p}\).
We assume that
\[Y = X \beta + \varepsilon\]
where \(\beta \in \mathbb R^p\) is the unknown parameter vector and \(\varepsilon \in \mathbb R^n\) is the unobserved iid noise.
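The matrix form is convenient both on paper and in code; here is a sketch that simulates from the model and checks the matrix product against the coordinate-wise formula (the numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 3

X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])  # unknown in practice; fixed here to simulate
eps = 0.1 * rng.normal(size=n)

Y = X @ beta + eps                 # the model Y = X beta + eps in one line

# Sanity check: the matrix product matches the coordinate-wise formula.
i = 0
assert np.isclose(Y[i], sum(beta[k] * X[i, k] for k in range(p)) + eps[i])
```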
Recall that \(X \in \mathbb R^{n \times p}\).
We assume that \(\mathrm{rk}(X)=p\), i.e. the columns of \(X\) are linearly independent.
This implies \(p \leq n\).
If this condition is not satisfied, there is a linear relation between the \(X^{(k)}\)’s: \(X\alpha=\alpha_1X^{(1)} + \dots + \alpha_p X^{(p)}=0\) for some \(\alpha \in \mathbb R^p\setminus \{0\}\).
In that case, infinitely many \(\beta\)’s are consistent with the model, since for all \(t \in \mathbb R\),
\[ X(\beta + t\alpha) = X\beta \]
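A quick numerical illustration of this non-identifiability (a sketch; the particular linear relation is made up): once one column is a combination of the others, \(\beta\) and \(\beta + t\alpha\) are indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 3

X = rng.normal(size=(n, p))
X[:, 2] = X[:, 0] + 2 * X[:, 1]     # force a linear relation among the columns
print(np.linalg.matrix_rank(X))     # 2, i.e. rk(X) < p

alpha = np.array([1.0, 2.0, -1.0])  # X @ alpha = 0 by construction
beta = np.array([2.0, -1.0, 0.5])

for t in [0.0, 1.0, -3.7]:
    # beta and beta + t*alpha produce exactly the same X beta
    assert np.allclose(X @ (beta + t * alpha), X @ beta)
```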