In today's lecture, we will begin our discussion of Linear Regression and briefly touch on the concept of regularization.
Linear Regression is a linear method to predict a real-valued output for a given vector of inputs. It is defined as
$$ f(x) = w_0 + \sum_{i=1}^{D} w_i x_i. $$
Alternatively, if we do not have a bias term $w_0$ (or if we absorb it into $w$ by appending a constant input $x_0 = 1$), the model can be written compactly as
$$ \begin{aligned} f(x) &= \sum_{i=1}^{D} w_i x_i \\ &= w^\top x. \end{aligned} $$
Here $w$ is the weight vector and $x$ is a vector of inputs.
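To make the formula concrete, here is a minimal sketch in NumPy of evaluating $f(x) = w_0 + w^\top x$ for a single input; the function name `predict` and the example values are illustrative, not part of the lecture.

```python
import numpy as np

def predict(w, x, w0=0.0):
    """Return f(x) = w0 + sum_i w_i * x_i for a single input vector x."""
    return w0 + np.dot(w, x)

# Illustrative example with D = 3 input features.
w = np.array([0.5, -1.0, 2.0])   # weight vector w
x = np.array([1.0, 0.0, 3.0])    # input vector x
print(predict(w, x, w0=0.1))     # 0.1 + 0.5*1 - 1.0*0 + 2.0*3 = 6.6
```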
Given the above model formulation, our objective is to use the training data to learn a predictor. Assume that we are provided with a dataset
$$ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots , (x^{(N)}, y^{(N)}) $$
where the inputs and outputs are $x^{(n)} \in \mathbb{R}^D$ and $y^{(n)} \in \mathbb{R}$, respectively.
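In code, such a dataset is commonly stored as a design matrix $X \in \mathbb{R}^{N \times D}$ (one row per example) and a target vector $y \in \mathbb{R}^N$. The sketch below, with made-up numbers, shows this layout and how predictions for all $N$ examples can be computed at once as $Xw$.

```python
import numpy as np

# Illustrative dataset with N = 4 examples and D = 2 features.
X = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [3.0, 0.0],
              [2.0, 2.0]])           # shape (N, D): x^{(n)} is row n
y = np.array([3.0, 2.0, 1.5, 4.0])   # shape (N,): targets y^{(n)}

w = np.array([0.4, 1.1])             # some weight vector
predictions = X @ w                  # f(x^{(n)}) for every n at once
print(predictions.shape)             # (4,)
```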
Our strategy for linear regression will be as follows: choose a loss function that measures how well the predictor $f$ fits the training data, and then pick the weights $w$ that minimize this loss on the training set. To improve generalization performance, we will later modify this procedure to include regularization and model selection methods.
Here are two common loss functions for measuring how well the predictor fits the training data.
Mean Absolute Error
$$ \mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \vert y^{(n)} - f(x^{(n)}) \vert $$
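As a quick illustration of this formula, here is a hedged sketch of computing the MAE for a batch of predictions; `y_true` and `y_pred` are assumed to be NumPy arrays of length $N$ holding the targets $y^{(n)}$ and the predictions $f(x^{(n)})$.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum_n |y^{(n)} - f(x^{(n)})|."""
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values:
y_true = np.array([3.0, 2.0, 1.5, 4.0])
y_pred = np.array([2.6, 2.4, 1.5, 5.0])
print(mean_absolute_error(y_true, y_pred))  # (0.4 + 0.4 + 0.0 + 1.0) / 4 = 0.45
```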
Mean Square Error