In today's lecture, we will begin our discussion of Linear Regression and briefly touch on the concept of regularization.
Linear Regression is a linear method to predict a real-valued output for a given vector of inputs. It is defined as
$$ f(x) = w_0 + \sum_{i=1}^{D} w_i x_i. $$
Alternatively, if we do not have a bias term $w_0$ (or if we absorb it into $w$ by appending a constant input $x_0 = 1$), the model can be written compactly as
$$ \begin{aligned} f(x) &= \sum_{i=1}^{D} w_i x_i \\ &= w^\top x. \end{aligned} $$
Here $w$ is the weight vector and $x$ is a vector of inputs.
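To make the formula concrete, here is a minimal sketch in NumPy of evaluating $f(x) = w_0 + w^\top x$ for a single input; the function name `predict` and the example values are illustrative, not part of the lecture.

```python
import numpy as np

def predict(w, x, w0=0.0):
    """Return f(x) = w0 + sum_i w_i * x_i for a single input vector x."""
    return w0 + np.dot(w, x)

# Illustrative example with D = 3 input features.
w = np.array([0.5, -1.0, 2.0])   # weight vector w
x = np.array([1.0, 0.0, 3.0])    # input vector x
print(predict(w, x, w0=0.1))     # 0.1 + 0.5*1 - 1.0*0 + 2.0*3 = 6.6
```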
Given the above model formulation, our objective is to use the training data to learn a predictor. Assume that we are provided with a dataset
$$ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots , (x^{(N)}, y^{(N)}) $$
where the inputs and outputs are $x^{(n)} \in \mathbb{R}^D$ and $y^{(n)} \in \mathbb{R}$, respectively.
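In code, such a dataset is commonly stored as a design matrix $X \in \mathbb{R}^{N \times D}$ (one row per example) and a target vector $y \in \mathbb{R}^N$. The sketch below, with made-up numbers, shows this layout and how predictions for all $N$ examples can be computed at once as $Xw$.

```python
import numpy as np

# Illustrative dataset with N = 4 examples and D = 2 features.
X = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [3.0, 0.0],
              [2.0, 2.0]])           # shape (N, D): x^{(n)} is row n
y = np.array([3.0, 2.0, 1.5, 4.0])   # shape (N,): targets y^{(n)}

w = np.array([0.4, 1.1])             # some weight vector
predictions = X @ w                  # f(x^{(n)}) for every n at once
print(predictions.shape)             # (4,)
```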
Our strategy for linear regression will be as follows: choose a loss function that measures how well the predictor $f$ fits the training data, and then pick the weights $w$ that minimize this loss on the training set. To improve generalization performance, we will later modify this procedure to include regularization and model selection methods.
Here are two common loss functions for measuring how well the predictor fits the training data.
Mean Absolute Error
$$ \mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \vert y^{(n)} - f(x^{(n)}) \vert $$
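As a quick illustration of this formula, here is a hedged sketch of computing the MAE for a batch of predictions; `y_true` and `y_pred` are assumed to be NumPy arrays of length $N$ holding the targets $y^{(n)}$ and the predictions $f(x^{(n)})$.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum_n |y^{(n)} - f(x^{(n)})|."""
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values:
y_true = np.array([3.0, 2.0, 1.5, 4.0])
y_pred = np.array([2.6, 2.4, 1.5, 5.0])
print(mean_absolute_error(y_true, y_pred))  # (0.4 + 0.4 + 0.0 + 1.0) / 4 = 0.45
```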
Mean Square Error