This course is divided into five modules.
Regression and classification are supervised learning. This means you have some set of input variables that you are using to predict some output variables. In unsupervised learning, all you have is input variables, and you are looking for structure within them. Probabilistic methods and kernel methods address both unsupervised and supervised learning, though we'll mostly see supervised in this class.
To begin, we give a brief overview of all five modules of the course.
Imagine that we have data from the last year of this course. We measure each student's Math and Python skills at the beginning of the class along with their final grade at the end. We can picture this in the following graph where Math and Python are plotted along the x and y axes, and the final grade of each student is labeled.
Figure 1. Students' grades in a machine learning class depending on Math and Python skills. The horizontal axis is a student's Math skill and the vertical axis is a student's Python skill when they joined the course. The output (final grade in the class) is indicated near each point.
The goal of regression is to find a function $f$ such that
$$
\mathrm{grade} \approx f(\mathrm{math},\mathrm{python}). $$
We can think of this as the problem of "curve fitting". This is a descriptive name, but always remember that we can have more than one input dimension. In this case, with two inputs, we want to find a surface over the (math, python) plane that approximates the data, rather than a curve.
So how could we do this? Let's talk about two simple regression methods.
The first is linear regression. Here, our goal would be to find three numbers $w_0$, $w_1$, $w_2$ such that
$$ \mathrm{grade} \approx w_0+w_1 \times\mathrm{math}+w_2 \times\mathrm{python}. $$
(Technically we should probably call this "affine regression", but never mind.) We will discuss later how exactly to find these numbers. Basically, one takes the data set, searches for parameters $w_0$, $w_1$, $w_2$ that fit the training data well, and then uses them to predict new data. That's linear regression.
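As a concrete sketch, here is one common way to find $w_0$, $w_1$, $w_2$: least squares. The student data below is made up for illustration (not from any real class), and least squares is just one of the fitting criteria we will see later.

```python
import numpy as np

# Hypothetical training data: each row is (math skill, python skill),
# and grades holds each student's final grade.
X = np.array([[4.0, 7.0],
              [6.0, 5.0],
              [8.0, 8.0],
              [3.0, 4.0],
              [7.0, 6.0]])
grades = np.array([75.0, 70.0, 92.0, 55.0, 83.0])

# Prepend a column of ones so that w0 acts as the intercept
# (the "affine" part of the model).
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve for w = (w0, w1, w2) minimizing the squared error on the training set.
w, *_ = np.linalg.lstsq(A, grades, rcond=None)

# Predict the grade of a new student with math = 5, python = 6.
new_student = np.array([1.0, 5.0, 6.0])  # leading 1 multiplies w0
predicted_grade = new_student @ w
```

The fitted $w$ then generalizes to any new (math, python) pair by a single dot product.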
The second simple method is nearest neighbors. Imagine that there is a new student with the math and python skills corresponding to the red dot shown in the following graph.
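The idea in its simplest form: to predict the grade of the new student, find the training student closest to them (say, in Euclidean distance) and copy that student's grade. A minimal sketch, using the same made-up student data as above:

```python
import numpy as np

# Hypothetical training data: rows are (math skill, python skill).
X = np.array([[4.0, 7.0],
              [6.0, 5.0],
              [8.0, 8.0],
              [3.0, 4.0],
              [7.0, 6.0]])
grades = np.array([75.0, 70.0, 92.0, 55.0, 83.0])

def nearest_neighbor_predict(x_new):
    """Predict by copying the grade of the closest training point."""
    dists = np.linalg.norm(X - x_new, axis=1)  # Euclidean distances
    return grades[np.argmin(dists)]

# A new student at (7.5, 7.5) is closest to the student at (8, 8).
pred = nearest_neighbor_predict(np.array([7.5, 7.5]))
```

Note that, unlike linear regression, nearest neighbors has no parameters to fit: the training data itself is the model.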