In this lecture, we will finish learning about maximum likelihood theory and start learning about Markov chain Monte Carlo (MCMC).

1. Maximum Likelihood Theory

Recap

In the last lecture, we learned some important properties of the maximum likelihood (ML) estimator. Let $\hat{\theta}_n$ be the ML estimator learned from the dataset $x^{(1)},\dots,x^{(n)}$, such that

$$ \hat\theta_{n} = \argmax_{\theta} L_n(\theta). $$

By applying a first-order Taylor series expansion and the Central Limit Theorem (CLT), we showed that the ML estimator is asymptotically normally distributed:

$$ \begin{aligned} \sqrt{n}(\hat\theta_n - \theta^*) \xrightarrow{D} \mathcal{N}(0, I(\theta^*)^{-1}), \end{aligned} $$

where $\theta^*$ denotes the true parameter and $I(\theta)$ is the Fisher information, defined as

$$ \begin{aligned} I(\theta) &= -\mathbb E_{p_\theta(\mathsf{x})}[\nabla_\theta^2 \log p_\theta(\mathsf{x})]. \end{aligned} $$
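To make this definition concrete, here is a minimal Monte Carlo sketch (our own illustration, not from the lecture) that estimates the expectation by sampling. We use a Poisson model $p_\lambda(x) = \lambda^x e^{-\lambda}/x!$ as a stand-in example, for which $\frac{\partial^2}{\partial \lambda^2} \log p_\lambda(x) = -x/\lambda^2$ and hence $I(\lambda) = 1/\lambda$:

```python
import numpy as np

# Monte Carlo estimate of I(theta) = -E[ d^2/dtheta^2 log p_theta(x) ]
# for a Poisson model (our own stand-in example): here
# d^2/dlam^2 log p_lam(x) = -x / lam**2, so I(lam) = 1 / lam.
rng = np.random.default_rng(0)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)

# Second derivative of the log-density, evaluated at each sample.
d2_loglik = -x / lam**2

I_mc = -d2_loglik.mean()    # Monte Carlo estimate of I(lam)
I_exact = 1.0 / lam         # closed form for the Poisson family
print(I_mc, I_exact)        # both ~0.333
```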

Lastly, for exponential families, the Fisher information can also be computed from the log-partition function $A(\theta)$ as

$$ \begin{aligned} I(\theta) = \nabla_\theta^2 A(\theta). \end{aligned} $$
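Before moving on, we can check the asymptotic normality result numerically. The following sketch is our own illustration, using an exponential model $p_\lambda(x) = \lambda e^{-\lambda x}$, for which $\hat\lambda_n = 1/\bar{x}$ and $I(\lambda) = 1/\lambda^2$, so the theorem predicts that $\sqrt{n}(\hat\lambda_n - \lambda^*)$ has standard deviation close to $\sqrt{I(\lambda^*)^{-1}} = \lambda^*$:

```python
import numpy as np

# Simulate many datasets from an exponential model (our own example),
# compute the MLE lam_hat = 1 / mean(x) for each, and compare the spread
# of sqrt(n) * (lam_hat - lam_true) with sqrt(I(lam_true)^{-1}) = lam_true.
rng = np.random.default_rng(0)
lam_true, n, n_trials = 2.0, 2_000, 2_000

x = rng.exponential(scale=1.0 / lam_true, size=(n_trials, n))
lam_hat = 1.0 / x.mean(axis=1)           # one MLE per simulated dataset

z = np.sqrt(n) * (lam_hat - lam_true)    # rescaled estimation error
print(z.std(), lam_true)                 # empirical std vs lam_true = 2.0
print(z.mean())                          # approximately 0
```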

Example: Bernoulli

The Bernoulli distribution is defined as

$$ \begin{aligned} p_\theta(x) =\begin{cases} \theta, &x = 1 \\ 1-\theta, &x = 0 \end{cases}. \end{aligned} $$

To characterize the ML estimator for the Bernoulli distribution, the only quantity we need to calculate is the Fisher information. Using the definition $I(\theta) = -\mathbb E_{p_\theta(\mathsf{x})}[\nabla_\theta^2 \log p_\theta(\mathsf{x})]$, we have

$$ \begin{aligned} I(\theta) &= -\mathbb E_{p_\theta(\mathsf{x})}\left[\frac{\partial^2}{\partial \theta^2} \log p_\theta (\mathsf{x})\right]\\ &= - \theta \frac{\partial^2}{\partial \theta^2} \log \theta - (1 - \theta) \frac{\partial^2}{\partial \theta^2} \log (1 - \theta) \\ &= \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}. \end{aligned} $$
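As a quick sanity check on this derivation (our own addition, not part of the lecture), we can estimate the expectation by Monte Carlo and compare it with the closed form:

```python
import numpy as np

# Numerical check of the Bernoulli derivation: estimate
# I(theta) = -E[ d^2/dtheta^2 log p_theta(x) ] by sampling and
# compare with the closed form 1 / (theta * (1 - theta)).
rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=1_000_000)

# d^2/dtheta^2 log p: -1/theta^2 when x = 1, -1/(1-theta)^2 when x = 0.
d2 = np.where(x == 1, -1.0 / theta**2, -1.0 / (1.0 - theta) ** 2)

print(-d2.mean())                      # Monte Carlo estimate, ~4.76
print(1.0 / (theta * (1.0 - theta)))   # exact value: 4.7619...
```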

Having found the Fisher information, we can plot it as a function of $\theta$, as in the figure below.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a99a366-8975-4713-905a-158fed2f720b/lec15_fig1_v2.svg

Figure: the Fisher information $I(\theta) = \frac{1}{\theta(1-\theta)}$ of the Bernoulli distribution over $\theta \in (0,1)$; it attains its minimum of $4$ at $\theta = 1/2$ and diverges as $\theta \to 0$ or $\theta \to 1$.

And as we have learned, the ML solution for the Bernoulli distribution is the empirical mean, $\hat{\theta}_n = \hat{\mathbb{E}}[\mathsf{x}] = \frac{1}{n}\sum_{i=1}^{n} x^{(i)}$. We can now start asking some questions about the ML estimator for the Bernoulli distribution.
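One natural first question is whether the finite-sample fluctuations of $\hat{\theta}_n$ really match the variance $I(\theta^*)^{-1}/n = \theta^*(1-\theta^*)/n$ predicted by the asymptotic normality result. A small simulation (our own sketch; the parameter values are arbitrary) suggests they do:

```python
import numpy as np

# The Bernoulli MLE is the empirical mean; asymptotic normality predicts
# Var(theta_hat) ~ I(theta*)^{-1} / n = theta*(1 - theta*) / n, i.e.
# n * Var(theta_hat) should be close to theta*(1 - theta*).
rng = np.random.default_rng(0)
theta_true, n, n_trials = 0.3, 1_000, 10_000

x = rng.binomial(1, theta_true, size=(n_trials, n))
theta_hat = x.mean(axis=1)               # one MLE per simulated dataset

print(n * theta_hat.var())               # ~0.21
print(theta_true * (1.0 - theta_true))   # I(theta*)^{-1} = 0.21
```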