CS688 - message-passing on a chain.ipynb
We will learn the Message-Passing algorithm with the help of a specific example as defined in the below graph (message-passing on a chain).
The "chain" example is an undirected model with 4 variables. And if we recall the factorized form for an undirected model, we have univariate terms and pair-wise terms in the "chain" example. In this specific example, we have the univariate term as
$$ \begin{aligned}
\phi_i (x_i) = \begin{cases} 1, x_i = 0 \\ 2, x_i = 1 \end{cases}
\end{aligned}. $$
And we have the pair-wise term as
$$ \begin{aligned}
\phi_{i,j} (x_i, x_j) = \begin{cases} 1, x_i \neq x_j \\ 2, x_i = x_j \end{cases}
\end{aligned}. $$
Note: From the univariate terms and the pair-wise terms in this example, we know that this model "encourages" the individual variables to be 1, and "encourages" the connecting pairs to be of the same value. And if we expand the factorized form of the joint distribution, we have the joint probability $p(x)$ as
$$ \begin{aligned}
p(x) &= \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{(i, j) \in \mathrm{pairs}} \phi_{i, j} (x_i, x_j) \\ &=\frac{1}{Z} \phi_1(x_1) \phi_2(x_2) \phi_3(x_3) \phi_4(x_4) \phi_{1,2}(x_1, x_2) \phi_{2,3}(x_2, x_3) \phi_{3,4}(x_3, x_4),
\end{aligned} $$
where $Z$ is the normalizing constant.
Note: In the factorized form of $p(x)$, there is no terms involving more than 2 variables such as $\phi_{i,j,k}$ because in this "chain" example, we only have cliques of size 2. And we have the univariate terms written out for the benefit of our demonstration/derivation of the algorithm. However, we do have the option to "absorb" the univariate terms into the pair-wise terms to match the "max-clique factorization" form we have in Hammersley-Clifford Theorem.
To demonstrate the message-passing algorithm, it is easier to start with the computation of $Z$. To compute it with a brute approach, we will have
$$ Z = \sum_{x_1} \sum_{x_2} \sum_{x_3} \sum_{x_4} \phi_1(x_1) \phi_2(x_2) \phi_3(x_3) \phi_4(x_4) \phi_{1,2}(x_1, x_2) \phi_{2,3}(x_2, x_3) \phi_{3,4}(x_3, x_4) $$
Question: Is there a way to simply the computation of $Z$ and how?
It turns out that we can move the summation around without changing the final evaluation of $Z$, and by doing so, we can isolate/group terms involving only $\mathsf{x}4$ into $\sum{x_4} \phi_4(x_4) \phi_{3,4}(x_3, x_4)$, and we can compute that as a function of $\mathsf{x}_3$ and call it the message passed from $\mathsf{x}_4$ to $\mathsf{x}3$, and write it with a notation as $m{4 \rightarrow 3} (x_3)$. And following the same logic, we will be able to calculate the "message" passed along the chain from $\mathsf{x}_4$ all the way to $\mathsf{x}_1$, the detailed derivation is as