Today we will introduce the first of our two types of graphical models, the directed model. Before we can define them, we need to introduce the main ideas we will need.

1. Setup

DAGs

Recall from last time that a directed acyclic graph (DAG) is a set of nodes along with a set of parents for each node, such that there are no directed cycles. It's critical to be clear on the distinction between a cycle and a directed cycle. Recall our example graph from last time.

A directed acyclic graph (DAG).

A directed acyclic graph (DAG).

This has a cycle (1-2-3-4-1). However, it has no directed cycles since you can never get back to the same node if you follow the arrows along the direction they point. So the above graph is a DAG. In contrast, consider the following graph.

A directed cyclic graph.

A directed cyclic graph.

Here you could go around and around in a loop forever. So this is not a DAG.

Along with the idea of DAGs, our definition of a directed graphical model will use two other ideas: Topological orderings, and the chain rule of probability.

Topological Orderings

Claim. If you have a DAG, then it is possible to choose a numbering of the nodes of the graph such that all parents have lower numbers than their children. This numbering is called a topological ordering.

It is important to know that in a general DAG, parents do not need to have lower numbers. The point here is that it is always possible to choose such numbering so that parents do. We will use this property below.

The Chain Rule of the Probability

The chain rule of probability is just a way of writing a joint distribution as a product of a bunch of conditional distributions. Given some distribution, we can always write that

$$ \begin{aligned}

p(x_1, x_2, x_3, \cdots, x_D) = p(x_1)p(x_2|x_1) p(x_3|x_1, x_2) \cdots p(x_D| x_1, x_2, \cdots, x_{D-1}). \end{aligned}

$$

This is always true, and doesn't require any assumptions. We can write the same equation in a more compact form as

$$ \begin{aligned}

p(x_1, x_2, x_3, \cdots, x_D) = \prod_{i=1}^{D} p(x_i| x_1, x_2, \cdots, x_{i-1}). \end{aligned} $$

2. Directed Models