Sampling Techniques

Tags: CS 228, Inference

Sampling?

In the following sections, we will be using sampling methods to do approximate inference, which requires sampling from some distribution. We will first talk generally about when things are tractable and when they are not, and then we will talk about specific structures.

Tractability

Generally speaking, if you have two sets of variables $X, E$ and we want $P(E)$, it is not tractable because you have to marginalize over $X$. We can estimate this distribution using Monte Carlo; we will talk about this later. Essentially, you will need to sample from a different distribution. But how do you do that?

Bayes network

In general, Bayes networks are pretty easy to sample from. When fully observed, you can also evaluate the joint probability of an assignment very easily, since it factorizes into a product of CPD entries: $P(x_1, \dots, x_n) = \prod_i P(x_i \mid \mathrm{pa}(x_i))$.

Special case: when it’s a tree (and no observation)

If it’s a tree, just start by sampling at the root and propagating the values downwards by filling in the sampled values into the CPDs.

No observation

More generally, you just have to sort the nodes topologically and then sample the nodes in that order (this is ancestral sampling), plugging each sampled value into the CPDs of its children. A minimal sketch is below.
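
Here is a minimal sketch of ancestral sampling, assuming a tiny hypothetical network $A \to B$, $A \to C$ over binary variables with made-up CPDs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bayes net A -> B, A -> C over binary variables (made-up CPDs).
p_A = np.array([0.6, 0.4])              # P(A)
p_B_given_A = np.array([[0.9, 0.1],     # P(B | A=0)
                        [0.3, 0.7]])    # P(B | A=1)
p_C_given_A = np.array([[0.5, 0.5],
                        [0.2, 0.8]])

# Ancestral sampling: visit nodes in topological order (A, B, C),
# plugging each sampled value into its children's CPDs.
a = rng.choice(2, p=p_A)
b = rng.choice(2, p=p_B_given_A[a])
c = rng.choice(2, p=p_C_given_A[a])
print(a, b, c)
```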

With observed nodes

Sometimes you need to compute $P(x \mid e) = \frac{P(x, e)}{P(e)}$, which can be intractable. It’s even worse with sampling, because you need to propagate the observation backwards. For example, if you had $X \to Y \to Z \to E$, then $P(Z \mid E) = P(E \mid Z)P(Z) / \sum_Z P(E \mid Z)P(Z)$. None of these quantities is easy to compute at all!
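
To see the cost concretely, here is a small enumeration sketch for the chain above, assuming made-up binary CPDs: even in this toy case, computing $P(Z)$ means summing over $X$ and $Y$ before Bayes' rule can be applied.

```python
import numpy as np

# Hypothetical chain X -> Y -> Z -> E over binary variables (made-up CPDs).
p_X = np.array([0.7, 0.3])
p_Y_X = np.array([[0.8, 0.2], [0.4, 0.6]])   # P(Y | X)
p_Z_Y = np.array([[0.9, 0.1], [0.5, 0.5]])   # P(Z | Y)
p_E_Z = np.array([[0.6, 0.4], [0.1, 0.9]])   # P(E | Z)
e = 1  # observed value of E

# P(Z) already requires summing over X and Y ...
p_Z = np.einsum('x,xy,yz->z', p_X, p_Y_X, p_Z_Y)
# ... and only then can we apply Bayes' rule P(Z | E=e) ∝ P(E=e | Z) P(Z).
unnorm = p_E_Z[:, e] * p_Z
p_Z_given_e = unnorm / unnorm.sum()
print(p_Z_given_e)
```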

Markov Random Field

MRFs are pretty hard to sample from, but we have a few techniques!

Brute force

Find the joint factor, marginalize it down to one variable, sample that variable, plug the sampled value back in, and then repeat for the next variable, etc. This is intractable, but it’s the baseline algorithm. A sketch is below.
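
As an illustration, here is a minimal sketch of this baseline, assuming a made-up MRF over three binary variables with two pairwise factors (the joint table is tiny here, but it grows exponentially with the number of variables):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MRF over three binary variables with two made-up pairwise factors.
phi_01 = np.array([[1.0, 2.0], [3.0, 1.0]])   # factor over (X0, X1)
phi_12 = np.array([[2.0, 1.0], [1.0, 4.0]])   # factor over (X1, X2)

# Brute force: build the full joint factor (exponential in the number of variables).
joint = np.einsum('ab,bc->abc', phi_01, phi_12)
joint = joint / joint.sum()

sample = []
for _ in range(3):
    # Marginalize down to the next variable, sample it,
    # then condition the joint on the sampled value and repeat.
    marg = joint.reshape(joint.shape[0], -1).sum(axis=1)
    v = rng.choice(len(marg), p=marg / marg.sum())
    sample.append(v)
    joint = joint[v]    # slice: plug the sampled value back in
print(sample)
```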

Special case: when there are no cycles

If there are no cycles (i.e. it’s a tree), you can do the following algorithm:

  1. Run message passing up to the root
  2. Sample from the root
  3. Use the root value and the upward messages to get distributions for the root’s children. Sample from those
  4. Propagate this recursively until you reach the leaves

Note that you can’t just run message passing and then sample each variable from its own marginal, because that assumes the variables are independent. Rather, you must propagate the sampled values down the tree, as in the sketch below.
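
Here is a minimal sketch of this procedure, assuming a chain-structured MRF over three binary variables with made-up pairwise factors (for a general tree, the downward pass would recurse over each node’s children):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain MRF X0 - X1 - X2 over binary variables (made-up factors).
# phis[i] is the pairwise factor over (X_i, X_{i+1}).
phis = [np.array([[1.0, 2.0], [3.0, 1.0]]),
        np.array([[2.0, 1.0], [1.0, 4.0]])]
n = len(phis) + 1

# 1. Message passing up to the root X0: msg[i] is the message into X_i from X_{i+1}.
msg = [None] * n
msg[n - 1] = np.ones(phis[-1].shape[1])     # trivial message into the leaf
for i in range(n - 2, -1, -1):
    msg[i] = phis[i] @ msg[i + 1]           # sum out x_{i+1}

# 2. Sample the root from its (unnormalized) marginal.
sample = [None] * n
p0 = msg[0] / msg[0].sum()
sample[0] = rng.choice(len(p0), p=p0)

# 3. Sample each node conditioned on its sampled parent, reusing the upward messages.
for i in range(1, n):
    unnorm = phis[i - 1][sample[i - 1], :] * msg[i]
    p = unnorm / unnorm.sum()
    sample[i] = rng.choice(len(p), p=p)

print(sample)
```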

With observed nodes

The key insight is that messages stop at observed nodes. In other words, $\tau_{A \to B}$ is typically expressed in sum-product form, but if $A$ is observed, then we only need the pairwise factor $\phi(A, B)$ and can plug in the observation: $\phi'(B) = \phi(A = a, B)$.
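
Concretely, plugging in an observation just slices the pairwise factor at the observed value; a minimal sketch with a made-up factor:

```python
import numpy as np

# Hypothetical pairwise factor phi(A, B) over binary A and B (made-up values).
phi_AB = np.array([[1.0, 3.0],
                   [2.0, 0.5]])

a_obs = 1  # suppose A is observed to be 1
# The message from A to B collapses to a plug-in of the observation:
# phi'(B) = phi(A = a_obs, B), i.e. a row slice of the pairwise factor.
phi_B = phi_AB[a_obs, :]
print(phi_B)   # [2.  0.5]
```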

At the risk of sounding a bit repetitive: if you are given an acyclic MRF, you can run message passing, but every time you encounter an evidence node, you replace the message with a direct plug-in of the observed value.

When you reach the root, sample as usual. Now, every sampled value becomes part of the “evidence”. You plug these sampled values into the factors of the children, and then you can sample from the children, and so on and so forth.

The diagram below illustrates sampling with two evidence nodes. The third picture shows how a sampling operation turns a variable into an evidence node.

Junction trees

The reason why we create junction trees is that the tree structure lends itself quite well to marginalization and sampling, while a cyclical structure is less convenient.

If we do have a junction tree, we would augment the sampling process in this way: