Example 1 (Glauber Dynamics for Ising Model)

Related: Ising Model

States: $\sigma = (\sigma_1, \dots, \sigma_N) \in \Omega = \{-1, +1\}^N$

Target: $\pi(\sigma) = \dfrac{w(\sigma)}{Z}$, with weight $w(\sigma) = e^{-\beta H(\sigma)}$ and $Z = \sum_{\sigma \in \Omega} w(\sigma)$

Notation: $\sigma^{i,s}$ is the configuration that agrees with $\sigma$ everywhere except possibly at coordinate $i$, where $\sigma^{i,s}_i = s$

Here we have a high-dimensional state space. The goal is to sample from the distribution $\pi$ proportional to the weight function $w(\sigma) = e^{-\beta H(\sigma)}$. The constant $Z$, of course, is the partition function, which is often impossible to compute. We use the notation $\sigma^{i,s}$ to represent updates to the entire system.

Glauber Dynamics (Special Case of Gibbs Sampler)

The idea is to use conditional distributions to determine the probability of spin $i$ taking value $s$, where $s \in \{-1, +1\}$. For simplicity, let $w(\sigma) = e^{-\beta H(\sigma)}$ and $Z = \sum_{\sigma} w(\sigma)$, so $\pi(\sigma) \propto w(\sigma)$. Consider the conditional distribution of $\sigma_i$ given $\sigma_j$ for $j \neq i$:

$$\pi(\sigma_i = s \mid \sigma_j,\ j \neq i) = \frac{w(\sigma^{i,s})}{w(\sigma^{i,+1}) + w(\sigma^{i,-1})}$$

where the $\sigma^{i,s'}$ in the denominator denotes the configuration whose $i$-th entry is $s'$. This is only nonzero for $s \in \{-1, +1\}$. Let

$$p_s = \pi(\sigma_i = s \mid \sigma_j,\ j \neq i)$$

for $s \in \{-1, +1\}$. Caution: Some conventions use $e^{+\beta H}$ instead of $e^{-\beta H}$ in the weight.

Glauber Dynamics (Random-scan Gibbs sampler)

Given $\sigma^{(t)} = \sigma$.

  1. Choose $i$ uniformly at random from $\{1, \dots, N\}$.
  2. Sample $s$ from the conditional distribution $\pi(\,\cdot \mid \sigma_j,\ j \neq i)$.
  3. Set $\sigma^{(t+1)} = \sigma^{i,s}$.

Algorithmic Implementation (Discrete State Space)

For a discrete system like the Ising Model, where $\sigma_i \in \{-1, +1\}$, Step 2 is implemented by calculating the explicit scalar probabilities for the two possible states of the $i$-th coordinate.

Assume $\sigma^{(t)} = \sigma$ and coordinate $i$ has been chosen.

  1. Set $p_+$ as the probability of the $i$-th spin being $+1$:

$$p_+ = \frac{w(\sigma^{i,+1})}{w(\sigma^{i,+1}) + w(\sigma^{i,-1})}$$

  2. Set $p_-$ as the probability of the $i$-th spin being $-1$:

$$p_- = \frac{w(\sigma^{i,-1})}{w(\sigma^{i,+1}) + w(\sigma^{i,-1})}$$

Note $p_+ + p_- = 1$. The partition function $Z$ cancels out in these fractions.

  3. Branch execution:
  • Set $\sigma^{(t+1)} = \sigma^{i,+1}$ with probability $p_+$.
  • Set $\sigma^{(t+1)} = \sigma^{i,-1}$ with probability $p_-$.

Here, this requires a discrete state space, where each coordinate takes finitely many values, so the denominator is a finite sum.
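The steps above can be sketched in code. This assumes a 1D nearest-neighbor Ising chain with $H(\sigma) = -\sum_i \sigma_i \sigma_{i+1}$ and an arbitrary $\beta$; the function names and chain length are our choices, not part of the notes.

```python
import math
import random

def hamiltonian(sigma):
    # 1D Ising chain with free boundary: H(sigma) = -sum_i sigma_i * sigma_{i+1}
    return -sum(sigma[i] * sigma[i + 1] for i in range(len(sigma) - 1))

def weight(sigma, beta):
    # Unnormalized weight w(sigma) = exp(-beta * H(sigma)); Z is never needed
    return math.exp(-beta * hamiltonian(sigma))

def p_plus(sigma, i, beta):
    # Conditional probability that spin i is +1 given all other spins
    up = sigma[:i] + [+1] + sigma[i + 1:]
    down = sigma[:i] + [-1] + sigma[i + 1:]
    w_up, w_down = weight(up, beta), weight(down, beta)
    return w_up / (w_up + w_down)

def glauber_step(sigma, beta, rng):
    # Step 1: choose a coordinate uniformly at random
    i = rng.randrange(len(sigma))
    # Steps 2-3: resample spin i from its conditional distribution
    s = +1 if rng.random() < p_plus(sigma, i, beta) else -1
    return sigma[:i] + [s] + sigma[i + 1:]

rng = random.Random(0)
sigma = [rng.choice([-1, +1]) for _ in range(10)]
for _ in range(1000):
    sigma = glauber_step(sigma, beta=0.5, rng=rng)
```

Note that only ratios of weights ever appear, so the partition function cancels exactly as in the fractions above.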

Example 2

Each “spin” is in $\{-1, +1\}$. We set the $i$-th spin conditional on the value of every other spin. For example, let $N = 2$ with $w(\sigma) = e^{\beta \sigma_1 \sigma_2}$, such that

$$\pi(\sigma_1, \sigma_2) = \frac{e^{\beta \sigma_1 \sigma_2}}{Z}.$$

Suppose $\sigma^{(t)} = (+1, +1)$ and $i = 1$. Then

$$w(\sigma^{1,+1}) = e^{\beta}, \qquad w(\sigma^{1,-1}) = e^{-\beta},$$

such that

$$p_+ = \frac{e^{\beta}}{e^{\beta} + e^{-\beta}}, \qquad p_- = \frac{e^{-\beta}}{e^{\beta} + e^{-\beta}}.$$

We can see that $p_+ + p_- = 1$. Then,

$$P\big((+1,+1) \to (-1,+1)\big) = \frac{1}{2} \cdot \frac{e^{-\beta}}{e^{\beta} + e^{-\beta}},$$

where the first product term is the probability of selecting particle $i = 1$, or $1/N = 1/2$, and the second is $p_-$.

Example 3

Assume we have the same setup as Example 2. Then the conditional distribution is

$$\pi(\sigma_1 = s \mid \sigma_2) = \frac{e^{\beta s \sigma_2}}{e^{\beta \sigma_2} + e^{-\beta \sigma_2}}.$$

Likewise,

$$\pi(\sigma_2 = s \mid \sigma_1) = \frac{e^{\beta \sigma_1 s}}{e^{\beta \sigma_1} + e^{-\beta \sigma_1}}.$$

Remarks:

  • $\pi(\,\cdot \mid \sigma_j,\ j \neq i)$ is a PMF on $\{-1, +1\}$, just like $\pi$ is a PMF on $\Omega$.
  • Conditional distributions do not require knowledge of the normalization $Z$.
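These two remarks can be checked numerically on the two-spin system. The value $\beta = 0.7$ and the constant `c` (standing in for an unknown multiple of the weight, such as an unabsorbed $1/Z$) are arbitrary choices of ours:

```python
import math

beta = 0.7  # arbitrary inverse temperature for this check

def w(s1, s2, c=1.0):
    # Two-spin weight w(sigma) = c * exp(beta * s1 * s2); c mimics an
    # unknown constant multiple, e.g. an unabsorbed 1/Z
    return c * math.exp(beta * s1 * s2)

# Conditional of spin 1 given spin 2 = +1
denom = w(+1, +1) + w(-1, +1)
p_plus = w(+1, +1) / denom
p_minus = w(-1, +1) / denom

# The same conditional computed from rescaled weights: the constant cancels
denom_c = w(+1, +1, c=37.0) + w(-1, +1, c=37.0)
p_plus_scaled = w(+1, +1, c=37.0) / denom_c

# Full Glauber transition (+1,+1) -> (-1,+1): choose particle 1 (prob 1/2),
# then set it to -1 (prob p_minus)
trans = 0.5 * p_minus
```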

General Case of the Gibbs Sampler

The goal of this algorithm is to sample from complex, high-dimensional probability distributions that are difficult to compute directly.

|  | Ising model | General case |
| --- | --- | --- |
| Target | $\pi = w/Z$ (unknown normalization $Z$) | $\pi = w/Z$ (unknown normalization $Z$) |
| Type | PMF on $\{-1, +1\}^N$ | PDF on $\mathbb{R}^N$ |

The conditional distribution does not require the global normalization constant. The continuous conditional distribution is computed as:

$$\pi(x_i = s \mid x_j,\ j \neq i) = \frac{w(x^{i,s})}{\int_{\mathbb{R}} w(x^{i,s'})\, ds'}$$

So $\pi(\,\cdot \mid x_j,\ j \neq i)$ is a PDF on $\mathbb{R}$. Note that the denominator is an integral, in the same way that in the discrete space we sum over the possibilities for that flip. The domain of integration denotes the space of all possibilities that the $i$-th coordinate can become; in our case, this is $\mathbb{R}$ specifically.

Example 4

Let

$$w(x_1, x_2) = \exp\left( -\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1 - \rho^2)} \right), \qquad |\rho| < 1.$$

$Z$ is a normalization, so

$$\pi(x_1, x_2) = \frac{w(x_1, x_2)}{Z}.$$

What are the conditional distributions? We see that

$$w(x_1, x_2) = \exp\left( -\frac{(x_1 - \rho x_2)^2}{2(1 - \rho^2)} \right) \exp\left( -\frac{x_2^2}{2} \right)$$

by treating $x_2$ as a constant and completing the square in $x_1$. Here, we apply the continuous conditional distribution from General Case of the Gibbs Sampler; the factor depending only on $x_2$ cancels. Then, $\pi(x_1 \mid x_2) = c \exp\left( -\frac{(x_1 - \rho x_2)^2}{2(1 - \rho^2)} \right)$, where $c$ is such that

$$\int_{\mathbb{R}} c \exp\left( -\frac{(x_1 - \rho x_2)^2}{2(1 - \rho^2)} \right) dx_1 = 1$$

for all $x_2$. In other words, $c$ is the normalization of the PDF

$$x_1 \mapsto c \exp\left( -\frac{(x_1 - \rho x_2)^2}{2(1 - \rho^2)} \right).$$

Recall that $N(\mu, \tau^2)$ has a PDF of

$$f(x) = \frac{1}{\sqrt{2\pi \tau^2}} \exp\left( -\frac{(x - \mu)^2}{2\tau^2} \right).$$

This implies that $\pi(x_1 \mid x_2)$ is the PDF of $N(\rho x_2,\ 1 - \rho^2)$. Similarly, $\pi(x_2 \mid x_1)$ is the PDF of $N(\rho x_1,\ 1 - \rho^2)$.
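The completing-the-square step can be verified numerically. The value of $\rho$ and the test points are arbitrary choices of ours:

```python
import math

rho = 0.6  # arbitrary correlation with |rho| < 1

def w(x1, x2):
    # Unnormalized weight from Example 4
    return math.exp(-(x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / (2 * (1 - rho ** 2)))

def factored(x1, x2):
    # Completed square: an N(rho*x2, 1-rho^2) kernel in x1,
    # times a function of x2 only (which cancels in the conditional)
    return math.exp(-(x1 - rho * x2) ** 2 / (2 * (1 - rho ** 2))) * math.exp(-x2 ** 2 / 2)
```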

Algorithm (Random-Scan Gibbs Via Normal Distribution)

The previous example gives the following insight: we can model the conditional probabilities using the Normal distribution.

Given $X^{(t)} = (X_1^{(t)}, X_2^{(t)})$.

  1. Choose $i$ uniformly at random from $\{1, 2\}$. Note that we are in $\mathbb{R}^2$.
  2. Generate $Y \sim N(0, 1)$. Recall the Box-Muller Method.
  3. If $i = 1$, set $X_1^{(t+1)} = \rho X_2^{(t)} + \sqrt{1 - \rho^2}\, Y$ and set $X_2^{(t+1)} = X_2^{(t)}$.
  4. If $i = 2$, set $X_2^{(t+1)} = \rho X_1^{(t)} + \sqrt{1 - \rho^2}\, Y$ and set $X_1^{(t+1)} = X_1^{(t)}$.
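A minimal sketch of this random-scan sampler for the bivariate-normal target of Example 4. Here `rng.gauss` stands in for a Box-Muller draw, and the chain length, seed, and $\rho = 0.8$ are arbitrary choices of ours:

```python
import math
import random

def gibbs_step(x1, x2, rho, rng):
    """One random-scan Gibbs update for the bivariate normal with correlation rho."""
    y = rng.gauss(0.0, 1.0)                          # Step 2: standard normal draw
    if rng.randrange(2) == 0:                        # Step 1: i = 1
        x1 = rho * x2 + math.sqrt(1 - rho ** 2) * y  # Step 3: resample x1 | x2
    else:                                            # i = 2
        x2 = rho * x1 + math.sqrt(1 - rho ** 2) * y  # Step 4: resample x2 | x1
    return x1, x2

rng = random.Random(1)
rho = 0.8
x1, x2 = 0.0, 0.0
samples = []
for _ in range(20000):
    x1, x2 = gibbs_step(x1, x2, rho, rng)
    samples.append((x1, x2))

# The empirical correlation of the chain should be close to rho
n = len(samples)
m1 = sum(a for a, _ in samples) / n
m2 = sum(b for _, b in samples) / n
cov = sum((a - m1) * (b - m2) for a, b in samples) / n
v1 = sum((a - m1) ** 2 for a, _ in samples) / n
v2 = sum((b - m2) ** 2 for _, b in samples) / n
corr = cov / math.sqrt(v1 * v2)
```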

Algorithm (Systematic-Scan Gibbs Sampler)

Given $X^{(t)} = (X_1^{(t)}, \dots, X_N^{(t)})$:

  1. Sample $X_1^{(t+1)}$ from $\pi(\,\cdot \mid X_2^{(t)}, \dots, X_N^{(t)})$.
  2. For $i = 2$ to $N$: Sample $X_i^{(t+1)}$ from $\pi(\,\cdot \mid X_1^{(t+1)}, \dots, X_{i-1}^{(t+1)}, X_{i+1}^{(t)}, \dots, X_N^{(t)})$.
  3. Set $X^{(t+1)} = (X_1^{(t+1)}, \dots, X_N^{(t+1)})$.

Output: $X^{(t+1)}$.
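A sketch of one systematic-scan sweep, again for the bivariate-normal target from Example 4 (so $N = 2$ and the update order is coordinate 1 then coordinate 2); parameters and chain length are arbitrary choices of ours:

```python
import math
import random

def systematic_sweep(x, rho, rng):
    # One sweep: update coordinate 1, then coordinate 2, each from its
    # conditional given the *most recent* value of the other coordinate
    x1, x2 = x
    x1 = rho * x2 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)  # i = 1
    x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)  # i = 2 uses new x1
    return (x1, x2)

rng = random.Random(2)
x = (0.0, 0.0)
xs = []
for _ in range(20000):
    x = systematic_sweep(x, 0.8, rng)
    xs.append(x[0])

mean1 = sum(xs) / len(xs)                                # should be close to 0
var1 = sum((v - mean1) ** 2 for v in xs) / len(xs)       # should be close to 1
```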

Remarks:

  • The Systematic-Scan sampler chooses coordinates in strict order $1, 2, \dots, N$.
  • The Random-Scan Gibbs Sampler is a special case of the Metropolis-Hastings algorithm (one whose proposals are always accepted).

Metropolis Algorithm (Random-Walk Sampler)

We can extend the Metropolis Algorithm to continuous state spaces.

  • Goal: Sample from target PDF $\pi = w/Z$ on $\mathbb{R}^N$, where the normalization $Z$ is unknown.
  • Given $X^{(t)} = x$,
  1. Generate $Z^{(t+1)} = (Z_1, \dots, Z_N)$ where $Z_i \sim N(0, 1)$ iid.

    I.e., $Z^{(t+1)}$ is Multivariate Normal $N(0, I_N)$. This is also our “noise”, or a random directional perturbation.

  2. Set proposal state $y = x + \sigma Z^{(t+1)}$.

    The proposal transition kernel (the continuous analogue of the transition matrix) is the PDF of $N(x, \sigma^2 I_N)$:

$$q(x, y) = \frac{1}{(2\pi \sigma^2)^{N/2}} \exp\left( -\frac{\lVert y - x \rVert^2}{2\sigma^2} \right)$$

  3. Set $\alpha = \min\left( 1, \frac{w(y)}{w(x)} \right)$ and generate $U \sim \mathrm{Uniform}(0, 1)$.
    • Set $X^{(t+1)} = y$ if $U \le \alpha$.
    • Set $X^{(t+1)} = x$ if $U > \alpha$.
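A minimal sketch of this random-walk sampler, targeting a standard normal on $\mathbb{R}^2$ as an example. Working with $\log w$ keeps the acceptance ratio numerically stable; all names, the step size, and the chain length are our choices:

```python
import math
import random

def rw_metropolis_step(x, log_w, step, rng):
    """One random-walk Metropolis step targeting pi proportional to exp(log_w)."""
    # Steps 1-2: Gaussian proposal y = x + step * Z, with Z ~ N(0, I_N)
    y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
    # Step 3: alpha = min(1, w(y) / w(x)), computed in log space
    d = log_w(y) - log_w(x)
    alpha = 1.0 if d >= 0 else math.exp(d)
    if rng.random() <= alpha:
        return y   # accept the proposal
    return x       # reject: stay at the current state

def log_w(x):
    # Unnormalized log-weight of a standard normal on R^2; Z is never needed
    return -0.5 * sum(xi * xi for xi in x)

rng = random.Random(3)
x = [0.0, 0.0]
xs = []
for _ in range(20000):
    x = rw_metropolis_step(x, log_w, step=1.0, rng=rng)
    xs.append(x[0])

mean1 = sum(xs) / len(xs)                           # should be close to 0
var1 = sum((v - mean1) ** 2 for v in xs) / len(xs)  # should be close to 1
```

Only the ratio $w(y)/w(x)$ is ever evaluated, so the unknown $Z$ never appears.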

Remarks:

  • The output $(X^{(t)})_{t \ge 0}$ is a Markov Chain on a continuous state space $\mathbb{R}^N$.
  • It is “sensitive” to the typical “jump” size $\sigma$.
  • What happens if we take smaller and smaller jumps and let $\sigma \to 0$ (rescaling time accordingly)? We get SDEs!
  • $\sigma$ is the spatial exploration rate of the random walk.

Remarks

  1. We first define the Ising Model and use the Glauber Dynamics algorithm to see how to sample from these difficult distributions. This is the most restricted class of MCMC: a discrete-space, single-coordinate update mechanism.
  2. We generalize this via the Metropolis Algorithm, permitting global state proposals via a transition matrix $Q$. At first, we let the transitions be symmetric in order to show detailed balance.
  3. We further generalize this to allow asymmetry via the Metropolis-Hastings Algorithm, using the ratio $q(y, x)/q(x, y)$ to penalize directional bias in the proposal distribution, which restores detailed balance.
  4. We expand the problem domain from discrete states to the infinite continuous domain of $\mathbb{R}^N$. The target PMF becomes a PDF $\pi = w/Z$. We generalize the discrete scalar fractions to continuous integrals to compute the conditional $\pi(x_i \mid x_j,\ j \neq i)$. The core mechanism is to update one dimension at a time while holding all others constant, which bypasses the global normalization integral.
  5. We can generalize to multidimensional continuous transitions using additive noise in the Metropolis Algorithm (Random-Walk Sampler). The proposal mechanism becomes a continuous Gaussian transition kernel.
  6. Finally, we can further generalize to continuous time by introducing a time step $\Delta t$ as a scaling factor that modifies the variance of the Gaussian proposal jump to $\sigma^2\, \Delta t$. By taking the limit $\Delta t \to 0$, shrinking the jump size and the discrete time step proportionally, the discrete Markov Chain converges to a continuous-time mathematical model. This leads us to Stochastic Differential Equations (SDEs) over $\mathbb{R}^N$.
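The variance scaling in remark 6 can be illustrated directly: with Gaussian jumps of variance $\sigma^2\, \Delta t$, the law of the walk at a fixed time $T$ does not depend on the step size $\Delta t$ (its variance is $\sigma^2 T$), which is what makes the continuous-time limit well defined. A sketch, with parameters chosen arbitrarily by us:

```python
import math
import random

def endpoint(sigma, dt, T, rng):
    # Euler discretization of the SDE dX = sigma dW:
    # Gaussian jumps of variance sigma^2 * dt, run until time T
    x = 0.0
    for _ in range(int(round(T / dt))):
        x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(4)
sigma, T = 0.5, 2.0
var_by_dt = {}
for dt in (0.1, 0.01):
    xs = [endpoint(sigma, dt, T, rng) for _ in range(5000)]
    m = sum(xs) / len(xs)
    var_by_dt[dt] = sum((v - m) ** 2 for v in xs) / len(xs)
# Both empirical variances should be close to sigma^2 * T = 0.5,
# independently of the step size dt
```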