The goal is to construct a Markov Chain $\{X_{k}\}$ with a given stationary distribution $π$.

Metropolis Algorithm

First, we construct the proposal transition matrix $Q = [q (i, j)]$ where $Q$ is symmetric. Then, choose an initial state $X_{0} \in S$ randomly. Given $X_{k} = i \in S$ ,

1. Sample $Y \in S$ from $q(i, \cdot)$, i.e. $P(Y = y \mid X_{k} = i) = q(i, y)$ for all $y \in S$. Suppose $Y = j$ is observed.
2. Let $α = α_{ij} = \min\left(1, \frac{π(j)}{π(i)}\right) \in [0, 1]$ and generate $U \sim U[0, 1]$.
3. If $U \leq α$, accept $Y$ and set $X_{k+1} \leftarrow Y$.
4. If $U > α$, reject $Y$ and set $X_{k+1} \leftarrow X_{k}$.

Since $P(U \leq α) = α$, we accept $Y$ with probability $α$. The output is a sequence $X_{0}, X_{1}, \dots$.
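The steps above can be sketched in Python as follows. This is a minimal illustration on a tiny finite state space, not a reference implementation; the function `metropolis`, the toy weights, and `uniform_proposal` are hypothetical names chosen for the example. The target is passed as unnormalized weights, so the normalization constant is never computed.

```python
import random
from collections import Counter


def metropolis(weights, propose, x0, n_steps, rng=None):
    """Run a Metropolis chain on a finite state space (illustrative sketch).

    weights: dict state -> unnormalized weight w(i) > 0, so pi(i) = w(i) / Z.
             Only ratios w(j) / w(i) are used; Z is never computed.
    propose: function (state, rng) -> proposed state, ASSUMED to sample
             from a symmetric proposal, q(i, j) = q(j, i).
    """
    if rng is None:
        rng = random.Random()
    x = x0
    chain = [x]
    for _ in range(n_steps):
        y = propose(x, rng)                          # Y ~ q(x, .)
        alpha = min(1.0, weights[y] / weights[x])    # min(1, pi(y) / pi(x))
        if rng.random() <= alpha:                    # U ~ U[0, 1)
            x = y                                    # accept: X_{k+1} <- Y
        chain.append(x)                              # on rejection x is unchanged
    return chain


# Toy target on S = {0, 1, 2, 3}: pi = (0.1, 0.2, 0.3, 0.4), given only up to Z = 10.
states = [0, 1, 2, 3]
weights = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}


def uniform_proposal(x, rng):
    # Uniform jumps: q(i, j) = 1/N for every j (including j = i), which is symmetric.
    return rng.choice(states)


chain = metropolis(weights, uniform_proposal, x0=0, n_steps=200_000, rng=random.Random(0))
freq = Counter(chain)
for s in states:
    print(s, freq[s] / len(chain))   # should be close to weights[s] / 10
```

The empirical frequencies only approximate $π$; the chain is correlated, so longer runs give tighter agreement.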

Remarks About Computation

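$\{X_{n}\}_{n = 0}^{\infty}$ is a Markov Chain whose distribution converges to $π$ as $n \to \infty$ (under the conditions of Theorem (Metropolis Satisfies Detailed Balance) below).
We can choose $Q$ to be simple, like uniform jumps, where $q(i, j) = 1/N$ with $N = |S|$.
We only need to know the ratio $π(j)/π(i)$, so we don't need to know the normalization constant $Z$: writing $π(i) = w(i)/Z$, the $Z$ cancels in $π(j)/π(i) = w(j)/w(i)$.
$Y$ is called the proposal state.
In practice, $α = α_{ij}$ is easy to calculate.
The algorithm extends to non-symmetric $Q$ if we instead use $α = α_{ij} = \min\left(1, \frac{π(j)\, q(j, i)}{π(i)\, q(i, j)}\right)$ (see Metropolis-Hastings Algorithm below).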

Example 1 (1D Ising Model)

Recall the Ising Model, where we have $m$ particles each with spin $+$ or $-$ . So,

$$S = \{σ = (σ_{1}, \dots, σ_{m}) \mid σ_{i} = \pm 1,\ 1 \leq i \leq m\} = \{-1, +1\}^{m}$$

i.e. the set of length-$m$ strings over the alphabet $\{-1, +1\}$. Hence $|S| = 2^{m}$.

Let

$$H(σ) = -J \sum_{i=1}^{m-1} σ_{i} σ_{i+1} - h \sum_{i=1}^{m} σ_{i}$$

where $h > 0$. $H(σ)$ represents the “energy” of state $σ$.¹

Our target distribution² is

$$π_{β}(σ) = \frac{1}{Z_{β}} \exp(-β H(σ)) \quad \forall σ \in S$$

where $β > 0$ is the inverse temperature ($β = 1/(k_{B} T)$), and $Z_{β} = \sum_{σ \in S} \exp(-β H(σ))$ is the normalization constant (partition function). It is hard to compute, since the sum has $|S| = 2^{m}$ terms.

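Note that large $π_{β}(σ)$ ⟺ small $H(σ)$. Since $h > 0$, low energy favors many $σ_{j} = +1$, and it favors many neighboring pairs with

$$σ_{j} σ_{j+1} \begin{cases} > 0 & \text{if } J > 0 \\ < 0 & \text{if } J < 0 \end{cases}$$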
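For small $m$ the target can be computed exactly by brute force over all $2^m$ states, which makes the observation above easy to check. This is a sketch for illustration only; `hamiltonian` and `boltzmann` are hypothetical helper names, and the parameter values are arbitrary choices.

```python
import itertools
import math


def hamiltonian(sigma, J, h):
    """H(sigma) = -J * sum_{i=1}^{m-1} s_i s_{i+1} - h * sum_{i=1}^{m} s_i."""
    interaction = sum(sigma[i] * sigma[i + 1] for i in range(len(sigma) - 1))
    return -J * interaction - h * sum(sigma)


def boltzmann(m, J, h, beta):
    """Exact pi_beta on S = {-1, +1}^m by brute force (2^m terms, small m only)."""
    states = list(itertools.product((-1, +1), repeat=m))
    weights = [math.exp(-beta * hamiltonian(s, J, h)) for s in states]
    Z = sum(weights)                       # partition function Z_beta
    return {s: w / Z for s, w in zip(states, weights)}


pi = boltzmann(m=4, J=1.0, h=0.5, beta=1.0)
print(len(pi))                    # |S| = 2^4 = 16
print(max(pi, key=pi.get))        # (1, 1, 1, 1): with J > 0, h > 0 all spins up has the lowest energy
```

The exponential cost of this enumeration is exactly why Metropolis, which only needs ratios of $π_β$, is used instead.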

Here, we can apply the Metropolis Algorithm. Choose $Q$ given by random single-spin flips. For example, if $m = 2$:

[Figure: single-flip proposal graph for $m = 2$. Each of the states $++, +-, -+, --$ moves to each of its two neighbors (one flip away) with probability $1/2$.]

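With the states ordered as $S = \{++, +-, -+, --\}$, the proposal transition matrix is

$$Q = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1/2 & 1/2 & 0 \end{pmatrix}$$

which is symmetric. Choose an initial state $X_{0} \in S$; each update from $X_{k} = σ = (σ_{1}, \dots, σ_{m}) \in S$ follows the algorithm below. Some notes: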

$J$ represents the interaction strength between particles. It can be treated as a constant.
$h$ is the external magnetic field on the particles. We can treat it as a constant.
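The $m = 2$ proposal matrix can be generated for any $m$ by enumerating single flips. A small sketch, assuming the state ordering produced by `itertools.product((+1, -1), repeat=m)` (which matches $++, +-, -+, --$ for $m = 2$); `spin_flip_proposal` is a hypothetical name. Exact fractions are used so the entries print as in the matrix.

```python
import itertools
from fractions import Fraction


def spin_flip_proposal(m):
    """Proposal matrix for single random spin flips on S = {-1, +1}^m.

    q(s, s') = 1/m if s' differs from s in exactly one site, else 0.
    States are ordered by itertools.product((+1, -1), repeat=m), which for
    m = 2 gives ++, +-, -+, -- (the ordering used in the text).
    """
    states = list(itertools.product((+1, -1), repeat=m))
    index = {s: k for k, s in enumerate(states)}
    Q = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for i in range(m):
            flipped = s[:i] + (-s[i],) + s[i + 1:]   # flip site i
            Q[index[s]][index[flipped]] = Fraction(1, m)
    return states, Q


states, Q = spin_flip_proposal(2)
for row in Q:
    print([str(x) for x in row])
# ['0', '1/2', '1/2', '0']
# ['1/2', '0', '0', '1/2']
# ['1/2', '0', '0', '1/2']
# ['0', '1/2', '1/2', '0']
```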

Algorithm:

1. Choose a site $i \in \{1, \dots, m\}$ uniformly at random.
2. Flip the $i$th site, $σ_{i} \to -σ_{i}$, to obtain the proposal state $\hat{σ} = (σ_{1}, \dots, σ_{i-1}, -σ_{i}, σ_{i+1}, \dots, σ_{m}) \in S$. Note that $q(σ, \hat{σ}) = q(\hat{σ}, σ) = 1/m$. This shows $Q$ is symmetric.
3. Compute $\frac{π(\hat{σ})}{π(σ)} = \exp[-β(H(\hat{σ}) - H(σ))] = \exp(-β ΔH)$. Working with this ratio means the partition function $Z_{β}$ cancels. There are three cases for $ΔH$:
  1. If $1 < i < m$ (we picked a particle in the middle of the chain), then $H(\hat{σ}) - H(σ) = -J(σ_{i-1}(-σ_{i}) + (-σ_{i})σ_{i+1}) + hσ_{i} - [-J(σ_{i-1}σ_{i} + σ_{i}σ_{i+1}) - hσ_{i}] = 2Jσ_{i}(σ_{i-1} + σ_{i+1}) + 2hσ_{i}$.
  2. If $i = 1$ (the left boundary), then $H(\hat{σ}) - H(σ) = -J(-σ_{1})σ_{2} + hσ_{1} - [-Jσ_{1}σ_{2} - hσ_{1}] = 2Jσ_{1}σ_{2} + 2hσ_{1}$.
  3. If $i = m$ (the right boundary), similarly $H(\hat{σ}) - H(σ) = 2Jσ_{m-1}σ_{m} + 2hσ_{m}$.

  Thus,

  $$ΔH = H(\hat{σ}) - H(σ) = \begin{cases} 2Jσ_{i}(σ_{i-1} + σ_{i+1}) + 2hσ_{i} & 1 < i < m \\ 2Jσ_{1}σ_{2} + 2hσ_{1} & i = 1 \\ 2Jσ_{m-1}σ_{m} + 2hσ_{m} & i = m \end{cases}$$
4. We then accept $\hat{σ}$ with probability $α = \min\left(1, \frac{π(\hat{σ})}{π(σ)}\right) = \min(1, e^{-βΔH})$.
5. Thus,
  1. If $ΔH \leq 0$, then $α = 1$ and we set $X_{k+1} \leftarrow \hat{σ}$. (Here the energy did not increase, i.e. the system became at least as stable, so $e^{-βΔH} \geq 1$ and we always accept.)
  2. If $ΔH > 0$, then $α = \exp(-βΔH) < 1$. (The energy increased, so we accept only some of the time.)
    1. Generate $U \sim U[0, 1]$.
    2. If $U \leq \exp(-βΔH)$, accept $\hat{σ}$ and set $X_{k+1} \leftarrow \hat{σ}$.
    3. If $U > \exp(-βΔH)$, reject $\hat{σ}$ and set $X_{k+1} \leftarrow σ$.
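One Metropolis update of the 1D chain can be sketched as below. This is an illustration under stated assumptions, not the notes' own code: sites are 0-based, and the three cases for $ΔH$ are merged by treating a missing neighbor at either boundary as $0$, which reproduces each case exactly. `delta_H` and `ising_metropolis_step` are hypothetical names; the parameter values are arbitrary.

```python
import math
import random


def delta_H(sigma, i, J, h):
    """Energy change H(sigma_hat) - H(sigma) from flipping site i (0-based).

    Only nearest neighbors enter. A missing neighbor at either boundary
    contributes 0, which reproduces the three cases 1 < i < m, i = 1, i = m.
    """
    m = len(sigma)
    left = sigma[i - 1] if i > 0 else 0
    right = sigma[i + 1] if i < m - 1 else 0
    return 2 * J * sigma[i] * (left + right) + 2 * h * sigma[i]


def ising_metropolis_step(sigma, J, h, beta, rng):
    """One Metropolis update of the 1D Ising chain; mutates and returns sigma."""
    i = rng.randrange(len(sigma))              # site chosen uniformly: q = 1/m
    dH = delta_H(sigma, i, J, h)               # O(1); H itself is never evaluated
    if dH <= 0 or rng.random() <= math.exp(-beta * dH):
        sigma[i] = -sigma[i]                   # accept the flipped state
    return sigma                               # unchanged if the flip was rejected


rng = random.Random(1)
sigma = [rng.choice((-1, 1)) for _ in range(20)]
for _ in range(10_000):
    ising_metropolis_step(sigma, J=1.0, h=0.5, beta=2.0, rng=rng)
print(sum(sigma) / len(sigma))   # average magnetization of the final state
```

Checking `dH <= 0` first skips drawing $U$ when $α = 1$, mirroring case 1 above.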

Why does it work?

When computing the acceptance ratio, we never evaluate $H$ in full, since that costs $O(m)$ per step. Instead we compute only the change $ΔH$: every term of $H(σ)$ that does not involve the flipped spin cancels, so only a few terms remain.
The Ising model only involves nearest-neighbor interactions, which is why only $σ_{i-1}$ and $σ_{i+1}$ appear in $ΔH$. This reduces the cost of each step from $O(m)$ to $O(1)$.

Theorem (Metropolis Satisfies Detailed Balance)

1. The transition matrix $P = [p(i, j)]$ of the Metropolis Chain is

  $$p(i, j) = \begin{cases} q(i, j) \cdot \min\left(1, \frac{π(j)}{π(i)}\right) & i \neq j \\ 1 - \sum_{k \in S \setminus \{i\}} q(i, k) \cdot \min\left(1, \frac{π(k)}{π(i)}\right) & i = j \end{cases}$$
  1. In the case $i \neq j$, Metropolis proposes moving from $i \to j$ with probability $q(i, j)$, but accepts this move only with probability $α(i, j)$. $p(i, j)$ is the probability of these two events happening together.
  2. The case $i = j$ accounts for rejection: $p(i, i)$ is the total probability of staying at $i$, i.e. of a proposed move being rejected (plus the probability $q(i, i)$ of proposing $i$ itself).
2. $π$ and $P$ satisfy detailed balance: $π(i) \cdot p(i, j) = π(j) \cdot p(j, i)$ for all $i, j \in S$. (Hence $π$ is a stationary distribution.)
3. If $q(i, j) > 0$ for all $i, j \in S$ and $p(i, i) > 0$ for all $i \in S$, then $π$ is the unique stationary distribution, and $\lim_{n \to \infty} π_{0} P^{n} = π$ for every initial distribution $π_{0}$.
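All three parts of the theorem can be checked numerically on a small example. This sketch assumes NumPy is available; `metropolis_matrix` is a hypothetical helper, and the uniform proposal with $π = (0.1, 0.2, 0.3, 0.4)$ is an arbitrary test case that satisfies the conditions in part 3.

```python
import numpy as np


def metropolis_matrix(Q, pi):
    """Transition matrix P of the Metropolis chain, following the theorem.

    p(i, j) = q(i, j) * min(1, pi(j) / pi(i))      for i != j
    p(i, i) = 1 - sum_{k != i} p(i, k)             (probability of staying)
    Assumes Q is symmetric and every pi(i) > 0.
    """
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()   # P[i, i] is still 0 here, so this sums k != i
    return P


N = 4
Q = np.full((N, N), 1.0 / N)            # uniform jumps: symmetric, every q(i, j) > 0
pi = np.array([0.1, 0.2, 0.3, 0.4])
P = metropolis_matrix(Q, pi)

flow = pi[:, None] * P                   # flow[i, j] = pi(i) p(i, j)
pi0 = np.array([1.0, 0.0, 0.0, 0.0])     # start in state 0 with probability 1

print(np.allclose(flow, flow.T))                                # detailed balance: True
print(np.allclose(pi @ P, pi))                                  # stationarity: True
print(np.all(np.diag(P) > 0))                                   # p(i, i) > 0: True
print(np.allclose(pi0 @ np.linalg.matrix_power(P, 100), pi))   # pi0 P^n -> pi: True
```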

Proof:

$(1)$: For $i \neq j$,

$$p(i, j) = P(\text{propose } j \mid i) \cdot P(\text{accept } j \mid \text{proposed } j) = q(i, j) \cdot \min\left(1, \frac{π(j)}{π(i)}\right)$$

Since $P$ is a stochastic matrix, $\sum_{j \in S} p(i, j) = 1$ for any state $i$ by the law of total probability. This implies

$$p(i, i) = 1 - \sum_{k \in S,\, k \neq i} p(i, k) = 1 - \sum_{k \in S,\, k \neq i} q(i, k) \cdot \min\left(1, \frac{π(k)}{π(i)}\right)$$

which represents the “rejection probability”. The chain stays at state $i$ if either

the proposal is $i$ itself (which happens with probability $q(i, i)$), or
a proposal $j \neq i$ was made but rejected, which happens with probability $1 - α(i, j)$.

$(2)$: We want to show that

$$\forall i, j \in S, \quad π(i) \cdot p(i, j) = π(j) \cdot p(j, i)$$

Trivially, this is true for $i = j$. Suppose $i \neq j$. Then

$$\begin{aligned} π(i) \cdot p(i, j) &= π(i) \cdot q(i, j) \cdot \min\left(1, \frac{π(j)}{π(i)}\right) \\ &= q(i, j) \cdot \min(π(i), π(j)) \\ &= q(j, i) \cdot \min(π(j), π(i)) \\ &= π(j) \cdot q(j, i) \cdot \min\left(1, \frac{π(i)}{π(j)}\right) \\ &= π(j) \cdot p(j, i) \end{aligned}$$

The third equality uses $q(i, j) = q(j, i)$, so this argument requires the matrix $Q$ to be symmetric. Summing over $i$ gives $\sum_{i} π(i) p(i, j) = π(j) \sum_{i} p(j, i) = π(j)$, i.e. $πP = π$. Thus Metropolis satisfies detailed balance, and $π$ is stationary.

$(3)$ : The condition $q (i, j) > 0$ ensures the chain is irreducible (any state can reach any other in one step). The condition $p (i, i) > 0$ ensures the chain is aperiodic. An irreducible, aperiodic chain on a finite state space is regular. By the FTMC, a regular matrix has a unique stationary distribution $π$ and $lim_{n \to \infty} π_{0} P^{n} = π$ .

Remark (Metropolis on Ising Model)

Using Metropolis on the Ising Model, we had

$$S = \{σ = (σ_{1}, \dots, σ_{m}) : σ_{j} \in \{+1, -1\}\}$$

where sign flips would give us

$$(σ_{1}, \dots, σ_{j}, \dots, σ_{m}) \mapsto (σ_{1}, \dots, -σ_{j}, \dots, σ_{m})$$

which defines our proposal probability as

$$q(σ, σ') = \begin{cases} 1/m & \text{if } σ \to σ' \text{ is a single flip} \\ 0 & \text{otherwise} \end{cases}$$

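We can show that $q$, combined with the Metropolis acceptance step, gives a regular Markov Chain. The idea is that any $2$ states in $S$ differ in at most $m$ sites, so each can reach the other in at most $m$ flips (irreducibility). $Q$ alone is periodic, since every flip changes the parity of the number of $+1$ spins; aperiodicity comes from rejections, because for $β > 0$ and non-constant $H$ some state has $p(σ, σ) > 0$ (e.g. a minimizer of $H$ with a higher-energy neighbor).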
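A small numerical check for $m = 3$ compares powers of the single-flip proposal $Q$ with powers of the Metropolis matrix $P$. This is a sketch assuming NumPy; `ising_chain` and `first_positive_power` are hypothetical helpers, and $J = 1$, $h = 0.5$, $β = 1$ are arbitrary choices.

```python
import itertools
import numpy as np


def ising_chain(m, J, h, beta):
    """Single-flip proposal Q and Metropolis matrix P for the 1D Ising model."""
    states = list(itertools.product((+1, -1), repeat=m))
    index = {s: k for k, s in enumerate(states)}
    H = np.array([-J * sum(s[i] * s[i + 1] for i in range(m - 1)) - h * sum(s)
                  for s in states])
    n = len(states)
    Q = np.zeros((n, n))
    P = np.zeros((n, n))
    for s in states:
        a = index[s]
        for i in range(m):
            b = index[s[:i] + (-s[i],) + s[i + 1:]]
            Q[a, b] = 1.0 / m
            # accept with min(1, exp(-beta * dH)), dH = H(flipped) - H(current)
            P[a, b] = Q[a, b] * min(1.0, np.exp(-beta * (H[b] - H[a])))
        P[a, a] = 1.0 - P[a].sum()        # rejection probability
    return Q, P


def first_positive_power(M, max_power):
    """Smallest n <= max_power such that every entry of M^n is > 0, else None."""
    Mn = np.eye(len(M))
    for n in range(1, max_power + 1):
        Mn = Mn @ M
        if np.all(Mn > 0):
            return n
    return None


m = 3
Q, P = ising_chain(m, J=1.0, h=0.5, beta=1.0)
# None: each flip changes the parity of the number of +1 spins, so Q^n always has zeros
print(first_positive_power(Q, 4 * m))
# some n <= 2m: rejections give p(sigma, sigma) > 0 at some states, so P is regular
print(first_positive_power(P, 4 * m))
```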

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm runs Metropolis with a possibly asymmetric proposal matrix $Q$, relaxing the condition in Theorem (Metropolis Satisfies Detailed Balance) that $Q$ must be symmetric. The extra factor $q(Y, X_{k})/q(X_{k}, Y)$ in the acceptance probability corrects for states that $Q$ proposes frequently; without it, the chain would over-sample those states regardless of their actual probability under the target distribution $π$.

Given $X_{k} \in S$, choose $Y$ from $q(X_{k}, \cdot)$.
Accept $Y$ (set $X_{k+1} \leftarrow Y$) with probability

$$α = \min\left(1, \frac{π(Y)}{π(X_{k})} \cdot \frac{q(Y, X_{k})}{q(X_{k}, Y)}\right)$$

and otherwise reject it and set $X_{k+1} \leftarrow X_{k}$. The output is a Markov Chain $X_{0}, X_{1}, \dots \in S$.
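The effect of the correction factor can be seen exactly on a three-state example by building the transition matrix with and without it. This sketch assumes NumPy; `transition_matrix`, the asymmetric $Q$, and $π = (0.2, 0.3, 0.5)$ are hypothetical choices for illustration. With the Hastings factor $π$ is stationary; with plain Metropolis acceptance on the same asymmetric $Q$ it is not.

```python
import numpy as np


def transition_matrix(Q, pi, hastings=True):
    """Transition matrix of the accept/reject chain built on proposal Q.

    hastings=True : alpha(i, j) = min(1, pi(j) q(j, i) / (pi(i) q(i, j)))
    hastings=False: alpha(i, j) = min(1, pi(j) / pi(i)), i.e. plain Metropolis
                    used (incorrectly) with an asymmetric Q, for comparison.
    """
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and Q[i, j] > 0:          # only proposed moves can happen
                ratio = pi[j] / pi[i]
                if hastings:
                    ratio *= Q[j, i] / Q[i, j]
                P[i, j] = Q[i, j] * min(1.0, ratio)
        P[i, i] = 1.0 - P[i].sum()              # rejection probability
    return P


Q = np.array([[0.0, 0.9, 0.1],     # asymmetric: from state 0, Q mostly proposes 1
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
pi = np.array([0.2, 0.3, 0.5])

P_mh = transition_matrix(Q, pi, hastings=True)
P_naive = transition_matrix(Q, pi, hastings=False)
print(pi @ P_mh)      # equals pi: stationary
print(pi @ P_naive)   # differs from pi: state 1, which Q favors, gains mass
```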

Convergence is hard to prove. Heuristically, consider the autocovariance $R(n) = \mathrm{Cov}(X_{n}, X_{0})$: as $n \to \infty$, $R(n) \to 0$, i.e. the chain forgets its starting state.
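As a worked illustration of this heuristic (not a proof), $R(n)$ can be computed exactly for a small chain started in stationarity, using $E[X_n X_0] = \sum_i π(i)\, x(i)\, (P^n x)(i)$. The three-state chain below is a hypothetical example (values $x = 0, 1, 2$; it satisfies $πP = π$ for $π = (0.2, 0.3, 0.5)$), and `autocovariance` is a hypothetical helper; NumPy is assumed.

```python
import numpy as np

# Hypothetical 3-state chain on values x = 0, 1, 2; rows sum to 1 and
# pi P = pi for pi = (0.2, 0.3, 0.5).
P = np.array([[0.15, 0.75, 0.10],
              [0.50, 0.00, 0.50],
              [0.04, 0.30, 0.66]])
pi = np.array([0.2, 0.3, 0.5])
x = np.array([0.0, 1.0, 2.0])


def autocovariance(P, pi, x, n):
    """Exact R(n) = Cov(X_n, X_0) when X_0 ~ pi (chain started in stationarity).

    E[X_n X_0] = sum_i pi(i) x(i) (P^n x)(i), and E[X_n] = E[X_0] = pi . x.
    """
    Pn_x = np.linalg.matrix_power(P, n) @ x
    mean = pi @ x
    return pi @ (x * Pn_x) - mean * mean


for n in [0, 1, 2, 5, 10, 20]:
    print(n, autocovariance(P, pi, x, n))   # R(0) = Var(X_0); R(n) shrinks toward 0
```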

¹ This is the Hamiltonian.
² This is the Boltzmann distribution.

kyle's notes
