Example 1 (Stationary Distributions)

Consider an $n$-state Markov Chain with transition matrix $P$.

Computing successive powers $P^2, P^3, \dots$, the rows of $P^n$ approach a common vector. Then

$$\lim_{n \to \infty} P^n = \mathbf{1}\pi,$$

where $\pi$ is the stationary distribution and $\mathbf{1}$ is the column vector of ones.

Proof:

Let $\mu_n$ denote the distribution of $X_n$, so $\mu_n = \mu_0 P^n$. Note that $\mu_{n+1} = \mu_n P$. Then, taking limits on both sides,

$$\pi = \lim_{n \to \infty} \mu_{n+1} = \Big( \lim_{n \to \infty} \mu_n \Big) P = \pi P,$$

which implies that

$$\pi P = \pi, \qquad \sum_i \pi_i = 1,$$

where existence of the limit follows from Proposition (Stochastic Properties). We get that $\pi$ is the stationary distribution.

Example 2 (Starting Distributions Don’t Matter)

Suppose we have states $1$ and $2$ and transition matrix $P$,

with $\pi = (\pi_1, \pi_2)$ being defined as usual. Then

$$\lim_{n \to \infty} P^n = \begin{pmatrix} \pi_1 & \pi_2 \\ \pi_1 & \pi_2 \end{pmatrix} = \mathbf{1}\pi,$$

where $\pi P = \pi$.

Set $\mu_0 = (p, 1 - p)$ for any $p \in [0, 1]$. Then

$$\mu_0 \lim_{n \to \infty} P^n = \mu_0 \mathbf{1} \pi = \big(p + (1 - p)\big)\, \pi = \pi.$$

Because the sum of the starting distribution must equal $1$, it does not matter what distribution we start from. We will always return to the stationary vector $\pi$.
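This convergence can be checked numerically. The $2 \times 2$ transition matrix below is a made-up stand-in (not the one from the notes); its stationary vector works out to $\pi = (2/3, 1/3)$:

```python
import numpy as np

# Hypothetical 2-state transition matrix (each row sums to 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Two different starting distributions.
mu_a = np.array([1.0, 0.0])
mu_b = np.array([0.25, 0.75])

# Evolve each distribution: mu_{n+1} = mu_n P.
for _ in range(200):
    mu_a = mu_a @ P
    mu_b = mu_b @ P

print(mu_a)  # both approach the stationary vector pi = (2/3, 1/3)
print(mu_b)
```

Both starting distributions end up at the same vector, matching the claim that the starting distribution does not matter.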

Remark

  1. For any state $j$, the probability of the system being in that state at time $n$ approaches a fixed value $\pi_j$ as $n$ goes to infinity. Essentially, after enough time, the specific starting state is irrelevant.
  2. Once the distribution reaches the long-term state $\pi$, it stays there. The distribution $\pi$ is invariant under the transition matrix $P$: $\pi P = \pi$. So we get that $\pi$ is a row (left) eigenvector of $P$ with eigenvalue $1$.
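The eigenvector characterization gives a direct way to compute $\pi$: left eigenvectors of $P$ are right eigenvectors of $P^\top$. A sketch using a hypothetical $2 \times 2$ matrix (whose stationary vector is $(2/3, 1/3)$):

```python
import numpy as np

# Hypothetical transition matrix (not from the notes).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Left eigenvectors of P are right eigenvectors of P transpose.
vals, vecs = np.linalg.eig(P.T)

# Pick the eigenvector for eigenvalue 1 and normalize it to sum to 1.
i = np.argmin(np.abs(vals - 1.0))
pi = np.real(vecs[:, i])
pi = pi / pi.sum()

print(pi)       # stationary distribution
print(pi @ P)   # invariance: pi P = pi
```

Normalizing to sum $1$ turns the raw eigenvector (which is only defined up to scale) into a probability vector.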

Theorem (Average Transition Behavior)

For a Markov Chain with stationary distribution $\pi$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (P^n)_{ij} = \pi_j.$$

We are essentially asking:

If we simulated all possible Markov Chains from state $i$, what fraction of them would be in state $j$ on average?

The idea is that even if the probabilities $(P^n)_{ij}$ bounce around, the average of those probabilities settles to $\pi_j$.

Proof:

$(P^n)_{ij} \to \pi_j$ as $n \to \infty$, and a convergent sequence also converges in the Cesàro sense: its running averages converge to the same limit. So, the mean fraction of the time spent in state $j$ is given by $\pi_j$.
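The Cesàro average of the $n$-step transition matrices can be checked directly; the matrix below is a hypothetical example whose stationary vector is $(2/3, 1/3)$:

```python
import numpy as np

# Hypothetical transition matrix (not from the notes).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Average the n-step transition matrices P^1, ..., P^N.
N = 2000
Pn = np.eye(2)
avg = np.zeros((2, 2))
for _ in range(N):
    Pn = Pn @ P       # Pn now holds P^n
    avg += Pn
avg /= N

print(avg)  # each row approaches pi = (2/3, 1/3)
```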

Theorem (Time Average of the Indicator)

What if we ran a Markov Chain and, every time we reached state $j$ at some time $n$, we counted it. What fraction of the time do we spend in state $j$? More formally, what is

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\{X_n = j\},$$

where

$$\mathbf{1}\{X_n = j\} = \begin{cases} 1 & \text{if } X_n = j, \\ 0 & \text{otherwise.} \end{cases}$$

If the chain has stationary distribution $\pi$, then this value converges to $\pi_j$ for all starting states.
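A simulation sketch of the time average: run one long trajectory and count the fraction of steps spent in a given state. The chain below is hypothetical, with $\pi_0 = 2/3$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition matrix (not from the notes), pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Simulate one long trajectory and count visits to state 0.
N = 100_000
x = 0
visits = 0
for _ in range(N):
    x = rng.choice(2, p=P[x])  # next state drawn from current row of P
    visits += (x == 0)

print(visits / N)  # close to pi_0 = 2/3
```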

Theorem (Law of Large Numbers for Markov Chains)

Instead of counting visits, we can count the “reward” of visiting specific states: for a reward function $f$ on the states,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(X_n) = \sum_j f(j)\, \pi_j = \mathbb{E}_\pi[f(X)].$$
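The law of large numbers says the time-average reward along one trajectory approaches the stationary expectation $\sum_j f(j)\,\pi_j$. A sketch with a hypothetical chain ($\pi = (2/3, 1/3)$) and made-up rewards:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical transition matrix (not from the notes), pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([5.0, -1.0])  # made-up "reward" for visiting each state

# Time-average reward along one trajectory.
N = 100_000
x = 0
total = 0.0
for _ in range(N):
    x = rng.choice(2, p=P[x])
    total += f[x]

print(total / N)                  # time average of the reward
print(f @ np.array([2/3, 1/3]))   # stationary expectation, approx 3.0
```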

Remarks (Markov Chain Computation)

  • We can view Markov Chains as evolutions of probability vectors (PMFs): $\mu_{n+1} = \mu_n P$. But this is hard to compute for large state spaces (when the number of states is large).
  • Markov Chains are a random sequence of states $X_0, X_1, X_2, \dots$, which is often easier to simulate.
  • Recall from Discrete Case that we can generate discrete random variables $X$ with $\mathbb{P}(X = x_i) = p_i$ for $i = 1, \dots, k$ and $\sum_{i=1}^{k} p_i = 1$.
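A sketch of the inverse-CDF method for discrete sampling: draw $U \sim \mathrm{Unif}(0,1)$ and return the first index whose cumulative probability exceeds $U$ (the helper name `sample_discrete` is ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_discrete(p, rng):
    """Inverse-CDF method: draw U ~ Unif(0,1) and return the first
    index whose cumulative probability exceeds U."""
    u = rng.uniform()
    # min() guards against float round-off in the final cumulative sum.
    return min(int(np.searchsorted(np.cumsum(p), u)), len(p) - 1)

# Check the empirical frequencies against the target PMF.
p = [0.2, 0.5, 0.3]
draws = [sample_discrete(p, rng) for _ in range(50_000)]
freqs = np.bincount(draws, minlength=3) / len(draws)
print(freqs)  # approx (0.2, 0.5, 0.3)
```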

Algorithm (Generate/Simulate Markov Chains)

Input: Transition matrix $P$ and an initial distribution $\mu_0$.

Output: A Markov Chain $X_0, X_1, \dots, X_N$ where $X_n \sim \mu_n$.

  1. Generate $X_0 \sim \mu_0$.
  2. For $n = 1$ to $N$:
    1. Generate $X_n$ from $P(X_{n-1}, \cdot)$, the PMF over states given by the $X_{n-1}$'th row of $P$.
  3. Repeat this $M$ times to get $M$ samples from $\mu_N$.
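The algorithm above can be sketched as follows; the function name `simulate_chain` and the 2-state matrix are ours, chosen so the stationary vector is $(2/3, 1/3)$:

```python
import numpy as np

def simulate_chain(P, mu0, N, rng):
    """Simulate X_0, ..., X_N: draw X_0 from mu0, then each X_n from
    the X_{n-1}'th row of P."""
    P, mu0 = np.asarray(P), np.asarray(mu0)
    xs = [rng.choice(len(mu0), p=mu0)]
    for _ in range(N):
        xs.append(rng.choice(P.shape[1], p=P[xs[-1]]))
    return xs

rng = np.random.default_rng(3)
P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix

# Repeat M times and look at the empirical distribution of X_N.
M, N = 2000, 50
finals = [simulate_chain(P, [0.5, 0.5], N, rng)[-1] for _ in range(M)]
print(np.bincount(finals, minlength=2) / M)  # approx pi = (2/3, 1/3)
```

Each repetition yields one sample of $X_N$; pooling the $M$ final states gives an empirical estimate of $\mu_N \approx \pi$.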

The key idea is to generate samples from a distribution $\pi$ by simulating a Markov Chain with stationary distribution $\pi$. Often $\pi$ is known only up to a normalizing constant. By Theorem (Average Transition Behavior), Theorem (Time Average of the Indicator), and Theorem (Law of Large Numbers for Markov Chains), this algorithm converges to $\pi$.

Remark (Discreteness)

This algorithm also works for discrete time and states. Let $X_1, \dots, X_d$ be discrete random variables, e.g. $X_i \in \{0, 1\}$. Then the target PMF of $X = (X_1, \dots, X_d)$ is

$$\pi(x) = \mathbb{P}(X_1 = x_1, \dots, X_d = x_d),$$

where

$$x = (x_1, \dots, x_d) \in \{0, 1\}^d.$$

The stationary distribution $\pi$ has a special PMF form:

$$\pi(x) = \frac{w(x)}{Z}$$

for all $x$, where the weights $w(x) \ge 0$ are known and $Z$ is an unknown normalization. In particular,

$$Z = \sum_x w(x).$$

You can think of $w(x)$ as the “score” or weight of each state. If $w(x) > w(y)$, then state $x$ is more likely than state $y$. $Z$ is how we normalize the weights into the PMF $\pi$.

In practice, $Z$ is hard to compute, as it is the sum of $2^d$ terms (one per state). $Z$ is called the partition function in statistical physics.
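For small $d$ the partition function can still be brute-forced, which makes the $2^d$ blow-up concrete. The weight function below is a made-up example, $w(x) = e^{\sum_i x_i}$, so states with more ones score higher:

```python
import itertools
import numpy as np

# Hypothetical weight function on binary vectors x in {0,1}^d:
# w(x) = exp(number of ones in x).
d = 10
def w(x):
    return np.exp(sum(x))

# Z sums over all 2^d states -- feasible only for small d.
states = list(itertools.product([0, 1], repeat=d))
Z = sum(w(x) for x in states)
pi = {x: w(x) / Z for x in states}

print(len(states))       # 2^10 = 1024 terms in Z
print(sum(pi.values()))  # normalized: sums to 1
```

For this particular $w$, the coordinates are independent, so $Z = (1 + e)^d$ in closed form; for a general $w$ no such shortcut exists, which is exactly why MCMC avoids computing $Z$ at all.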