Example 1 (Stationary Distributions)
Consider an $N$-state Markov chain with transition matrix $P$, where $P_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$ and each row of $P$ sums to $1$. Then
$$\pi P = \pi, \qquad \sum_i \pi_i = 1,$$
where $\pi$ is the stationary distribution.
Proof:
Let $\pi = (\pi_1, \dots, \pi_N)$ be a row vector with $\sum_i \pi_i = 1$. Note that if $X_n \sim \pi$, then
$$\Pr(X_{n+1} = j) = \sum_i \pi_i P_{ij} = (\pi P)_j,$$
which implies that $X_{n+1} \sim \pi P$, from Proposition (Stochastic Properties). Setting $\pi P = \pi$, we get that $\pi$ solves the linear system $\pi(P - I) = 0$ together with $\sum_i \pi_i = 1$, and the distribution is then preserved at every step.
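As a numerical sketch, the stationary distribution can be found by solving $\pi P = \pi$, $\sum_i \pi_i = 1$ as a linear system. The 2-state matrix below is hypothetical (chosen only for illustration):

```python
import numpy as np

# Hypothetical 2-state transition matrix (entries chosen for illustration).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# pi P = pi  <=>  (P - I)^T pi^T = 0; append the normalization sum(pi) = 1.
n = P.shape[0]
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)        # stationary distribution, here [5/6, 1/6]
print(pi @ P)    # equals pi: invariant under P
```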
Example 2 (Starting Distributions Don’t Matter)
Suppose we have states $1, \dots, N$ and transition matrix $P$, and let $\mu$ be a starting distribution with $X_0 \sim \mu$, with $\pi$ being defined as usual. Then the distribution of $X_n$ is
$$\mu P^n \to \pi \quad \text{as } n \to \infty,$$
where $\pi$ is the stationary distribution of $P$.
Set $\mu'$ to be any other starting distribution. Then $\mu' P^n \to \pi$ as well, such that the limit does not depend on the start.
Each row of $P^n$ converges to $\pi$, and $\mu P^n$ is a weighted combination of those rows. Because the sum of the starting distribution must equal $1$, it does not matter what distribution we start from. We will always return to the stationary vector $\pi$.
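A quick numerical check that the start does not matter: iterate $\mu P^n$ from two very different starting distributions and compare. The transition matrix is hypothetical, picked only for illustration:

```python
import numpy as np

# Hypothetical 2-state transition matrix (for illustration only).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Two very different starting distributions.
results = []
for mu in (np.array([1.0, 0.0]), np.array([0.25, 0.75])):
    dist = mu.copy()
    for _ in range(60):
        dist = dist @ P        # distribution of X_n is mu P^n
    results.append(dist)

print(results[0], results[1])  # both approach the same stationary pi
```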
Remark
- For any state $i$, the probability of the system being in state $i$ at time $n$ approaches a fixed value $\pi_i$ as $n$ goes to infinity. Essentially, after enough time, the specific starting state is irrelevant.
- Once the distribution reaches the long-term state $\pi$, it stays there. The distribution is invariant under the transition matrix $P$: $\pi P = \pi$. So we get that $\pi$ is a row (left) eigenvector of $P$ with eigenvalue $1$.
Theorem (Average Transition Behavior)
For a Markov chain,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} (P^m)_{ij} = \pi_j.$$
We are essentially asking:
If we simulated all possible Markov chains from state $i$, what fraction of them would be in state $j$ on average?
The idea is that even if $(P^m)_{ij}$ bounces around, the average of those probabilities settles to $\pi_j$.
Proof:
$(P^m)_{ij} \to \pi_j$ as $m \to \infty$, so the Cesàro averages $\frac{1}{n}\sum_{m=1}^{n} (P^m)_{ij}$ converge to the same limit. So, the mean fraction of the time spent in state $j$ is given by $\pi_j$.
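The Cesàro average can be checked numerically: average the first $n$ powers of $P$ and compare each row to the stationary distribution. The chain below is hypothetical, with stationary distribution $[1/3, 2/3]$:

```python
import numpy as np

# Hypothetical irreducible chain; its stationary distribution is [1/3, 2/3].
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])

# Cesaro average (1/n) * sum_{m=1}^{n} P^m.
n = 5000
Pm = np.eye(2)
avg = np.zeros_like(P)
for _ in range(n):
    Pm = Pm @ P
    avg += Pm
avg /= n

print(avg)   # every row is close to the stationary distribution
```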
Theorem (Time Average of the Indicator)
What if we ran every Markov chain, but every time we reached state $j$ at some time $m$, we counted it? What fraction of the time do we spend in state $j$? More formally, what is
$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{1}\{X_m = j\},$$
where
$$\mathbb{1}\{X_m = j\} = \begin{cases} 1 & \text{if } X_m = j, \\ 0 & \text{otherwise?} \end{cases}$$
If the chain has stationary distribution $\pi$, then this value converges to $\pi_j$ for all $j$.
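This time average is easy to see empirically: run one long trajectory and count the fraction of steps spent in each state. The chain below is hypothetical (stationary distribution $[1/3, 2/3]$), and the step count is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])     # hypothetical chain with pi = [1/3, 2/3]

# One long trajectory; count the fraction of steps spent in each state.
n_steps = 100_000
x = 0
counts = np.zeros(2)
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])  # X_m drawn from row X_{m-1} of P
    counts[x] += 1

print(counts / n_steps)        # close to [1/3, 2/3]
```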
Theorem (Law of Large Numbers for Markov Chains)
Instead of counting visits, we can count the "reward" of visiting specific states: for a reward function $f$ on the state space,
$$\frac{1}{n} \sum_{m=1}^{n} f(X_m) \to \sum_{j} f(j)\,\pi_j = \mathbb{E}_{\pi}[f(X)] \quad \text{as } n \to \infty.$$
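A sketch of the law of large numbers in action: average a per-state reward along one trajectory and compare it to $\mathbb{E}_\pi[f]$. Both the chain and the reward vector here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])     # hypothetical chain, pi = [1/3, 2/3]
f = np.array([10.0, -2.0])     # hypothetical per-state reward f(j)

# Time-average reward along one trajectory.
n_steps = 200_000
x, total = 0, 0.0
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])
    total += f[x]

print(total / n_steps)         # close to sum_j f(j) pi_j = 10/3 - 4/3 = 2.0
```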
Remarks (Markov Chain Computation)
- We can view Markov chains as evolutions of probability vectors (PMFs): $\mu_{n+1} = \mu_n P$. But this is hard to compute for large state spaces (when the number of states $N$ is large).
- Markov chains are a random sequence of states $X_0, X_1, X_2, \dots$, which is often easier to simulate.
- Recall from Discrete Case that we can generate a discrete random variable $X$ where $\Pr(X = a_i) = p_i$ for $i = 1, \dots, k$ and $\sum_i p_i = 1$.
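The bullets above can be sketched as inverse-CDF sampling of a discrete random variable from a uniform draw; the PMF below and the function name are hypothetical, not from the original notes:

```python
import numpy as np

def sample_discrete(p, u):
    """Inverse-CDF sampling: smallest index i with p[0]+...+p[i] >= u."""
    idx = int(np.searchsorted(np.cumsum(p), u))
    return min(idx, len(p) - 1)   # guard against float round-off at u ~ 1

rng = np.random.default_rng(2)
p = [0.2, 0.5, 0.3]               # hypothetical PMF (p_1, ..., p_k)
draws = [sample_discrete(p, rng.random()) for _ in range(100_000)]
freqs = np.bincount(draws, minlength=3) / len(draws)
print(freqs)                      # empirical frequencies close to p
```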
Algorithm (Generate/Simulate Markov Chains)
Input: Transition matrix $P$ and an initial distribution $\mu$.
Output: A Markov chain $X_0, X_1, \dots, X_N$ where $X_0 \sim \mu$.
- Generate $X_0 \sim \mu$.
- For $n = 1$ to $N$:
  - Generate $X_n$ from $P(X_{n-1}, \cdot)$, the PMF over states given by the $X_{n-1}$'th row of $P$.
- Repeat this $M$ times to get $M$ samples of $X_N$; for large $N$, these are approximately distributed as $\pi$.
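The algorithm above can be sketched as follows; the function name, transition matrix, and initial distribution are hypothetical placeholders:

```python
import numpy as np

def simulate_chain(P, mu, N, rng):
    """Simulate X_0, ..., X_N with X_0 ~ mu and X_n drawn from the
    X_{n-1}'th row of the transition matrix P."""
    P, mu = np.asarray(P), np.asarray(mu)
    x = rng.choice(len(mu), p=mu)           # X_0 ~ mu
    path = [int(x)]
    for _ in range(N):
        x = rng.choice(P.shape[1], p=P[x])  # X_n ~ P(X_{n-1}, .)
        path.append(int(x))
    return path

# Hypothetical inputs for illustration.
P = [[0.9, 0.1], [0.5, 0.5]]
mu = [0.5, 0.5]
path = simulate_chain(P, mu, 10, np.random.default_rng(3))
print(path)   # a length-11 list of states in {0, 1}
```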
The key idea is to generate samples from a distribution $\pi$ by simulating a Markov chain whose stationary distribution is $\pi$. Often $\pi$ is known only up to a normalizing constant. By Theorem (Average Transition Behavior), Theorem (Time Average of the Indicator), and Theorem (Law of Large Numbers for Markov Chains), this algorithm converges to $\pi$.
Remark (Discreteness)
This algorithm also works for discrete time and states. Let $X_1, \dots, X_n$ be discrete random variables, e.g. $X_i \in \{0, 1\}$. Then the target PMF of $X = (X_1, \dots, X_n)$ is
$$\pi(x) = \frac{f(x)}{Z},$$
where
$$Z = \sum_{x} f(x).$$
The stationary distribution $\pi$ has a special PMF form:
$$\pi(i) = \frac{f(i)}{Z}$$
for all states $i$, where the scores $f(i) \ge 0$ are known and $Z$ is an unknown normalization. In particular,
$$Z = \sum_{i} f(i).$$
You can think of $f(i)$ as the "score" or weight of each state. If $f(i) > f(j)$, then state $i$ is more likely than state $j$. $Z$ is how we normalize the PMF $\pi$.
In practice, $Z$ is hard to compute, as it is the sum of exponentially many terms (e.g. $2^n$ terms when $x \in \{0, 1\}^n$). $Z$ is called the partition function in statistical physics.
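For a state space small enough to enumerate, the normalization is just a sum; the scores below are hypothetical, chosen only to make $\pi(i) = f(i)/Z$ concrete:

```python
import numpy as np

# Hypothetical unnormalized scores f(i) over a small state space.
f = np.array([1.0, 4.0, 2.0, 3.0])

Z = f.sum()      # partition function: tractable only for small spaces
pi = f / Z       # normalized PMF pi(i) = f(i) / Z

print(Z, pi)     # Z = 10.0, pi = [0.1, 0.4, 0.2, 0.3]
```

For $n$ binary variables the same sum has $2^n$ terms, which is exactly why MCMC-style simulation is used instead of computing $Z$ directly.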