Markov Chains: Stochastic Dynamics and Steady States

A Markov Chain is a stochastic process that satisfies the **Markov Property**: the future state depends only on the current state, and not on the sequence of events that preceded it. This "memoryless" property allows for the modeling of complex systems through the lens of linear algebra and high-dimensional geometry.

---

1. Quantitative Foundations: Matrices and Measures

A discrete-time Markov chain is defined by a state space $S$ and a **Transition Matrix** $P$.

1.1. The Transition Matrix ($P$)

If $P_{ij}$ is the probability of moving from state $i$ to state $j$:

$$P_{ij} = P(X_{n+1} = j \mid X_n = i)$$

Every row of $P$ must sum to 1 (it is a right-stochastic matrix).

1.2. Multi-Step Prediction

To find the probability distribution after $k$ steps, we take the initial distribution vector $\pi_0$ and multiply by the $k$-th power of the transition matrix:

$$\pi_k = \pi_0 P^k$$

---

2. Spatial and Geometric Intuition

2.1. The Unit Simplex: The Geometry of Distribution

All possible probability distributions for a system with $n$ states live on an **$(n-1)$-dimensional unit simplex**.

* In 3D, this is the flat triangle connecting $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$.

* **Intuition:** Each application of the matrix $P$ is a linear transformation that "squashes" this simplex. Over time, for an ergodic chain, the entire simplex collapses toward a single point.

2.2. The Fixed Point: The Steady State ($\pi$)

The steady state (stationary distribution) is the vector $\pi$ that satisfies:

$$\pi P = \pi$$

**Linear Algebra Intuition:** $\pi$ is the **left-eigenvector** of $P$ corresponding to the eigenvalue $\lambda = 1$.

* The "speed" at which the system reaches equilibrium is governed by the **spectral gap**: the difference between the largest eigenvalue ($\lambda_1 = 1$) and the second largest eigenvalue ($|\lambda_2|$). A smaller $\lambda_2$ means faster convergence (mixing).

2.3. Random Walks as Flow

On an undirected graph, a random walk is a Markov chain. The steady state probability of being at a node is geometrically proportional to its **degree** (number of connections):

$$\pi(v) = \frac{\text{deg}(v)}{2|E|}$$

Higher degree nodes act as "gravity wells" or "hubs" for the random walk.

---

3. Hidden Markov Models (HMMs): The Trellis Geometry

In an HMM, the states are "hidden," and we only observe noisy emissions.

3.1. The Trellis and Shortest Paths

We visualize HMMs as a **trellis diagram**—a Directed Acyclic Graph (DAG) where time flows horizontally and states are vertical.

* **Viterbi Algorithm:** Finding the most likely hidden sequence is mathematically equivalent to finding the **shortest path** through the trellis in **tropical geometry** (min-plus algebra).

3.2. Acoustic Corridors (Speech Recognition)

In speech recognition, phonemes are modeled as 3-state HMMs. Geometrically, these define a **high-dimensional corridor** in the acoustic feature space. If a speaker's audio trajectory stays within the "corridor" of a word's HMM, that word is triggered.

---

4. Real-World Applications

4.1. Google PageRank (The Damped Walk)

PageRank models the web as a massive Markov chain.

* **Transition:** Following links ($1/\text{outdegree}$).

* **Damping:** A "teleport" factor (usually 0.15) that allows the "surfer" to jump to a random page.

* **Ranking:** The PageRank of a page is simply its value in the stationary distribution $\pi$.

4.2. MCMC (Markov Chain Monte Carlo)

MCMC is used to sample from distributions that are too complex to integrate directly (e.g., in Bayesian physics or genetics).

* **Method:** We *construct* a Markov chain whose stationary distribution is exactly the target distribution. By running the chain long enough ("burn-in"), we eventually start generating samples that represent the target "volume" in high-dimensional space.

4.3. Queueing Theory (M/M/1)

Markov chains model the length of a queue.

* **States:** Number of customers ($0, 1, 2, \dots$).

* **Transition:** Arrival rate $\lambda$ (+1) and service rate $\mu$ (-1).

* **Stability:** If $\lambda < \mu$, the chain has a stationary distribution, giving the expected wait time and system utilization.

---

5. Quantitative Summary Table

| Concept | Matrix/Algebra View | Geometric Meaning |

| :--- | :--- | :--- |

| **State Vector** | $\pi \in \mathbb{R}^n, \sum \pi_i = 1$ | A point on the probability simplex. |

| **Ergodicity** | Irreducible + Aperiodic | Every state can eventually "see" every other. |

| **Mixing Time** | $t_{mix} \approx 1/(1 - |\lambda_2|)$ | How fast the simplex collapses to a point. |

| **Forward-Backward** | $\alpha_t(i) \beta_t(i)$ | The intersection of "past" and "future" information flows. |

| **Bakis Model** | Upper triangular $P$ | A "one-way" path through time (common in speech). |

6. Worked Example: The Gambler's Ruin

Consider a gambler with $\$10$ playing a game with $\$1$ stakes and a 50% win probability. The "house" has infinite money.

1. **Transition:** $P(n \to n+1) = 0.5, P(n \to n-1) = 0.5$.

2. **Absorption:** The state $0$ (ruin) is an **absorbing state**.

3. **Result:** Since the house is infinite, the gambler's walk is a Markov chain that eventually (with probability 1) hits the absorbing boundary at 0. This illustrates that without a "teleport" or damping factor, some chains collapse into "dead ends."

See Also

- [Probability Theory](ProbabilityTheory)

- [Bayesian Reasoning](BayesianReasoning)

- [Graph Theory Deep Dive](GraphTheoryDeepDive)

- [Information Theory](InformationTheory)