Fundamental Limit Theorem for Regular Chains

The fundamental limit theorem for regular Markov chains states that if [math]\mat{P}[/math] is a regular transition matrix then

[[math]] \lim_{n \to \infty} \mat {P}^n = \mat {W}\ , [[/math]]

where [math]\mat{W}[/math] is a matrix with each row equal to the unique fixed probability row vector [math]\mat{w}[/math] for [math]\mat{P}[/math]. In this section we shall give two very different proofs of this theorem.

Our first proof is carried out by showing that, for any column vector [math]\mat{y}[/math], [math]\mat{P}^n \mat {y}[/math] tends to a constant vector. As indicated in Ergodic Markov Chains, this will show that [math]\mat{P}^n[/math] converges to a matrix with constant columns or, equivalently, to a matrix with all rows the same.

The following lemma says that if an [math]r[/math]-by-[math]r[/math] transition matrix has no zero entries, and [math]\mat {y}[/math] is any column vector with [math]r[/math] entries, then the vector [math]\mat {P}\mat{y}[/math] has entries which are “closer together” than the entries are in [math]\mat {y}[/math].

Lemma

Let [math]\mat{P}[/math] be an [math]r[/math]-by-[math]r[/math] transition matrix with no zero entries. Let [math]d[/math] be the smallest entry of the matrix. Let [math]\mat{y}[/math] be a column vector with [math]r[/math] components, the largest of which is [math]M_0[/math] and the smallest [math]m_0[/math]. Let [math]M_1[/math] and [math]m_1[/math] be the largest and smallest component, respectively, of the vector [math]\mat {P} \mat {y}[/math]. Then

[[math]] M_1 - m_1 \leq (1 - 2d)(M_0 - m_0)\ . [[/math]]

Show Proof

In the discussion following Theorem, it was noted that each entry in the vector [math]\mat {P}\mat{y}[/math] is a weighted average of the entries in [math]\mat {y}[/math]. The largest weighted average that could be obtained in the present case would occur if all but one of the entries of [math]\mat {y}[/math] have value [math]M_0[/math] and one entry has value [math]m_0[/math], and this one small entry is weighted by the smallest possible weight, namely [math]d[/math]. In this case, the weighted average would equal

[[math]] dm_0 + (1-d)M_0\ . [[/math]]

Similarly, the smallest possible weighted average equals

[[math]] dM_0 + (1-d)m_0\ . [[/math]]

Thus,

[[math]] \begin{eqnarray*} M_1 - m_1 &\le& \Bigl(dm_0 + (1-d)M_0\Bigr) - \Bigl(dM_0 + (1-d)m_0\Bigr) \\ &=& (1 - 2d)(M_0 - m_0)\ . \end{eqnarray*} [[/math]]

This completes the proof of the lemma.

■

We turn now to the proof of the fundamental limit theorem for regular Markov chains.

Theorem

(Fundamental Limit Theorem for Regular Chains) If [math]\mat{P}[/math] is the transition matrix for a regular Markov chain, then

[[math]] \lim_{n \to \infty} \mat {P}^n = \mat {W}\ , [[/math]]

where [math]\mat {W}[/math] is matrix with all rows equal. Furthermore, all entries in [math]\mat{W}[/math] are strictly positive.

Show Proof

We prove this theorem for the special case that [math]\mat{P}[/math] has no 0 entries. The extension to the general case is indicated in Exercise Exercise. Let [math]\mat {y}[/math] be any [math]r[/math]-component column vector, where [math]r[/math] is the number of states of the chain. We assume that [math]r \gt 1[/math], since otherwise the theorem is trivial. Let [math]M_n[/math] and [math]m_n[/math] be, respectively, the maximum and minimum components of the vector [math]\mat {P}^n \mat { y}[/math]. The vector [math]\mat {P}^n \mat {y}[/math] is obtained from the vector [math]\mat {P}^{n - 1} \mat {y}[/math] by multiplying on the left by the matrix [math]\mat{P}[/math]. Hence each component of [math]\mat {P}^n \mat {y}[/math] is an average of the components of [math]\mat {P}^{n - 1} \mat {y}[/math]. Thus

[[math]] M_0 \geq M_1 \geq M_2 \geq\cdots [[/math]]

and

[[math]] m_0 \leq m_1 \leq m_2 \leq\cdots\ . [[/math]]

Each sequence is monotone and bounded:

[[math]] m_0 \leq m_n \leq M_n \leq M_0\ . [[/math]]

Hence, each of these sequences will have a limit as [math]n[/math] tends to infinity.

Let [math]M[/math] be the limit of [math]M_n[/math] and [math]m[/math] the limit of [math]m_n[/math]. We know that [math]m \leq M[/math]. We shall prove that [math]M - m = 0[/math]. This will be the case if [math]M_n - m_n[/math] tends to 0. Let [math]d[/math] be the smallest element of [math]\mat{P}[/math]. Since all entries of [math]\mat{P}[/math] are strictly positive, we have [math]d \gt 0[/math]. By our lemma

[[math]] M_n - m_n \leq (1 - 2d)(M_{n - 1} - m_{n - 1})\ . [[/math]]

From this we see that

[[math]] M_n - m_n \leq (1 - 2d)^n(M_0 - m_0)\ . [[/math]]

Since [math]r \ge 2[/math], we must have [math]d \leq 1/2[/math], so [math]0 \leq 1 - 2d \lt 1[/math], so the difference [math]M_n - m_n[/math] tends to 0 as [math]n[/math] tends to infinity. Since every component of [math]\mat {P}^n \mat {y}[/math] lies between [math]m_n[/math] and [math]M_n[/math], each component must approach the same number [math]u = M = m[/math]. This shows that

[[math]] \lim_{n \to \infty} \mat {P}^n \mat {y} = \mat{u}\ , \label{eq 11.4.4} [[/math]]

where [math]\mat{u}[/math] is a column vector all of whose components equal [math]u[/math].

Now let [math]\mat{y}[/math] be the vector with [math]j[/math]th component equal to 1 and all other components equal to 0. Then [math]\mat {P}^n \mat {y}[/math] is the [math]j[/math]th column of [math]\mat {P}^n[/math]. Doing this for each [math]j[/math] proves that the columns of [math]\mat {P}^n[/math] approach constant column vectors. That is, the rows of [math]\mat {P}^n[/math] approach a common row vector [math]\mat{w}[/math], or,

[[math]] \lim_{n \to \infty} \mat {P}^n = \mat {W}\ . \label{eq 11.4.5} [[/math]]

It remains to show that all entries in [math]\mat{W}[/math] are strictly positive. As before, let [math]\mat{y}[/math] be the vector with [math]j[/math]th component equal to 1 and all other components equal to 0. Then [math]\mat{P}\mat{y}[/math] is the [math]j[/math]th column of [math]\mat{P}[/math], and this column has all entries strictly positive. The minimum component of the vector [math]\mat{P}\mat{y}[/math] was defined to be [math]m_1[/math], hence [math]m_1 \gt 0[/math]. Since [math]m_1 \le m[/math], we have [math]m \gt 0[/math]. Note finally that this value of [math]m[/math] is just the [math]j[/math]th component of [math]\mat{w}[/math], so all components of [math]\mat{w}[/math] are strictly positive.

■

Doeblin's Proof

We give now a very different proof of the main part of the fundamental limit theorem for regular Markov chains. This proof was first given by Doeblin,^{[Notes 1]} a brilliant young mathematician who was killed in his twenties in the Second World War.

Theorem

Let [math]\mat {P}[/math] be the transition matrix for a regular Markov chain with fixed vector [math]\mat {w}[/math]. Then for any initial probability vector [math]\mat {u}[/math], [math]\mat {uP}^n \rightarrow \mat {w}[/math] as [math]n \rightarrow \infty.[/math]

Show Proof

Let [math] X_0,\ X_1,\ \ldots[/math] be a Markov chain with transition matrix [math]\mat {P}[/math] started in state [math]s_i[/math]. Let [math]Y_0,\ Y_1,\ \ldots[/math] be a Markov chain with transition probability [math]\mat {P}[/math] started with initial probabilities given by [math]\mat {w}[/math]. The [math]X[/math] and [math]Y[/math] processes are run independently of each other.

We consider also a third Markov chain [math]\mat{P}^*[/math] which consists of watching both the [math]X[/math] and [math]Y[/math] processes. The states for [math]\mat{P}^*[/math] are pairs [math](s_i, s_j)[/math]. The transition probabilities are given by

[[math]] \mat{P}^{*}[(i,j),(k,l)] = \mat{P}(i,k) \cdot \mat{P}(j,l)\ . [[/math]]

Since [math]\mat{P}[/math] is regular there is an [math]N[/math] such that [math]\mat{P}^{N}(i,j) \gt 0[/math] for all [math]i[/math] and [math]j[/math]. Thus for the [math]\mat{P}^*[/math] chain it is also possible to go from any state [math](s_i, s_j)[/math] to any other state [math](s_k,s_l)[/math] in at most [math]N[/math] steps. That is [math]\mat{P}^*[/math] is also a regular Markov chain.

We know that a regular Markov chain will reach any state in a finite time. Let [math]T[/math] be the first time the the chain [math]\mat{P}^*[/math] is in a state of the form [math](s_k,s_k)[/math]. In other words, [math]T[/math] is the first time that the [math]X[/math] and the [math]Y[/math] processes are in the same state. Then we have shown that

[[math]] P[T \gt n] \rightarrow 0 \;\;\mbox{as}\;\; n \rightarrow \infty\ . [[/math]]

If we watch the [math]X[/math] and [math]Y[/math] processes after the first time they are in the same state we would not predict any difference in their long range behavior. Since this will happen no matter how we started these two processes, it seems clear that the long range behaviour should not depend upon the starting state. We now show that this is true.

We first note that if [math]n \ge T[/math], then since [math]X[/math] and [math]Y[/math] are both in the same state at time [math]T[/math],

[[math]] P(X_n = j\ |\ n \ge T) = P(Y_n = j\ |\ n \ge T)\ . [[/math]]

If we multiply both sides of this equation by [math]P(n \ge T)[/math], we obtain

[[math]] \begin{equation} P(X_n = j,\ n \ge T) = P(Y_n = j,\ n \ge T)\ . \label{eq 11.4.1} \end{equation} [[/math]]

We know that for all [math]n[/math],

[[math]] P(Y_n = j) = w_j\ . [[/math]]

But

[[math]] P(Y_n = j) = P(Y_n = j,\ n \ge T) + P(Y_n = j,\ n \lt T)\ , [[/math]]

and the second summand on the right-hand side of this equation goes to 0 as [math]n[/math] goes to [math]\infty[/math], since [math]P(n \lt T)[/math] goes to 0 as [math]n[/math] goes to [math]\infty[/math]. So,

[[math]] P(Y_n = j,\ n \ge T) \rightarrow w_j\ , [[/math]]

as [math]n[/math] goes to [math]\infty[/math]. From Equation, we see that

[[math]] P(X_n = j,\ n \ge T) \rightarrow w_j\ , [[/math]]

as [math]n[/math] goes to [math]\infty[/math]. But by similar reasoning to that used above, the difference between this last expression and [math]P(X_n = j)[/math] goes to 0 as [math]n[/math] goes to [math]\infty[/math]. Therefore,

[[math]] P(X_n = j) \rightarrow w_j\ , [[/math]]

as [math]n[/math] goes to [math]\infty[/math]. This completes the proof.

■

In the above proof, we have said nothing about the rate at which the distributions of the [math]X_n[/math]'s approach the fixed distribution [math]\mat {w}[/math]. In fact, it can be shown that^{[Notes 2]}

[[math]] \sum ^{r}_{j = 1} \mid P(X_{n} = j) - w_j \mid \leq 2 P(T \gt n)\ . [[/math]]

The left-hand side of this inequality can be viewed as the distance between the distribution of the Markov chain after [math]n[/math] steps, starting in state [math]s_i[/math], and the limiting distribution [math]\mat {w}[/math].

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.

Notes

W. Doeblin, “Exposé de la Théorie des Chaines Simple Constantes de Markov à un Nombre Fini d'Etats,” Rev.\ Mach.\ de l'Union Interbalkanique, vol. 2 (1937), pp. 77--105.
T. Lindvall, Lectures on the Coupling Method (New York: Wiley 1992).

[1] W. Doeblin, “Exposé de la Théorie des Chaines Simple Constantes de Markov à un Nombre Fini d'Etats,” Rev.\ Mach.\ de l'Union Interbalkanique, vol. 2 (1937), pp. 77--105.

[2] T. Lindvall, Lectures on the Coupling Method (New York: Wiley 1992).

[Notes 1]

[Notes 2]