<div class="d-none"><math>
\newcommand{\NA}{{\rm NA}}
\newcommand{\mat}[1]{{\bf#1}}
\newcommand{\exref}[1]{\ref{##1}}
\newcommand{\secstoprocess}{\all}
\newcommand{\mathds}{\mathbb}</math></div>
The fundamental limit theorem for regular Markov chains states that if <math>\mat{P}</math> is a regular transition matrix then
<math display="block">
\lim_{n \to \infty} \mat {P}^n = \mat {W}\ ,
</math>
where <math>\mat{W}</math> is a matrix with each row equal to the unique fixed probability row vector <math>\mat{w}</math> for <math>\mat{P}</math>. In this section we shall give two very different proofs of this theorem.
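
As a simple illustration of the theorem, the reader may check that
<math display="block">
\mat{P} = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}
</math>
is a regular transition matrix, that its unique fixed probability vector is <math>\mat{w} = (1/3,\ 2/3)</math>, and that the rows of <math>\mat{P}^n</math> do indeed approach <math>\mat{w}</math> (the rows of <math>\mat{P}^2</math>, for example, are <math>(3/8,\ 5/8)</math> and <math>(5/16,\ 11/16)</math>).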

Our first proof is carried out by showing that, for any column vector <math>\mat{y}</math>, <math>\mat{P}^n \mat {y}</math> tends to a constant vector. As indicated in [[guide:E5c38a1e8a|Ergodic Markov Chains]], this will show that <math>\mat{P}^n</math> converges to a matrix with constant columns or, equivalently, to a matrix with all rows the same.

The following lemma says that if an <math>r</math>-by-<math>r</math> transition matrix has no zero entries, and <math>\mat {y}</math> is any column vector with <math>r</math> entries, then the vector <math>\mat {P}\mat{y}</math> has entries which are “closer together” than the entries are in <math>\mat {y}</math>.

{{proofcard|Lemma|lemma-1|Let <math>\mat{P}</math> be an <math>r</math>-by-<math>r</math> transition matrix with no zero entries. Let <math>d</math> be the smallest entry of the matrix. Let <math>\mat{y}</math> be a column vector with <math>r</math> components, the largest of which is <math>M_0</math> and the smallest <math>m_0</math>. Let <math>M_1</math> and <math>m_1</math> be the largest and smallest component, respectively, of the vector <math>\mat {P} \mat {y}</math>. Then
<math display="block">
M_1 - m_1 \leq (1 - 2d)(M_0 - m_0)\ .
</math>|In the discussion following [[guide:E5c38a1e8a#thm 11.3.6 |Theorem]], it was noted that each entry in the vector <math>\mat {P}\mat{y}</math> is a weighted average of the entries in <math>\mat {y}</math>. The largest weighted average that could be obtained in the present case would occur if all but one of the entries of <math>\mat {y}</math> have value <math>M_0</math> and one entry has value <math>m_0</math>, and this one small entry is weighted by the smallest possible weight, namely <math>d</math>. In this case, the weighted average would equal
<math display="block">
dm_0 + (1-d)M_0\ .
</math>
Similarly, the smallest possible weighted average equals
<math display="block">
dM_0 + (1-d)m_0\ .
</math>
Thus,
<math display="block">
\begin{eqnarray*}
M_1 - m_1 &\le& \Bigl(dm_0 + (1-d)M_0\Bigr) - \Bigl(dM_0 + (1-d)m_0\Bigr) \\
&=& (1 - 2d)(M_0 - m_0)\ .
\end{eqnarray*}
</math>
This completes the proof of the lemma.}}
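
The bound of the lemma is easy to check numerically. The following Python sketch uses an arbitrarily chosen 3-by-3 transition matrix with no zero entries and an arbitrary vector <math>\mat{y}</math>; any other such choices would do.

<syntaxhighlight lang="python">
import numpy as np

# An arbitrary 3-by-3 transition matrix with no zero entries (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
d = P.min()                      # smallest entry of P

y = np.array([10.0, 2.0, 5.0])   # an arbitrary column vector
M0, m0 = y.max(), y.min()

Py = P @ y                       # each entry is a weighted average of the entries of y
M1, m1 = Py.max(), Py.min()

print(M1 - m1)                   # about 2.4 for this choice of P and y
print((1 - 2 * d) * (M0 - m0))   # 4.8, so the lemma's bound holds
</syntaxhighlight>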

We turn now to the proof of the fundamental limit theorem for regular Markov chains.

{{proofcard|Theorem|theorem-1|'''(Fundamental Limit Theorem for Regular Chains)''' If <math>\mat{P}</math> is the transition matrix for a regular Markov chain, then
<math display="block">
\lim_{n \to \infty} \mat {P}^n = \mat {W}\ ,
</math>
where <math>\mat {W}</math> is a matrix with all rows equal. Furthermore, all entries in <math>\mat{W}</math> are strictly positive.|We prove this theorem for the special case that <math>\mat{P}</math> has no 0 entries. The extension to the general case is indicated in [[exercise:Ed0d60ec05 |this exercise]]. Let <math>\mat {y}</math> be any <math>r</math>-component column vector, where <math>r</math> is the number of states of the chain. We assume that <math>r > 1</math>, since otherwise the theorem is trivial. Let <math>M_n</math> and <math>m_n</math> be, respectively, the maximum and minimum components of the vector <math>\mat {P}^n \mat {y}</math>. The vector <math>\mat {P}^n \mat {y}</math> is obtained from the vector <math>\mat {P}^{n - 1} \mat {y}</math> by multiplying on the left by the matrix <math>\mat{P}</math>. Hence each component of <math>\mat {P}^n \mat {y}</math> is a weighted average of the components of <math>\mat {P}^{n - 1} \mat {y}</math>. Thus
<math display="block">
M_0 \geq M_1 \geq M_2 \geq\cdots
</math>
and
<math display="block">
m_0 \leq m_1 \leq m_2 \leq\cdots\ .
</math>
Each sequence is monotone and bounded:
<math display="block">
m_0 \leq m_n \leq M_n \leq M_0\ .
</math>
Hence, each of these sequences will have a limit as <math>n</math> tends to infinity.

Let <math>M</math> be the limit of <math>M_n</math> and <math>m</math> the limit of <math>m_n</math>. We know that <math>m \leq M</math>. We shall prove that <math>M - m = 0</math>. This will be the case if <math>M_n - m_n</math> tends to 0. Let <math>d</math> be the smallest element of <math>\mat{P}</math>. Since all entries of <math>\mat{P}</math> are strictly positive, we have <math>d > 0</math>. By our lemma,
<math display="block">
M_n - m_n \leq (1 - 2d)(M_{n - 1} - m_{n - 1})\ .
</math>
From this we see that
<math display="block">
M_n - m_n \leq (1 - 2d)^n(M_0 - m_0)\ .
</math>
Since <math>r \ge 2</math>, we must have <math>d \leq 1/2</math>, so <math>0 \leq 1 - 2d < 1</math>, and hence the difference <math>M_n - m_n</math> tends to 0 as <math>n</math> tends to infinity. Since every component of <math>\mat {P}^n \mat {y}</math> lies between <math>m_n</math> and <math>M_n</math>, each component must approach the same number <math>u = M = m</math>.
This shows that
<span id{{=}}"eq 11.4.4"/>
<math display="block">
\lim_{n \to \infty} \mat {P}^n \mat {y} = \mat{u}\ , \label{eq 11.4.4}
</math>
where <math>\mat{u}</math> is a column vector all of whose components equal <math>u</math>.

Now let <math>\mat{y}</math> be the vector with <math>j</math>th component equal to 1 and all other components equal to 0. Then <math>\mat {P}^n \mat {y}</math> is the <math>j</math>th column of <math>\mat {P}^n</math>. Doing this for each <math>j</math> proves that the columns of <math>\mat {P}^n</math> approach constant column vectors. That is, the rows of <math>\mat {P}^n</math> approach a common row vector <math>\mat{w}</math>, or,
<span id{{=}}"eq 11.4.5"/>
<math display="block">
\lim_{n \to \infty} \mat {P}^n = \mat {W}\ . \label{eq 11.4.5}
</math>

It remains to show that all entries in <math>\mat{W}</math> are strictly positive. As before, let <math>\mat{y}</math> be the vector with <math>j</math>th component equal to 1 and all other components equal to 0. Then <math>\mat{P}\mat{y}</math> is the <math>j</math>th column of <math>\mat{P}</math>, and this column has all entries strictly positive. The minimum component of the vector <math>\mat{P}\mat{y}</math> was defined to be <math>m_1</math>, hence <math>m_1 > 0</math>. Since <math>m_1 \le m</math>, we have <math>m > 0</math>. Note finally that this value of <math>m</math> is just the <math>j</math>th component of <math>\mat{w}</math>, so all components of <math>\mat{w}</math> are strictly positive.}}
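
The conclusion of the theorem, and the geometric rate <math>(1-2d)^n</math> appearing in the proof, can be observed numerically. The Python sketch below uses an arbitrarily chosen regular transition matrix; it prints a high power of <math>\mat{P}</math>, whose rows agree to many decimal places, checks that the common row is a fixed probability vector, and compares the spread of a column of <math>\mat{P}^n</math> with the bound <math>(1-2d)^n(M_0 - m_0)</math>.

<syntaxhighlight lang="python">
import numpy as np

# An arbitrary regular transition matrix (all entries positive, rows sum to 1).
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.5,  0.3 ],
              [0.3, 0.3,  0.4 ]])

Pn = np.linalg.matrix_power(P, 50)
print(Pn)                          # the three rows are numerically identical

w = Pn[0]                          # the common row
print(np.allclose(w @ P, w))       # True: w is a fixed row vector of P
print(np.isclose(w.sum(), 1.0))    # True: w is a probability vector

# Column 0 of P^n equals P^n y with y a unit column vector, so M_0 - m_0 = 1.
# Its spread M_n - m_n stays below the bound (1 - 2d)^n from the proof.
d = P.min()
for n in range(1, 6):
    col = np.linalg.matrix_power(P, n)[:, 0]
    print(n, col.max() - col.min(), (1 - 2 * d) ** n)
</syntaxhighlight>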

===Doeblin's Proof===
We give now a very different proof of the main part of the fundamental limit theorem for regular Markov chains. This proof was first given by Doeblin,<ref group="Notes" >W. Doeblin, “Exposé de la Théorie des Chaines Simples Constantes de Markov à un Nombre Fini d'Etats,” ''Rev. Math. de l'Union Interbalkanique,'' vol. 2 (1937), pp. 77–105.</ref> a brilliant young mathematician who was killed in his twenties in the Second World War.

{{proofcard|Theorem|thm_11.4.1|Let <math>\mat {P}</math> be the transition matrix for a regular Markov chain with fixed vector <math>\mat {w}</math>. Then for any initial probability vector <math>\mat {u}</math>, <math>\mat {uP}^n \rightarrow \mat {w}</math> as <math>n \rightarrow \infty.</math>|Let <math>X_0,\ X_1,\ \ldots</math> be a Markov chain with transition matrix <math>\mat {P}</math> started in state <math>s_i</math>. Let <math>Y_0,\ Y_1,\ \ldots</math> be a Markov chain with transition matrix <math>\mat {P}</math> started with initial probabilities given by <math>\mat {w}</math>. The <math>X</math> and <math>Y</math> processes are run independently of each other.

We consider also a third Markov chain <math>\mat{P}^*</math> which consists of watching both the <math>X</math> and <math>Y</math> processes. The states for <math>\mat{P}^*</math> are pairs <math>(s_i, s_j)</math>. The transition probabilities are given by
<math display="block">
\mat{P}^{*}[(i,j),(k,l)] = \mat{P}(i,k) \cdot \mat{P}(j,l)\ .
</math>
Since <math>\mat{P}</math> is regular, there is an <math>N</math> such that <math>\mat{P}^{N}(i,j) > 0</math> for all <math>i</math> and <math>j</math>. Thus for the <math>\mat{P}^*</math> chain it is also possible to go from any state <math>(s_i, s_j)</math> to any other state <math>(s_k,s_l)</math> in at most <math>N</math> steps. That is, <math>\mat{P}^*</math> is also a regular Markov chain.

We know that a regular Markov chain will reach any state in a finite time. Let <math>T</math> be the first time the chain <math>\mat{P}^*</math> is in a state of the form <math>(s_k,s_k)</math>. In other words, <math>T</math> is the first time that the <math>X</math> and the <math>Y</math> processes are in the same state. Then we have shown that
<math display="block">
P(T > n) \rightarrow 0 \;\;\mbox{as}\;\; n \rightarrow \infty\ .
</math>
If we watch the <math>X</math> and <math>Y</math> processes after the first time they are in the same state, we would not predict any difference in their long-range behavior. Since this will happen no matter how we started these two processes, it seems clear that the long-range behavior should not depend upon the starting state. We now show that this is true.

We first note that if <math>n \ge T</math>, then since <math>X</math> and <math>Y</math> are both in the same state at time <math>T</math>,
<math display="block">
P(X_n = j\ |\ n \ge T) = P(Y_n = j\ |\ n \ge T)\ .
</math>
If we multiply both sides of this equation by <math>P(n \ge T)</math>, we obtain
<span id{{=}}"eq 11.4.1"/>
<math display="block">
\begin{equation}
P(X_n = j,\ n \ge T) = P(Y_n = j,\ n \ge T)\ .
\label{eq 11.4.1}
\end{equation}
</math>
We know that for all <math>n</math>,
<math display="block">
P(Y_n = j) = w_j\ .
</math>
But
<math display="block">
P(Y_n = j) = P(Y_n = j,\ n \ge T) + P(Y_n = j,\ n < T)\ ,
</math>
and the second summand on the right-hand side of this equation goes to 0 as <math>n</math> goes to <math>\infty</math>, since <math>P(n < T)</math> goes to 0 as <math>n</math> goes to <math>\infty</math>. So,
<math display="block">
P(Y_n = j,\ n \ge T) \rightarrow w_j\ ,
</math>
as <math>n</math> goes to <math>\infty</math>. From [[#eq 11.4.1 |Equation]], we see that
<math display="block">
P(X_n = j,\ n \ge T) \rightarrow w_j\ ,
</math>
as <math>n</math> goes to <math>\infty</math>. But by similar reasoning to that used above, the difference between this last expression and <math>P(X_n = j)</math> goes to 0 as <math>n</math> goes to <math>\infty</math>. Therefore,
<math display="block">
P(X_n = j) \rightarrow w_j\ ,
</math>
as <math>n</math> goes to <math>\infty</math>. This completes the proof.}}
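
Doeblin's coupling argument is easy to simulate. In the Python sketch below the transition matrix is an arbitrary regular example; the <math>X</math> chain is started in a fixed state, the <math>Y</math> chain is started according to <math>\mat{w}</math>, the two chains are run independently, and the coupling time <math>T</math> is recorded over many runs. The estimated values of <math>P(T > n)</math> decrease toward 0, as the proof requires.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary regular transition matrix; its fixed vector w is read off
# from a high power of P, since the rows of P^n converge to w.
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.5,  0.3 ],
              [0.3, 0.3,  0.4 ]])
w = np.linalg.matrix_power(P, 100)[0]

def coupling_time(start, max_steps=1000):
    """First time the X and Y chains are in the same state."""
    x = start
    y = rng.choice(3, p=w)            # Y is started in equilibrium
    for t in range(max_steps):
        if x == y:
            return t
        x = rng.choice(3, p=P[x])     # the two chains move independently
        y = rng.choice(3, p=P[y])
    return max_steps

times = np.array([coupling_time(start=0) for _ in range(10000)])
for n in (0, 1, 2, 5, 10):
    print(n, (times > n).mean())      # Monte Carlo estimates of P(T > n)
</syntaxhighlight>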

In the above proof, we have said nothing about the rate at which the distributions of the <math>X_n</math>'s approach the fixed distribution <math>\mat {w}</math>. In fact, it can be shown that<ref group="Notes" >T. Lindvall, ''Lectures on the Coupling Method'' (New York: Wiley, 1992).</ref>
<math display="block">
\sum_{j = 1}^{r} \bigl| P(X_{n} = j) - w_j \bigr| \leq 2 P(T > n)\ .
</math>
The left-hand side of this inequality can be viewed as the distance between the distribution of the Markov chain after <math>n</math> steps, starting in state <math>s_i</math>, and the limiting distribution <math>\mat {w}</math>.
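
For a small chain this inequality can be checked exactly. In the sketch below (again with an arbitrary regular matrix and an arbitrary starting state for the <math>X</math> chain), <math>P(T > n)</math> is computed from the product chain <math>\mat{P}^*</math> after making the diagonal states <math>(s_k, s_k)</math> absorbing, and is compared with the left-hand side of the inequality.

<syntaxhighlight lang="python">
import numpy as np

# An arbitrary regular transition matrix, its fixed vector w,
# and an arbitrary starting state i for the X chain.
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.5,  0.3 ],
              [0.3, 0.3,  0.4 ]])
r = P.shape[0]
w = np.linalg.matrix_power(P, 100)[0]
i = 0

# Product chain on pairs (j, k), with the diagonal states made absorbing,
# so the probability mass still off the diagonal at time n is P(T > n).
Q = np.zeros((r * r, r * r))
for j in range(r):
    for k in range(r):
        if j == k:
            Q[j * r + k, j * r + k] = 1.0
        else:
            for jj in range(r):
                for kk in range(r):
                    Q[j * r + k, jj * r + kk] = P[j, jj] * P[k, kk]

# Initial distribution of the pair (X_0, Y_0): X_0 = i, Y_0 distributed as w.
mu0 = np.zeros(r * r)
for k in range(r):
    mu0[i * r + k] = w[k]

off_diag = [j * r + k for j in range(r) for k in range(r) if j != k]
for n in range(6):
    lhs = np.abs(np.linalg.matrix_power(P, n)[i] - w).sum()
    p_T_gt_n = (mu0 @ np.linalg.matrix_power(Q, n))[off_diag].sum()
    print(n, lhs, 2 * p_T_gt_n)       # the first value never exceeds the second
</syntaxhighlight>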

==General references==
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}

==Notes==
{{Reflist|group=Notes}}