<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
In this chapter <math>E</math> will be a finite or countable set, endowed with the <math>\sigma</math>-algebra <math>\mathcal{P}(E)</math>. A stochastic matrix on <math>E</math> is a family <math>(Q(x,y))_{x,y\in E}</math> of real numbers satisfying
<ul style{{=}}"list-style-type:lower-roman"><li><math>0\leq Q(x,y)\leq 1</math> for all <math>x,y\in E</math>.
</li>
<li><math>\sum_{y\in E}Q(x,y)=1</math> for all <math>x\in E</math>.
</li>
</ul>
This is a transition probability from <math>E</math> to <math>E</math> in the following sense: if for <math>x\in E</math> and <math>A\subset E</math> we write
<math display="block">
\nu(x,A)=\sum_{y\in A}Q(x,y),
</math>
we see that <math>\nu</math> is a probability transition kernel from <math>E</math> to <math>E</math>. Conversely, if we start with such a transition kernel <math>\nu</math>, the formula
<math display="block">
Q(x,y)=\nu(x,\{y\})
</math>
defines a stochastic matrix on <math>E</math>. For <math>n\geq 1</math>, we can define <math>Q_n=Q^n</math>. Indeed, <math>Q_1=Q</math> and by induction
<math display="block">
Q_{n+1}(x,y)=\sum_{z\in E}Q_n(x,z)Q(z,y).
</math>
One can check that <math>Q_n</math> is also a stochastic matrix on <math>E</math>. For <math>n=0</math> we take <math>Q_0(x,y)=\one_{\{x=y\}}</math>. For a measurable map <math>f:E\to\R_+</math> we write <math>Qf</math> for the function defined by
<math display="block">
Qf(x)=\sum_{y\in E}Q(x,y)f(y).
</math>
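For a finite state space these objects are concrete linear algebra: <math>Q_n</math> is the <math>n</math>-th matrix power of <math>Q</math>, and <math>Qf</math> is a matrix-vector product. A minimal numerical sketch (the <math>3\times 3</math> matrix below is an arbitrary illustration, not taken from the text):

```python
import numpy as np

# An arbitrary stochastic matrix on the three-point set E = {0, 1, 2}:
# every entry lies in [0, 1] and every row sums to 1.
Q = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

assert np.all((0 <= Q) & (Q <= 1))
assert np.allclose(Q.sum(axis=1), 1.0)

# Q_n is the n-th matrix power of Q; it is again stochastic.
n = 5
Qn = np.linalg.matrix_power(Q, n)
assert np.allclose(Qn.sum(axis=1), 1.0)

# Qf(x) = sum_y Q(x, y) f(y) is the matrix-vector product Q @ f.
f = np.array([1.0, 2.0, 3.0])
Qf = Q @ f
```

The assertions mirror properties (i) and (ii) of the definition, together with the claim that <math>Q_n</math> is again stochastic.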
{{definitioncard|Markov chain|
Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Q</math> be a stochastic matrix on <math>E</math> and let <math>(X_n)_{n\geq 0}</math> be a stochastic process with values in <math>E</math>. We say that <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math> if for all <math>n\geq 0</math>, the conditional distribution of <math>X_{n+1}</math> given <math>(X_0,X_1,...,X_n)</math> is <math>Q(X_n,\cdot)</math>, or equivalently if
<math display="block">
\p[X_{n+1}=y\mid X_0=x_0,X_1=x_1,...,X_{n-1}=x_{n-1},X_n=x_n]=Q(x_n,y)
</math>
for all <math>x_0,x_1,...,x_n,y\in E</math> such that <math>\p[X_0=x_0,...,X_n=x_n] > 0</math>.
}}
{{alert-info |
In general, the conditional distribution of <math>X_{n+1}</math> given <math>X_0,X_1,...,X_n</math> may depend on all the variables <math>X_0,X_1,...,X_n</math>. The fact that this conditional distribution only depends on <math>X_n</math> is called the Markov property.
}}
{{alert-info |
The distribution <math>Q(x,\cdot)</math> of <math>X_{n+1}</math> given <math>X_n=x</math> does not depend on <math>n</math>: this is the homogeneity of the Markov chain.
}}
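Operationally, the definition tells us how to simulate a Markov chain: given <math>X_n=x</math>, draw <math>X_{n+1}</math> from the row <math>Q(x,\cdot)</math>. A sketch (the matrix and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary transition matrix on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

def simulate_chain(Q, x0, n, rng):
    """Simulate X_0, ..., X_n: at each step, draw X_{k+1} from the row Q(X_k, .)."""
    path = [x0]
    for _ in range(n):
        path.append(rng.choice(len(Q), p=Q[path[-1]]))
    return path

path = simulate_chain(Q, 0, 10, rng)
```

Every transition in the simulated path necessarily has positive probability under <math>Q</math>, which is a cheap sanity check on the simulator.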
{{proofcard|Proposition|prop-1|Let <math>(\Omega,\F,\p)</math> be a probability space. A stochastic process <math>(X_n)_{n\geq 0}</math> with values in <math>E</math> is a Markov chain with transition matrix <math>Q</math> if and only if for all <math>n\geq 0</math> and for all <math>x_0,x_1,...,x_n\in E</math>
<math display="block">
\begin{equation}
\label{markov}
\p[X_0=x_0,X_1=x_1,...,X_n=x_n]=\p[X_0=x_0]\prod_{j=1}^nQ(x_{j-1},x_j).
\end{equation}
</math>
In particular, if <math>\p[X_0=x_0] > 0</math>, then
<math display="block">
\p[X_n=x_n\mid X_0=x_0]=Q_n(x_0,x_n).
</math>
|If <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math>, we have
<math display="block">
\begin{align*}
\p[X_0=x_0,...,X_n=x_n,X_{n+1}=x_{n+1}]&=\p[X_0=x_0,...,X_n=x_n]\p[X_{n+1}=x_{n+1}\mid X_0=x_0,...,X_n=x_n]\\
&=\p[X_0=x_0,...,X_n=x_n]Q(x_n,x_{n+1}).
\end{align*}
</math>
Thus we can conclude by induction. Conversely, if ([[#markov |markov]]) is satisfied, then
<math display="block">
\p[X_{n+1}=y\mid X_0=x_0,...,X_n=x_n]=\frac{\p[X_0=x_0]\prod_{j=0}^{n-1}Q(x_j,x_{j+1})\,Q(x_n,y)}{\p[X_0=x_0]\prod_{j=0}^{n-1}Q(x_j,x_{j+1})}=Q(x_n,y).
</math>
To conclude, we note that, with <math>x_n=x</math>,
<math display="block">
Q_n(x_0,x)=\sum_{x_1,...,x_{n-1}\in E}\prod_{j=1}^nQ(x_{j-1},x_j).
</math>}}
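The last display of the proof can be verified by brute force on a small example: summing the product <math>\prod_{j=1}^n Q(x_{j-1},x_j)</math> over all intermediate states recovers the matrix power <math>Q_n(x_0,x)</math>. A sketch with an arbitrary matrix:

```python
import itertools
import numpy as np

# An arbitrary stochastic matrix on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
E = range(len(Q))
n = 4
x0, x = 0, 2

# Sum over all intermediate states x_1, ..., x_{n-1} of the product
# Q(x_0, x_1) * ... * Q(x_{n-1}, x); this should equal Q_n(x_0, x).
total = 0.0
for mid in itertools.product(E, repeat=n - 1):
    states = (x0, *mid, x)
    prob = 1.0
    for a, b in zip(states, states[1:]):
        prob *= Q[a, b]
    total += prob

Qn = np.linalg.matrix_power(Q, n)
assert np.isclose(total, Qn[x0, x])
```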
{{alert-info |
Proposition 15.1. shows that for a Markov chain, the distribution of <math>(X_0,X_1,...,X_n)</math> is completely determined by the initial distribution (that of <math>X_0</math>) and the transition matrix <math>Q</math>. From now on we write <math>\p[A\mid Z]</math> for <math>\E[\one_A\mid Z]</math>.
}}
{{proofcard|Proposition|prop-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>(X_n)_{n\geq 0}</math> be a Markov chain with transition matrix <math>Q</math>.
<ul style{{=}}"list-style-type:lower-roman"><li>For all <math>n\geq 0</math> and for all measurable maps <math>f:E\to \R_+</math> we have
<math display="block">
\E[f(X_{n+1})\mid X_0,X_1,...,X_n]=\E[f(X_{n+1})\mid X_n]=Qf(X_n).
</math>
More generally, for all <math>\{ i_1,...,i_n\}\subset \{0,1,...,n-1\}</math>, we have
<math display="block">
\E[f(X_{n+1})\mid X_{i_1},X_{i_2},...,X_{i_n},X_n]=Qf(X_n).
</math>
</li>
<li>For all <math>n\geq 0, p\geq 1</math> and for all <math>y_1,...,y_p\in E</math> we have
<math display="block">
\p[X_{n+1}=y_1,...,X_{n+p}=y_p\mid X_0,...,X_n]=Q(X_n,y_1)\prod_{j=1}^{p-1}Q(y_j,y_{j+1})
</math>
and hence
<math display="block">
\p[X_{n+p}=y_p\mid X_n]=Q_p(X_n,y_p).
</math>
If we set <math>Y_p=X_{n+p}</math>, then <math>(Y_p)_{p\geq 0}</math> is also a Markov chain with transition matrix <math>Q</math>.
</li>
</ul>
|For <math>(i)</math>, we have
<math display="block">
\E[f(X_{n+1})\mid X_0,X_1,...,X_n]=\sum_{y\in E}Q(X_n,y)f(y)=Qf(X_n).
</math>
Now if <math>\{i_1,...,i_n\}\subset \{ 0,1,...,n-1\}</math>, then{{efn|Recall that if <math>X\in(E,\mathcal{E})</math> and <math>Y\in(F,\F)</math>, we call a conditional distribution of <math>Y</math> given <math>X</math> any transition kernel <math>\nu:E\to F</math> such that for all measurable maps <math>h:(F,\F)\to \R_+</math>, <math>\E[h(Y)\mid X]=\int_F h(y)\nu(X,dy)</math>.}}
<math display="block">
\begin{multline*}
\E[f(X_{n+1})\mid X_{i_1},...,X_{i_n},X_n]=\E[\E[f(X_{n+1})\mid X_0,X_1,...,X_n]\mid X_{i_1},...,X_{i_n},X_n]\\=\E[Qf(X_n)\mid X_{i_1},...,X_{i_n},X_n]=Qf(X_n).
\end{multline*}
</math>
For <math>(ii)</math>, note that it follows immediately from ([[#markov |markov]]) that
<math display="block">
\p[X_{n+1}=y_1,...,X_{n+p}=y_p\mid X_0=x_0,...,X_n=x_n]=Q(x_n,y_1)\prod_{j=1}^{p-1}Q(y_j,y_{j+1}).
</math>
The formula for <math>\p[X_{n+p}=y_p\mid X_n]</math> follows by summing over <math>y_1,...,y_{p-1}\in E</math>. Finally we note that
<math display="block">
\p[Y_0=y_0,Y_1=y_1,...,Y_p=y_p]=\p[X_n=y_0]\prod_{j=1}^pQ(y_{j-1},y_j)
</math>
and we can apply Proposition 15.1.}}
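The identity <math>\p[X_{n+p}=y_p\mid X_n]=Q_p(X_n,y_p)</math> can be checked exactly on a small example by building the joint law of the first few steps from the product formula; conditioning on <math>X_1</math> and looking two steps ahead should then reproduce <math>Q_2</math>. (The matrix and initial distribution below are arbitrary choices for illustration.)

```python
import itertools
import numpy as np

# Arbitrary transition matrix and initial distribution on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
mu0 = np.array([0.3, 0.3, 0.4])
E = range(len(Q))

# Exact joint law of (X_0, X_1, X_2, X_3) from the product formula.
joint = {}
for path in itertools.product(E, repeat=4):
    p = mu0[path[0]]
    for a, b in zip(path, path[1:]):
        p *= Q[a, b]
    joint[path] = p

# The conditional P[X_3 = y | X_1 = x] should equal Q_2(x, y) = (Q @ Q)[x, y].
Q2 = Q @ Q
x, y = 1, 2
num = sum(p for path, p in joint.items() if path[1] == x and path[3] == y)
den = sum(p for path, p in joint.items() if path[1] == x)
assert np.isclose(num / den, Q2[x, y])
```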
'''Example'''
Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>(X_n)_{n\geq 0}</math> be a sequence of independent r.v.'s with values in <math>E</math>, with the same distribution <math>\mu</math>. Then <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix
<math display="block">
Q(x,y)=\mu(y)
</math>
for all <math>x,y\in E</math>.
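This can be seen concretely: for an i.i.d. sequence the next state never depends on the current one, so every row of the transition matrix equals <math>\mu</math> (the distribution below is an arbitrary choice):

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])   # an arbitrary distribution on E = {0, 1, 2}
k = len(mu)

# Q(x, y) = mu(y): all rows of Q are identical copies of mu.
Q = np.tile(mu, (k, 1))

assert np.allclose(Q.sum(axis=1), 1.0)
assert all(np.allclose(Q[x], mu) for x in range(k))
```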
'''Example'''
Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>\eta,\xi_1,...,\xi_n,...</math> be independent r.v.'s with values in <math>\mathbb{Z}^d</math>. We assume that <math>\xi_1,...,\xi_n,...</math> have the same distribution <math>\mu</math>. For <math>n\geq 0</math> we write
<math display="block">
X_n=\eta+\xi_1+\dotsm+\xi_n.
</math>
Then <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix
<math display="block">
Q(x,y)=\mu(y-x)
</math>
for all <math>x,y\in \mathbb{Z}^d</math>. Indeed,
<math display="block">
\begin{multline*}
\p[X_{n+1}=y\mid X_0=x_0,...,X_n=x_n]=\p[\xi_{n+1}=y-x_n\mid X_0=x_0,...,X_n=x_n]\\=\p[\xi_{n+1}=y-x_n]=\mu(y-x_n).
\end{multline*}
</math>
If <math>(e_1,...,e_d)</math> is the canonical basis of <math>\R^d</math>, and if <math>\mu(e_i)=\mu(-e_i)=\frac{1}{2d}</math> for all <math>i\in\{1,...,d\}</math>, then the Markov chain is called a simple random walk on <math>\mathbb{Z}^d</math>.
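The simple random walk is easy to simulate: each increment is uniform over the <math>2d</math> unit steps <math>\pm e_i</math>. A sketch for <math>d=2</math> (seed and walk length are arbitrary):

```python
import numpy as np

d = 2
rng = np.random.default_rng(1)

# The 2d unit steps +e_i and -e_i, each chosen with probability 1/(2d).
steps = np.concatenate([np.eye(d, dtype=int), -np.eye(d, dtype=int)])

n = 1000
increments = steps[rng.integers(0, 2 * d, size=n)]

# X_0 = 0 (here eta = 0), X_k = xi_1 + ... + xi_k.
X = np.concatenate([np.zeros((1, d), dtype=int), np.cumsum(increments, axis=0)])

# Every increment is a unit step along a coordinate axis.
assert np.all(np.abs(increments).sum(axis=1) == 1)
```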
'''Example'''
Let <math>\mathcal{P}_2(E)</math> be the set of subsets of <math>E</math> with two elements and let <math>A\subset\mathcal{P}_2(E)</math>. For <math>x\in E</math>, we write
<math display="block">
A_x=\{y\in E\mid \{x,y\}\in A\}.
</math>
We assume that <math>A_x\not=\varnothing</math> for all <math>x\in E</math>. We then define a transition matrix on <math>E</math> by setting, for <math>x,y\in E</math>,
<math display="block">
Q(x,y)=\begin{cases}\frac{1}{\vert A_x\vert},&\text{if $\{x,y\}\in A$,}\\ 0,&\text{otherwise.}\end{cases}
</math>
A Markov chain with transition matrix <math>Q</math> is called a simple random walk on the graph <math>(E,A)</math>.
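The construction can be sketched directly: build <math>Q</math> from an edge set <math>A</math> and check that it is stochastic. (The small graph below is an arbitrary illustration.)

```python
import numpy as np

# A small graph on E = {0, 1, 2, 3}: A is a set of two-element subsets (edges).
A = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
E = range(4)

def neighbours(x):
    """A_x = { y in E : {x, y} in A }."""
    return sorted(y for y in E if frozenset((x, y)) in A)

# Q(x, y) = 1/|A_x| if {x, y} is an edge, 0 otherwise.
Q = np.zeros((4, 4))
for x in E:
    Ax = neighbours(x)
    for y in Ax:
        Q[x, y] = 1.0 / len(Ax)

# Each row is the uniform distribution on the neighbours of x.
assert np.allclose(Q.sum(axis=1), 1.0)
```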
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}
==Notes==
{{notelist}}