guide:D9112ff950

From Stochiki
<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
We start with the existence of a Markov chain associated with a given transition matrix.
{{proofcard|Proposition|prop2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Q</math> be a stochastic matrix on <math>E</math>. There exists a probability space <math>(\Omega',\F',\p')</math> on which there exists, for all <math>x\in E</math>, a stochastic process <math>(X_n^x)_{n\geq 0}</math> which is a Markov chain with transition matrix <math>Q</math>, starting from <math>X_0^x=x</math>.


|We can take <math>\Omega'=[0,1)</math>, with the Borel <math>\sigma</math>-Algebra and the Lebesgue measure. For <math>\omega\in[0,1)</math>, we set
<math display="block">
\omega=\sum_{n=0}^\infty \mathcal{E}_n(\omega)2^{-n-1},
</math>
with <math>\mathcal{E}_n(\omega)\in\{0,1\}</math>, choosing the dyadic expansion which does not end in an infinite string of <math>1</math>'s. One checks that <math>(\mathcal{E}_n)_{n\geq 0}</math> is then a sequence of iid r.v.'s with <math>\p[\mathcal{E}_n=1]=\p[\mathcal{E}_n=0]=\frac{1}{2}</math>. If <math>\varphi:\N\times\N\to \N</math> is a bijection, then the r.v.'s <math>\eta_{i,j}=\mathcal{E}_{\varphi(i,j)}</math> for <math>i,j\in\N</math> are again iid. Now define
<math display="block">
U_i=\sum_{j=0}^\infty\eta_{i,j}2^{-j-1}.
</math>
Then <math>U_0,U_1,U_2,...</math> are iid and uniformly distributed on <math>[0,1)</math>. Denote <math>E=\{y_1,y_2,...,y_k,...\}</math> and fix <math>x\in E</math>. We set <math>X_0^x=x</math> and <math>X_1^x=y_k</math> if
<math display="block">
\sum_{1\leq  j < k}Q(x,y_j) < U_1\leq  \sum_{1\leq  j\leq  k}Q(x,y_j),
</math>
so that <math>\p[X_1^x=y]=Q(x,y)</math> for all <math>y\in E</math>. We then proceed by induction, setting <math>X_{n+1}^x=y_k</math> if
<math display="block">
\sum_{1\leq  j < k} Q(X_n^x,y_j) < U_{n+1}\leq  \sum_{1\leq  j\leq  k}Q(X_n^x,y_j).
</math>
Using the independence of the <math>U_i</math>'s, we check that for all <math>k\geq 1</math>, we have
<math display="block">
\begin{multline*}
\p[X_{n+1}^x=y_k\mid X_0^x=x,X_1^x=x_1,...,X_n^x=x_n]\\=\p\left[\sum_{1\leq  j < k}Q(x_n,y_j) < U_{n+1}\leq  \sum_{1\leq  j\leq  k}Q(x_n,y_j)\mid X_0^x=x,...,X_n^x=x_n\right]=Q(x_n,y_k).
\end{multline*}
</math>}}
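The inductive rule in the proof is exactly inverse-CDF sampling: each step consumes one fresh uniform <math>U_{n+1}</math> and picks the state whose cumulative row probability first exceeds it. A minimal Python sketch of this construction (the two-state matrix and the state labels are an illustrative choice, not from the text):

```python
import random

def step(Q, x, u):
    """Return y_k with sum_{1<=j<k} Q(x, y_j) < u <= sum_{1<=j<=k} Q(x, y_j)."""
    acc = 0.0
    for y, p in enumerate(Q[x]):
        acc += p
        if u <= acc:
            return y
    return len(Q[x]) - 1  # guard against floating-point round-off in the row sum

def sample_chain(Q, x, n, seed=0):
    """Simulate X_0 = x, X_1, ..., X_n using iid uniforms U_1, ..., U_n."""
    rng = random.Random(seed)
    path = [x]
    for _ in range(n):
        path.append(step(Q, path[-1], rng.random()))
    return path

# Illustrative two-state stochastic matrix.
Q = [[0.7, 0.3],
     [0.4, 0.6]]
path = sample_chain(Q, 0, 10)
```

Each call to <code>step</code> implements exactly the double inequality defining <math>X_{n+1}^x=y_k</math> in the proof.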
In the sequel, we shall take <math>\Omega=E^\N</math>. An element <math>\omega\in \Omega</math> is a sequence <math>\omega=(\omega_0,\omega_1,...)</math> of elements in <math>E</math>. Define the coordinate map
<math display="block">
X_n(\omega)=\omega_n.
</math>
We take <math>\F</math> to be the smallest <math>\sigma</math>-Algebra on <math>\Omega</math> which makes all the <math>X_n</math>'s measurable. This <math>\sigma</math>-Algebra is generated by the cylinder sets
<math display="block">
C=\{\omega\in\Omega\mid \omega_0=x_0,\omega_1=x_1,...,\omega_n=x_n\}
</math>
for <math>n\in\N</math> and <math>x_0,...,x_n\in E</math>.
{{proofcard|Lemma|lem2|Let <math>(G,\mathcal{G})</math> be a measurable space and let <math>\psi:G\to\Omega</math> be a map. Then <math>\psi</math> is measurable if and only if for all <math>n\geq 0</math>, <math>X_n\circ\psi</math> is measurable.
|We only need to show that if <math>X_n\circ \psi</math> is measurable for all <math>n\geq 0</math>, then <math>\psi</math> is measurable as well. Consider the <math>\sigma</math>-Algebra <math>\A</math> on <math>\Omega</math> given by
<math display="block">
\A:=\{A\in\F\mid \psi^{-1}(A)\in\mathcal{G}\},
</math>
which contains all the sets of the form <math>X_n^{-1}(\{y\})</math> for <math>y\in E</math>. Hence <math>\A</math> is a <math>\sigma</math>-Algebra with respect to which all the <math>X_n</math>'s are measurable, so by minimality of <math>\F</math> we get <math>\A=\F</math> and the claim follows.}}
{{proofcard|Theorem|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Q</math> be a stochastic matrix on <math>E</math>. For all <math>x\in E</math>, there exists a unique probability measure <math>\p_x</math>, on <math>\Omega=E^\N</math>, such that the sequence of coordinate functions <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math> and <math>\p_x[X_0=x]=1</math>.
|By the [[#prop2 |proposition]] above, there exists a probability space <math>(\Omega',\F',\p')</math> carrying a Markov chain <math>(X_n^x)_{n\geq 0}</math> with transition matrix <math>Q</math> such that <math>X_0^x=x</math>. We then define <math>\p_x</math> as the image of <math>\p'</math> under the map
<math display="block">
\begin{align*}
\Omega'&\longrightarrow\Omega\\
\omega'&\longmapsto (X_n^x(\omega'))_{n\geq 0}
\end{align*}
</math>
This map is measurable by the [[#lem2 |lemma]] above. We have <math>\p_x[X_0=x]=\p'[X_0^x=x]=1</math> and for <math>x_0,x_1,...,x_n\in E</math> we have
<math display="block">
\begin{multline*}
\p_x[X_0=x_0,X_1=x_1,...,X_n=x_n]=\p'[X_0^x=x_0,X_1^x=x_1,...,X_n^x=x_n]\\=\p'[X_0^x=x_0]\prod_{j=1}^nQ(x_{j-1},x_j)=\p_x[X_0=x_0]\prod_{j=1}^nQ(x_{j-1},x_j).
\end{multline*}
</math>
Therefore the sequence of coordinate functions is a Markov chain with transition matrix <math>Q</math>. For uniqueness, note that if <math>\p'_x</math> is another such probability measure, then <math>\p_x</math> and <math>\p_x'</math> coincide on cylinders, and it follows from an application of the monotone class theorem that <math>\p_x=\p_x'</math>.}}
{{alert-info |
It follows that for all <math>n\geq 0</math> and all <math>x,y\in E</math>, we have
<math display="block">
\p_x[X_n=y]=Q_n(x,y).
</math>
}}
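In code, the <math>n</math>-step law is just a matrix power. A small stdlib-only sketch (the matrix is an illustrative choice):

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(Q, n):
    """n-th power of Q, with Q^0 the identity matrix."""
    size = len(Q)
    R = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = mat_mul(R, Q)
    return R

# Illustrative two-state stochastic matrix.
Q = [[0.7, 0.3],
     [0.4, 0.6]]
Q3 = mat_pow(Q, 3)  # Q3[x][y] plays the role of P_x[X_3 = y]
```

Each row of <code>Q3</code> is again a probability vector, as it must be for a stochastic matrix.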
{{alert-info |
If <math>\mu</math> is a probability measure on <math>E</math>, we write 
<math display="block">
\p_\mu:=\sum_{x\in E}\mu(x)\p_x,
</math>
which defines a new probability measure on <math>\Omega</math>. By writing an explicit formula for <math>\p_\mu[X_0=x_0,...,X_n=x_n]</math>, one immediately checks that, under <math>\p_\mu</math>, <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math> whose initial state <math>X_0</math> has distribution <math>\mu</math>.
}}
{{alert-info |
Let <math>(X'_n)_{n\geq 0}</math> be a Markov chain with transition matrix <math>Q</math> and initial distribution <math>\mu</math>. Then for all measurable subsets <math>B\subset\Omega=E^\N</math>, we have
<math display="block">
\p[(X'_n)_{n\geq 0}\in B]=\p_\mu[B].
</math>
It is enough to verify this when <math>B</math> is a cylinder, as in the proof above; the general case then follows from the monotone class theorem. This shows that all results proven for the canonical Markov chain also hold for a general Markov chain with the same transition matrix. One of the advantages of the canonical Markov chain is that one can use translation operators. For all <math>k\in\N</math>, we define
<math display="block">
\Theta_k:\Omega\to\Omega,(\omega_n)_{n\geq 0}\mapsto \Theta_k((\omega_n)_{n\geq 0})=(\omega_{k+n})_{n\geq 0}.
</math>
The [[#lem2 |lemma]] above shows that these maps are measurable. We write <math>\F_n=\sigma(X_0,...,X_n)</math> and we use the notation <math>\E_x</math> when we integrate with respect to <math>\p_x</math>.
}}
{{proofcard|Theorem (Simple Markov property)|thm5|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>F</math> and <math>G</math> be two measurable, positive maps on <math>\Omega</math> and let <math>n\geq 0</math>. Assume that <math>F</math> is <math>\F_n</math>-measurable. Then for all <math>x\in E</math>,
<math display="block">
\E_x[FG\circ \Theta_n]=\E_x[F\E_{X_n}[G]].
</math>
Equivalently, we have
<math display="block">
\E_x[G\circ \Theta_n\mid \F_n]=\E_{X_n}[G].
</math>
We say that the conditional distribution of <math>\Theta_n(\omega)</math> given <math>(X_0,X_1,...,X_n)</math> is <math>\p_{X_n}</math>.
{{alert-info |
[[#thm5 |Theorem]] is also true if one replaces <math>\E_x</math> with <math>\E_\mu</math>.
}}
|It is enough to prove the first statement. For this we can restrict ourselves to the case where <math>F=\one_{\{ X_0=x_0,X_1=x_1,...,X_n=x_n\}}</math> for <math>x_0,x_1,...,x_n\in E</math>. Let us first consider the case where <math>G=\one_{\{X_0=y_0,...,X_p=y_p\}}</math>. In this case, if <math>y\in E</math>, then
<math display="block">
\E_y[G]=\one_{\{ y_0=y\}}\prod_{j=1}^pQ(y_{j-1},y_j)
</math>
and
<math display="block">
\begin{multline*}
\E_x[FG\circ \Theta_n]=\p_x[X_0=x_0,...,X_{n}=x_{n},X_n=y_0,X_{n+1}=y_{1},...,X_{n+p}=y_p]\\
=\one_{\{x_0=x\}}\prod_{j=1}^nQ(x_{j-1},x_j)\one_{\{y_0=x_n\}}\prod_{j=1}^pQ(y_{j-1},y_j)=\E_x[F\E_{X_n}[G]].
\end{multline*}
</math>
Now it follows from the monotone class theorem that the identity holds for <math>G=\one_A</math> with <math>A\in\F</math>, hence for all measurable positive <math>G</math>, and thus we can conclude.}}
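For a finite chain, the identity <math>\E_x[F\,G\circ\Theta_n]=\E_x[F\,\E_{X_n}[G]]</math> can be verified exactly by summing over all paths. A brute-force Python sketch (the matrix, horizons, and test functions are our own illustrative choices):

```python
from itertools import product

# Illustrative two-state stochastic matrix.
Q = [[0.7, 0.3],
     [0.4, 0.6]]
STATES = range(len(Q))

def path_prob(x, path):
    """P_x[X_0 = path[0], ..., X_m = path[m]]."""
    if path[0] != x:
        return 0.0
    pr = 1.0
    for a, b in zip(path, path[1:]):
        pr *= Q[a][b]
    return pr

def expect(x, h, m):
    """E_x[h(X_0, ..., X_m)] by enumerating every path of length m."""
    return sum(path_prob(x, w) * h(w) for w in product(STATES, repeat=m + 1))

n, p, x0 = 2, 2, 0
F = lambda w: 1.0 if w[1] == 1 else 0.0    # F_n-measurable (uses X_0..X_n only)
G = lambda w: 1.0 if w[0] == w[p] else 0.0  # depends on X_0..X_p

# Left side: E_x[F * G(Theta_n .)], computed on paths of length n + p.
lhs = expect(x0, lambda w: F(w) * G(w[n:]), n + p)
# Right side: E_x[F * E_{X_n}[G]].
rhs = expect(x0, lambda w: F(w) * expect(w[n], G, p), n)
```

The two sums agree up to floating-point error, as the simple Markov property predicts.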
{{alert-info |
We would like to make sense of [[#thm5 |the theorem]] above with <math>n</math> replaced by a stopping time <math>T</math>. Indeed, suppose we want to address the problem of whether a Markov chain starting from <math>x</math> returns to <math>x</math>. In other words, let us write
<math display="block">
N_x=\sum_{n=0}^\infty \one_{\{X_n=x\}}
</math>
and ask whether <math>\p_x[N_x=\infty]=1</math>. It is in fact enough to check that the chain comes back to <math>x</math> at least once. If <math>H_x=\inf\{n\geq 1\mid X_n=x\}</math>, with the convention <math>\inf\varnothing=\infty</math>, then <math>\p_x[N_x=\infty]=1</math> if and only if <math>\p_x[H_x < \infty]=1</math>; one implication is trivial. Indeed, if <math>\p_x[H_x < \infty]=1</math>, and if [[#thm5 |the theorem]] holds for stopping times, then <math>\Theta_{H_x}(\omega)=\left(\omega_{H_x(\omega)+n}\right)_{n\geq 0}</math> has the law <math>\p_x</math>. But then, since <math>N_x(\omega)=1+N_x(\Theta_{H_x}(\omega))</math>, we see that <math>N_x</math> has the same distribution as <math>1+N_x</math> under <math>\p_x</math>. This is only possible if <math>N_x=\infty</math> a.s.
}}
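This recurrence phenomenon can be probed numerically. For the two-state chain below (an illustrative choice with <math>a=0.3</math>, <math>b=0.4</math> in <math>Q=\begin{pmatrix}1-a&a\\b&1-b\end{pmatrix}</math>), every simulated excursion from <math>0</math> returns, and the empirical mean of <math>H_0</math> is close to <math>1/\pi(0)=(a+b)/b=1.75</math>, the value predicted by Kac's formula for the invariant measure <math>\pi</math> (a standard fact not proved in this text):

```python
import random

def return_time(Q, x, rng, cap=10_000):
    """Simulate H_x = inf{n >= 1 : X_n = x} starting from X_0 = x."""
    state, n = x, 0
    while n < cap:
        u, acc = rng.random(), 0.0
        for y, pr in enumerate(Q[state]):
            acc += pr
            if u <= acc:
                state = y
                break
        n += 1
        if state == x:
            return n
    return None  # did not return within the cap (has vanishing probability here)

# Illustrative two-state stochastic matrix.
Q = [[0.7, 0.3],
     [0.4, 0.6]]
rng = random.Random(42)
times = [return_time(Q, 0, rng) for _ in range(20_000)]
mean_H = sum(times) / len(times)  # should be close to 1.75
```

The simulation is seeded, so the run is reproducible; the tolerance below is several standard errors wide.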
{{proofcard|Theorem (Strong Markov property)|thm-2|Let <math>(\Omega,\F,(\F_n)_{n\geq 0},\p)</math> be a filtered probability space. Let <math>T</math> be a stopping time for the filtration <math>(\F_n)_{n\geq 0}</math>. Let <math>F</math> and <math>G</math> be two measurable and positive functions on <math>\Omega</math>. Assume further that <math>F</math> is <math>\F_T</math>-measurable. Then for all <math>x\in E</math>, we get
<math display="block">
\E_x[\one_{\{ T < \infty\}}FG\circ \Theta_T]=\E_x[\one_{\{ T < \infty\}}F\E_{X_T}[G]]
</math>
or equivalently
<math display="block">
\E_x[\one_{\{T < \infty\}}G\circ \Theta_T\mid \F_T]=\one_{\{ T < \infty\}}\E_{X_T}[G].
</math>
|For <math>n\geq 0</math>, we have
<math display="block">
\E_x[\one_{\{ T=n\}}FG\circ\Theta_T]=\E_x[\one_{\{ T=n\}}FG\circ \Theta_n]=\E_x[\one_{\{ T=n\}}F\E_{X_n}[G]].
</math>
The last equality follows from the [[#thm5 |simple Markov property]], after observing that <math>\one_{\{T=n\}}F</math> is <math>\F_n</math>-measurable. We then sum over <math>n</math>.}}
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\F,(\F_n)_{n\geq 0},\p)</math> be a filtered probability space. Let <math>T</math> be a stopping time such that <math>\p_x[T < \infty]=1</math>. Let us assume that there exists <math>y\in E</math> such that <math>\p_x[X_T=y]=1</math>. Then under <math>\p_x</math>, <math>\Theta_T(\omega)</math> is independent of <math>\F_T</math> and has the law <math>\p_y</math>.
|Since <math>T < \infty</math> and <math>X_T=y</math> hold <math>\p_x</math>-a.s., the strong Markov property gives, for <math>F</math> positive and <math>\F_T</math>-measurable and <math>G</math> positive and measurable,
<math display="block">
\E_x[FG\circ \Theta_T(\omega)]=\E_x[F\E_{X_T}[G]]=\E_x[F\E_y[G]]=\E_x[F]\E_y[G].
</math>
This gives at once the independence from <math>\F_T</math> and that the law of <math>\Theta_T</math> under <math>\p_x</math> is <math>\p_y</math>.}}
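The corollary can be illustrated by simulation: take <math>T=H_y</math>, the first hitting time of <math>y</math>. By the corollary the shifted chain starts afresh from <math>y</math>, so <math>X_{T+1}</math> should have law <math>Q(y,\cdot)</math>. A seeded Monte Carlo sketch (the chain is an illustrative choice):

```python
import random

# Illustrative two-state stochastic matrix.
Q = [[0.7, 0.3],
     [0.4, 0.6]]

def step(state, rng):
    """One transition of the chain with matrix Q."""
    u, acc = rng.random(), 0.0
    for y, pr in enumerate(Q[state]):
        acc += pr
        if u <= acc:
            return y
    return len(Q) - 1  # floating-point guard

def x_after_hit(x, y, rng, cap=10_000):
    """Run from x until T = H_y, then return X_{T+1} (None if y is not hit)."""
    state = x
    for _ in range(cap):
        state = step(state, rng)
        if state == y:
            return step(state, rng)  # one more step after T
    return None

rng = random.Random(7)
samples = [x_after_hit(0, 1, rng) for _ in range(20_000)]
freq0 = samples.count(0) / len(samples)  # should approach Q[1][0] = 0.4
```

The empirical frequency of state <math>0</math> after <math>T</math> matches <math>Q(1,0)=0.4</math> within a few standard errors, consistent with <math>\Theta_T</math> having law <math>\p_1</math>.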
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}

Latest revision as of 00:09, 9 May 2024