<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
We start with the existence of a Markov chain associated with a given transition matrix.
{{proofcard|Proposition|prop2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Q</math> be a stochastic matrix on <math>E</math>. There exists a probability space <math>(\Omega',\F',\p')</math> on which there exists, for all <math>x\in E</math>, a stochastic process <math>(X_n^x)_{n\geq 0}</math> which is a Markov chain with transition matrix <math>Q</math>, starting from <math>X_0^x=x</math>.
|We can take <math>\Omega'=[0,1)</math>, with the Borel <math>\sigma</math>-Algebra and the Lebesgue measure. For <math>\omega\in[0,1)</math>, we write the binary expansion
<math display="block">
\omega=\sum_{n=0}^\infty \mathcal{E}_n(\omega)2^{-n-1},
</math>
with <math>\mathcal{E}_n(\omega)\in\{0,1\}</math>. Under the Lebesgue measure, <math>(\mathcal{E}_n)_{n\geq 0}</math> is a sequence of iid r.v.'s with <math>\p[\mathcal{E}_n=1]=\p[\mathcal{E}_n=0]=\frac{1}{2}</math>. If <math>\varphi:\N\times\N\to \N</math> is a bijection, then the r.v.'s <math>\eta_{i,j}=\mathcal{E}_{\varphi(i,j)}</math> for <math>i,j\in\N</math> are also iid. Now define
<math display="block">
U_i=\sum_{j=0}^\infty\eta_{i,j}2^{-j-1}.
</math>
Then <math>U_0,U_1,U_2,...</math> are iid, uniformly distributed on <math>[0,1)</math>. Denote <math>E=\{y_1,y_2,...,y_k,...\}</math> and fix <math>x\in E</math>. We set <math>X_0^x=x</math> and <math>X_1^x=y_k</math> if
<math display="block">
\sum_{1\leq j < k}Q(x,y_j) < U_1\leq \sum_{1\leq j\leq k}Q(x,y_j),
</math>
so that <math>\p[X_1^x=y]=Q(x,y)</math> for all <math>y\in E</math>. We then proceed by induction, setting <math>X_{n+1}^x=y_k</math> if
<math display="block">
\sum_{1\leq j < k} Q(X_n^x,y_j) < U_{n+1}\leq \sum_{1\leq j\leq k}Q(X_n^x,y_j).
</math>
Using the independence of the <math>U_i</math>'s, we check that for all <math>k\geq 1</math>, we have
<math display="block">
\begin{multline*}
\p[X_{n+1}^x=y_k\mid X_0^x=x,X_1^x=x_1,...,X_n^x=x_n]\\=\p\left[\sum_{1\leq j < k}Q(x_n,y_j) < U_{n+1}\leq \sum_{1\leq j\leq k}Q(x_n,y_j)\mid X_0^x=x,...,X_n^x=x_n\right]=Q(x_n,y_k).
\end{multline*}
</math>}}
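The partial-sum condition in the proof above is exactly inverse-CDF sampling: given a uniform variable <math>U_{n+1}</math>, the next state is the first <math>y_k</math> whose cumulative row sum reaches it. A minimal Python sketch of this construction (the function names and the example matrix below are ours, not from the text):

```python
import random

def sample_step(q_row, u):
    """Return the first index k with sum_{j<=k} q_row[j] >= u,
    mirroring the partial-sum condition in the proof."""
    acc = 0.0
    for k, q in enumerate(q_row):
        acc += q
        if u <= acc:
            return k
    return len(q_row) - 1  # guard against floating-point round-off

def markov_chain(Q, x, n_steps, rng=None):
    """Simulate X_0, ..., X_{n_steps} with transition matrix Q,
    started from state x, using iid uniforms U_1, U_2, ..."""
    rng = rng or random.Random(0)
    path = [x]
    for _ in range(n_steps):
        path.append(sample_step(Q[path[-1]], rng.random()))
    return path
```

Here states are labeled <math>0,1,...,k-1</math> and <math>Q</math> is given as a list of rows, each summing to one.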
In the sequel, we shall take <math>\Omega=E^\N</math>. An element <math>\omega\in \Omega</math> is a sequence <math>\omega=(\omega_0,\omega_1,...)</math> of elements in <math>E</math>. Define the coordinate maps
<math display="block">
X_n(\omega)=\omega_n.
</math>
We take <math>\F</math> to be the smallest <math>\sigma</math>-Algebra on <math>\Omega</math> which makes all <math>X_n</math>'s measurable. This <math>\sigma</math>-Algebra is generated by the cylinder sets
<math display="block">
C=\{\omega\in\Omega\mid \omega_0=x_0,\omega_1=x_1,...,\omega_n=x_n\}
</math>
for <math>n\in\N</math> and <math>x_0,...,x_n\in E</math>.
{{proofcard|Lemma|lem2|Let <math>(G,\mathcal{G})</math> be a measurable space and let <math>\psi:G\to\Omega</math> be a map. Then <math>\psi</math> is measurable if and only if for all <math>n\geq 0</math>, <math>X_n\circ\psi</math> is measurable.
|We only need to show that if <math>X_n\circ \psi</math> is measurable for all <math>n\geq 0</math>, then <math>\psi</math> is measurable as well. Consider the <math>\sigma</math>-Algebra <math>\A</math> on <math>\Omega</math> given by
<math display="block">
\A:=\{A\in\F\mid \psi^{-1}(A)\in\mathcal{G}\},
</math>
which contains all the sets of the form <math>X_n^{-1}(\{y\})</math> for <math>y\in E</math>, since each <math>X_n\circ\psi</math> is measurable. Hence all the <math>X_n</math>'s are measurable with respect to <math>\A</math>, so <math>\A=\F</math> and the claim follows.}}
{{proofcard|Theorem|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Q</math> be a stochastic matrix on <math>E</math>. For all <math>x\in E</math>, there exists a unique probability measure <math>\p_x</math> on <math>\Omega=E^\N</math> such that the sequence of coordinate functions <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math> and <math>\p_x[X_0=x]=1</math>.
|From the [[#prop2 |proposition]], there exists <math>(\Omega',\F',\p')</math> and <math>(X_n^x)_{n\geq 0}</math> which is a Markov chain with transition matrix <math>Q</math> and such that <math>X_0^x=x</math>. We then define <math>\p_x</math> as the image of <math>\p'</math> under the map
<math display="block">
\begin{align*}
\Omega'&\longrightarrow\Omega\\
\omega'&\longmapsto (X_n^x(\omega'))_{n\geq 0}
\end{align*}
</math>
This map is measurable because of the [[#lem2 |lemma]]. We have <math>\p_x[X_0=x]=\p'[X_0^x=x]=1</math> and for <math>x_0,x_1,...,x_n\in E</math> we have
<math display="block">
\begin{multline*}
\p_x[X_0=x_0,X_1=x_1,...,X_n=x_n]=\p'[X_0^x=x_0,X_1^x=x_1,...,X_n^x=x_n]\\=\p'[X_0^x=x_0]\prod_{j=1}^nQ(x_{j-1},x_j)=\p_x[X_0=x_0]\prod_{j=1}^nQ(x_{j-1},x_j).
\end{multline*}
</math>
Therefore the sequence of coordinate functions is a Markov chain. For uniqueness, we note that if <math>\p'_x</math> is another such probability measure, then <math>\p_x</math> and <math>\p_x'</math> coincide on cylinders. It then follows from an application of the monotone class theorem that <math>\p_x=\p_x'</math>.}}
{{alert-info |
It follows that for all <math>n\geq 0</math> and all <math>x,y\in E</math>, we have
<math display="block">
\p_x[X_n=y]=Q_n(x,y).
</math>
}}
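The identity <math>\p_x[X_n=y]=Q_n(x,y)</math> says that the distribution of <math>X_n</math> under <math>\p_x</math> is row <math>x</math> of the <math>n</math>-th matrix power of <math>Q</math>. A small pure-Python sketch (the example matrix in the note below is ours):

```python
def n_step_distribution(Q, x, n):
    """Row x of Q^n, i.e. the distribution of X_n under P_x,
    computed by repeated vector-matrix multiplication."""
    k = len(Q)
    dist = [0.0] * k
    dist[x] = 1.0  # delta mass at the starting state
    for _ in range(n):
        dist = [sum(dist[i] * Q[i][j] for i in range(k)) for j in range(k)]
    return dist
```

For instance, with <math>Q</math> given by rows <math>(0.5,0.5)</math> and <math>(0.3,0.7)</math>, row 0 of <math>Q^2</math> is <math>(0.4,\,0.6)</math>, which the function reproduces.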
{{alert-info |
If <math>\mu</math> is a probability measure on <math>E</math>, we write
<math display="block">
\p_\mu:=\sum_{x\in E}\mu(x)\p_x,
</math>
which defines a new probability measure on <math>\Omega</math>. By writing an explicit formula for <math>\p_\mu[X_0=x_0,...,X_n=x_n]</math>, one immediately checks that under <math>\p_\mu</math>, <math>(X_n)_{n\geq 0}</math> is a Markov chain with transition matrix <math>Q</math>, where <math>X_0</math> has the distribution <math>\mu</math>.
}}
{{alert-info |
Let <math>(X'_n)_{n\geq 0}</math> be a Markov chain with transition matrix <math>Q</math> and initial distribution <math>\mu</math>. Then for all measurable subsets <math>B\subset\Omega=E^\N</math>, we have
<math display="block">
\p[(X'_n)_{n\geq 0}\in B]=\p_\mu[B].
</math>
It is enough to check this when <math>B</math> is a cylinder, as in the proof above. This shows that all results proven for the canonical Markov chain hold for a general Markov chain with the same transition matrix. One of the advantages of using the canonical Markov chain is that one can use translation operators. For all <math>k\in\N</math>, we define
<math display="block">
\Theta_k:\Omega\to\Omega,\quad(\omega_n)_{n\geq 0}\mapsto \Theta_k((\omega_n)_{n\geq 0})=(\omega_{k+n})_{n\geq 0}.
</math>
The [[#lem2 |lemma]] shows that these maps are measurable. We write <math>\F_n=\sigma(X_0,...,X_n)</math> and we use the notation <math>\E_x</math> when we integrate with respect to <math>\p_x</math>.
}}
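On a trajectory stored as a finite tuple, the translation operator <math>\Theta_k</math> just drops the first <math>k</math> coordinates, and the identity <math>X_n\circ\Theta_k=X_{n+k}</math> becomes plain indexing. A toy sketch (the sample trajectory below is ours):

```python
def theta(k, omega):
    """Translation operator: (omega_n)_{n>=0} -> (omega_{k+n})_{n>=0}.
    On a finite window of a trajectory this drops the first k terms."""
    return omega[k:]

def X(n, omega):
    """Coordinate map X_n(omega) = omega_n."""
    return omega[n]
```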
{{proofcard|Theorem (Simple Markov property)|thm5|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>F</math> and <math>G</math> be two measurable, positive maps on <math>\Omega</math> and let <math>n\geq 0</math>. Assume that <math>F</math> is <math>\F_n</math>-measurable. Then for all <math>x\in E</math>,
<math display="block">
\E_x[FG\circ \Theta_n]=\E_x[F\E_{X_n}[G]].
</math>
Equivalently, we have
<math display="block">
\E_x[G\circ \Theta_n\mid \F_n]=\E_{X_n}[G].
</math>
We say that the conditional distribution of <math>\Theta_n(\omega)</math> given <math>(X_0,X_1,...,X_n)</math> is <math>\p_{X_n}</math>.
{{alert-info |
[[#thm5 |Theorem]] is also true if one replaces <math>\E_x</math> with <math>\E_\mu</math>.
}}
|[Proof of [[#thm5 |Theorem]]]
It is enough to prove the first statement. For this we can restrict ourselves to the case where <math>F=\one_{\{ X_0=x_0,X_1=x_1,...,X_n=x_n\}}</math> for <math>x_0,x_1,...,x_n\in E</math>. Let us first consider the case where <math>G=\one_{\{X_0=y_0,...,X_p=y_p\}}</math>. In this case, if <math>y\in E</math>, then
<math display="block">
\E_y[G]=\one_{\{ y_0=y\}}\prod_{j=1}^pQ(y_{j-1},y_j)
</math>
and
<math display="block">
\begin{multline*}
\E_x[FG\circ \Theta_n]=\p_x[X_0=x_0,...,X_{n}=x_{n},X_n=y_0,X_{n+1}=y_{1},...,X_{n+p}=y_p]\\
=\one_{\{x_0=x\}}\prod_{j=1}^nQ(x_{j-1},x_j)\one_{\{y_0=x_n\}}\prod_{j=1}^pQ(y_{j-1},y_j)=\E_x[F\E_{X_n}[G]].
\end{multline*}
</math>
Now it follows from the monotone class theorem that the above holds for <math>G=\one_A</math>, for <math>A\in\F</math>, and thus we can conclude.}}
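For a finite state space, the simple Markov property <math>\E_x[FG\circ\Theta_n]=\E_x[F\E_{X_n}[G]]</math> can be verified exactly by enumerating trajectories and weighting each by <math>\one_{\{\omega_0=x\}}\prod_j Q(\omega_{j-1},\omega_j)</math>. A sketch under our own choice of matrix and functionals (none of these names appear in the text):

```python
from itertools import product

def path_prob(Q, x, w):
    """P_x-probability of the cylinder {X_0=w_0, ..., X_m=w_m}."""
    if w[0] != x:
        return 0.0
    p = 1.0
    for a, b in zip(w, w[1:]):
        p *= Q[a][b]
    return p

def both_sides(Q, x, n, p, F, G):
    """Return (E_x[F * G∘Theta_n], E_x[F * E_{X_n}[G]]) exactly,
    where F reads X_0..X_n and G reads X_0..X_p."""
    states = range(len(Q))
    lhs = sum(path_prob(Q, x, w) * F(w[:n + 1]) * G(w[n:])
              for w in product(states, repeat=n + p + 1))
    def EG(y):  # E_y[G], enumerating trajectories of length p from y
        return sum(path_prob(Q, y, v) * G(v)
                   for v in product(states, repeat=p + 1))
    rhs = sum(path_prob(Q, x, w) * F(w) * EG(w[-1])
              for w in product(states, repeat=n + 1))
    return lhs, rhs
```

Running this with any <math>\F_n</math>-measurable indicator <math>F</math> and any indicator <math>G</math> returns two equal numbers, up to floating-point error.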
{{alert-info |
We would like to make sense of [[#thm5 |the theorem above]] with <math>n</math> replaced by a stopping time <math>T</math>. Indeed, consider the problem of knowing whether a Markov chain starting from <math>x</math> returns to <math>x</math>. In other words, let us write
<math display="block">
N_x=\sum_{n=0}^\infty \one_{\{X_n=x\}}
</math>
and ask whether we have <math>\p_x[N_x=\infty]=1</math>. It is in fact enough to check that the chain comes back to <math>x</math> at least once. If <math>H_x=\inf\{n\geq 1\mid X_n=x\}</math>, with the convention <math>\inf\varnothing=\infty</math>, we claim that <math>\p_x[N_x=\infty]=1</math> if and only if <math>\p_x[H_x < \infty]=1</math>; one implication is trivial. Conversely, if <math>\p_x[H_x < \infty]=1</math>, and if [[#thm5 |the theorem]] is true for stopping times, then <math>\Theta_{H_x}(\omega)=\left(\omega_{H_x(\omega)+n}\right)_{n\geq 0}</math> has the law <math>\p_x</math>. But then, since <math>N_x(\omega)=1+N_x(\Theta_{H_x}(\omega))</math>, we see that <math>N_x</math> has the same distribution as <math>1+N_x</math> under <math>\p_x</math>. This is only possible if <math>N_x=\infty</math> a.s.
}}
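The key identity <math>N_x(\omega)=1+N_x(\Theta_{H_x}(\omega))</math> can be checked deterministically on any finite window of a trajectory starting at <math>x</math>: the visits after time <math>H_x</math> are exactly the visits of the shifted trajectory. A sketch (the sample path below is ours):

```python
def first_return(path, x):
    """H_x on a finite window: first n >= 1 with path[n] == x,
    or None if no return is observed in the window."""
    for n in range(1, len(path)):
        if path[n] == x:
            return n
    return None

def visits(path, x):
    """N_x restricted to the observed window: #{n : path[n] == x}."""
    return sum(1 for s in path if s == x)
```

On the window <math>(0,1,0,0,2,0)</math> with <math>x=0</math> we get <math>H_0=2</math>, four visits in total, and three visits on the shifted window, matching the identity.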
{{proofcard|Theorem (Strong Markov property)|thm-2|Let <math>(\Omega,\F,(\F_n)_{n\geq 0},\p)</math> be a filtered probability space. Let <math>T</math> be a stopping time for the filtration <math>(\F_n)_{n\geq 0}</math>. Let <math>F</math> and <math>G</math> be two measurable and positive functions on <math>\Omega</math>. Assume further that <math>F</math> is <math>\F_T</math>-measurable. Then for all <math>x\in E</math>, we get
<math display="block">
\E_x[\one_{\{ T < \infty\}}FG\circ \Theta_T]=\E_x[\one_{\{ T < \infty\}}F\E_{X_T}[G]]
</math>
or equivalently
<math display="block">
\E_x[\one_{\{T < \infty\}}G\circ \Theta_T\mid \F_T]=\one_{\{ T < \infty\}}\E_{X_T}[G].
</math>
|For <math>n\geq 0</math>, we have
<math display="block">
\E_x[\one_{\{ T=n\}}FG\circ\Theta_T]=\E_x[\one_{\{ T=n\}}FG\circ \Theta_n]=\E_x[\one_{\{ T=n\}}F\E_{X_n}[G]].
</math>
The last equality follows from the simple Markov property, after noting that <math>\one_{\{T=n\}}F</math> is <math>\F_n</math>-measurable. We then sum over <math>n</math>.}}
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\F,(\F_n)_{n\geq 0},\p)</math> be a filtered probability space. Let <math>T</math> be a stopping time such that <math>\p_x[T < \infty]=1</math>. Let us assume that there exists <math>y\in E</math> such that <math>\p_x[X_T=y]=1</math>. Then under <math>\p_x</math>, <math>\Theta_T(\omega)</math> is independent of <math>\F_T</math> and has the law <math>\p_y</math>.
|We only have to note that
<math display="block">
\E_x[FG\circ \Theta_T(\omega)]=\E_x[F\E_{X_T}[G]]=\E_x[F\E_y[G]]=\E_x[F]\E_y[G].
</math>}}
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}