
The Canonical Markov Chain

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

We start with the existence of a Markov chain associated with a given transition matrix.

Proposition

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Q[/math] be a stochastic matrix on [math]E[/math]. There exists a probability space [math](\Omega',\F',\p')[/math] on which there exists, for all [math]x\in E[/math], a stochastic process [math](X_n^x)_{n\geq 0}[/math] which is a Markov chain with transition matrix [math]Q[/math], starting from [math]X_0^x=x[/math].


Proof

We can take [math]\Omega'=[0,1)[/math], with the Borel [math]\sigma[/math]-Algebra and the Lebesgue measure. For [math]\omega\in[0,1)[/math], we write the binary expansion

[[math]] \omega=\sum_{n=0}^\infty \mathcal{E}_n(\omega)2^{-n-1}, [[/math]]
with [math]\mathcal{E}_n(\omega)\in\{0,1\}[/math], choosing the expansion which does not end in an infinite string of 1's, so that it is unique. Then [math](\mathcal{E}_n)_{n\geq 0}[/math] is a sequence of iid r.v.'s with [math]\p[\mathcal{E}_n=1]=\p[\mathcal{E}_n=0]=\frac{1}{2}[/math]. If [math]\varphi:\N\times\N\to \N[/math] is a bijection, then the r.v.'s [math]\eta_{i,j}=\mathcal{E}_{\varphi(i,j)}[/math] for [math]i,j\in\N[/math] are also iid. Now define

[[math]] U_i=\sum_{j=0}^\infty\eta_{i,j}2^{-j-1}. [[/math]]
Therefore [math]U_0,U_1,U_2,...[/math] are iid, each uniformly distributed on [math][0,1)[/math]. Denote [math]E=\{y_1,y_2,...,y_k,...\}[/math] and fix [math]x\in E[/math]. We set [math]X_0^x=x[/math] and [math]X_1^x=y_k[/math] if

[[math]] \sum_{1\leq j \lt k}Q(x,y_j) \lt U_1\leq \sum_{1\leq j\leq k}Q(x,y_j), [[/math]]
so that [math]\p[X_1^x=y]=Q(x,y)[/math] for all [math]y\in E[/math]. We then proceed by induction, setting [math]X_{n+1}^x=y_k[/math] if

[[math]] \sum_{1\leq j \lt k} Q(X_n^x,y_j) \lt U_{n+1}\leq \sum_{1\leq j\leq k}Q(X_n^x,y_j). [[/math]]
Using the independence of the [math]U_i[/math]'s, we check that for all [math]k\geq 1[/math], we have

[[math]] \begin{multline*} \p[X_{n+1}^x=y_k\mid X_0^x=x,X_1^x=x_1,...,X_n^x=x_n]\\=\p\left[\sum_{1\leq j \lt k}Q(x_n,y_j) \lt U_{n+1}\leq \sum_{1\leq j\leq k}Q(x_n,y_j)\mid X_0^x=x,...,X_n^x=x_n\right]=Q(x_n,y_k). \end{multline*} [[/math]]
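The construction in this proof is exactly inverse-CDF sampling. As an illustration, here is a minimal Python sketch of it, assuming numpy and using a library random generator in place of the binary-digit construction of the [math]U_i[/math]'s; the name simulate_chain and the two-state matrix are illustrative, not part of the notes.

<syntaxhighlight lang="python">
import numpy as np

def simulate_chain(Q, states, x0, n_steps, rng=None):
    """Run a Markov chain with transition matrix Q from x0, following the
    rule of the proof: X_{n+1} = y_k iff
    sum_{1<=j<k} Q(X_n, y_j) < U_{n+1} <= sum_{1<=j<=k} Q(X_n, y_j)."""
    rng = rng or np.random.default_rng()
    idx = {y: i for i, y in enumerate(states)}
    path = [x0]
    for _ in range(n_steps):
        u = rng.uniform()                    # stands in for U_{n+1}, uniform on [0,1)
        cdf = np.cumsum(Q[idx[path[-1]]])    # partial sums of Q(X_n, y_1), ..., Q(X_n, y_k)
        k = min(int(np.searchsorted(cdf, u)), len(states) - 1)  # smallest k with u <= cdf[k]
        path.append(states[k])
    return path

# Illustrative two-state chain
Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(simulate_chain(Q, ["a", "b"], "a", 10))
</syntaxhighlight>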

In the sequel, we shall take [math]\Omega=E^\N[/math]. An element [math]\omega\in \Omega[/math] is a sequence [math]\omega=(\omega_0,\omega_1,...)[/math] of elements in [math]E[/math]. Define the coordinate map

[[math]] X_n(\omega)=\omega_n. [[/math]]

We take [math]\F[/math] to be the smallest [math]\sigma[/math]-Algebra on [math]\Omega[/math] which makes all the [math]X_n[/math]'s measurable. This [math]\sigma[/math]-Algebra is generated by the cylinder sets

[[math]] C=\{\omega\in\Omega\mid \omega_0=x_0,\omega_1=x_1,...,\omega_n=x_n\} [[/math]]

for [math]n\in\N[/math] and [math]x_0,...,x_n\in E[/math].

Lemma

Let [math](G,\mathcal{G})[/math] be a measurable space and let [math]\psi:G\to\Omega[/math] be a map. Then [math]\psi[/math] is measurable if and only if for all [math]n\geq 0[/math], [math]X_n\circ\psi[/math] is measurable.


Proof

We only need to show that if [math]X_n\circ \psi[/math] is measurable for all [math]n\geq 0[/math], then [math]\psi[/math] is measurable as well. Consider the collection of subsets of [math]\Omega[/math] given by

[[math]] \A:=\{A\in\F\mid \psi^{-1}(A)\in\mathcal{G}\}, [[/math]]
which is a [math]\sigma[/math]-Algebra and contains all the sets of the form [math]X_n^{-1}(\{y\})[/math] for [math]y\in E[/math]. Hence [math]\A[/math] contains the [math]\sigma[/math]-Algebra generated by the [math]X_n[/math]'s, which is [math]\F[/math] itself, and therefore [math]\psi[/math] is measurable.

Theorem

Let [math]Q[/math] be a stochastic matrix on [math]E[/math]. For all [math]x\in E[/math], there exists a unique probability measure [math]\p_x[/math] on [math]\Omega=E^\N[/math], endowed with the [math]\sigma[/math]-Algebra [math]\F[/math] defined above, such that the sequence of coordinate maps [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix [math]Q[/math] and [math]\p_x[X_0=x]=1[/math].


Proof

By the proposition above, there exist a probability space [math](\Omega',\F',\p')[/math] and a process [math](X_n^x)_{n\geq 0}[/math] which is a Markov chain with transition matrix [math]Q[/math] and such that [math]X_0^x=x[/math]. We then define [math]\p_x[/math] as the image of [math]\p'[/math] under the map

[[math]] \begin{align*} \Omega'&\longrightarrow\Omega\\ \omega'&\longmapsto (X_n^x(\omega'))_{n\geq 0} \end{align*} [[/math]]


This map is measurable by the lemma above. We have [math]\p_x[X_0=x]=\p'[X_0^x=x]=1[/math] and for [math]x_0,x_1,...,x_n\in E[/math] we have

[[math]] \begin{multline*} \p_x[X_0=x_0,X_1=x_1,...,X_n=x_n]=\p'[X_0^x=x_0,X_1^x=x_1,...,X_n^x=x_n]\\=\p'[X_0^x=x_0]\prod_{j=1}^nQ(x_{j-1},x_j)=\p_x[X_0=x_0]\prod_{j=1}^nQ(x_{j-1},x_j) \end{multline*} [[/math]]


Therefore it follows that the sequence of coordinate maps is a Markov chain. For uniqueness, we note that if [math]\p'_x[/math] is another such probability measure, then [math]\p_x[/math] and [math]\p_x'[/math] coincide on the cylinder sets, which form a class stable under finite intersections and generate [math]\F[/math]. It then follows from the monotone class theorem that [math]\p_x=\p_x'[/math].

It follows that for all [math]n\geq 0[/math] and all [math]x,y\in E[/math], we have

[[math]] \p_x[X_n=y]=Q_n(x,y), [[/math]]

where [math]Q_n(x,y)[/math] denotes the [math](x,y)[/math] entry of the [math]n[/math]-th matrix power of [math]Q[/math].
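Numerically, assuming numpy and the illustrative two-state matrix from the sketch above, [math]\p_x[X_n=y][/math] can be read off the [math]n[/math]-th matrix power:

<syntaxhighlight lang="python">
import numpy as np

Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])
n = 5
Qn = np.linalg.matrix_power(Q, n)   # the n-step transition matrix Q_n
print(Qn[0, 1])                     # P_{y_1}[X_5 = y_2] for this chain
</syntaxhighlight>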

If [math]\mu[/math] is a probability measure on [math]E[/math], we write

[[math]] \p_\mu:=\sum_{x\in E}\mu(x)\p_x, [[/math]]
which defines a new probability measure on [math]\Omega[/math]. By writing an explicit formula for [math]\p_\mu[X_0=x_0,...,X_n=x_n][/math], one immediately checks that under [math]\p_\mu[/math] the sequence [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix [math]Q[/math], where [math]X_0[/math] has distribution [math]\mu[/math].
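In simulation terms, sampling under [math]\p_\mu[/math] amounts to drawing [math]X_0[/math] from [math]\mu[/math] and then running the chain; a minimal sketch, reusing the hypothetical simulate_chain from above:

<syntaxhighlight lang="python">
import numpy as np

def simulate_chain_mu(Q, states, mu, n_steps, rng=None):
    """Sample a path under P_mu = sum_x mu(x) P_x: draw X_0 from mu,
    then evolve with transition matrix Q."""
    rng = rng or np.random.default_rng()
    x0 = rng.choice(states, p=mu)   # X_0 distributed according to mu
    return simulate_chain(Q, states, x0, n_steps, rng)
</syntaxhighlight>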

Let [math](X'_n)_{n\geq 0}[/math] be a Markov chain with transition matrix [math]Q[/math] and initial distribution [math]\mu[/math]. Then for all measurable subsets [math]B\subset\Omega=E^\N[/math], we have

[[math]] \p[(X'_n)_{n\geq 0}\in B]=\p_\mu[B]. [[/math]]
It suffices to check this equality when [math]B[/math] is a cylinder as in the proof above; the general case then follows from the monotone class theorem. This shows that all results proven for the canonical Markov chain hold for a general Markov chain with the same transition matrix. One of the advantages of working with the canonical Markov chain is that one can use translation operators. For all [math]k\in\N[/math], we define

[[math]] \Theta_k:\Omega\to\Omega,(\omega_n)_{n\geq 0}\mapsto \Theta_k((\omega_n)_{n\geq 0})=(\omega_{k+n})_{n\geq 0}. [[/math]]
The lemma above shows that these maps are measurable. We write [math]\F_n=\sigma(X_0,...,X_n)[/math] and we use the notation [math]\E_x[/math] when we integrate with respect to [math]\p_x[/math].
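On a finite piece of a trajectory, the translation operator simply drops the first [math]k[/math] coordinates; a one-line sketch (the name theta is illustrative):

<syntaxhighlight lang="python">
def theta(k, omega):
    """Translation operator Theta_k: (w_0, w_1, ...) -> (w_k, w_{k+1}, ...)."""
    return omega[k:]

print(theta(2, ["a", "a", "b", "a", "b"]))   # ['b', 'a', 'b']
</syntaxhighlight>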

Theorem (Simple Markov property)

Let [math]F[/math] and [math]G[/math] be two measurable, positive maps on [math]\Omega[/math] and let [math]n\geq 0[/math]. Assume that [math]F[/math] is [math]\F_n[/math]-measurable. Then for all [math]x\in E[/math],

[[math]] \E_x[FG\circ \Theta_n]=\E_x[F\E_{X_n}[G]]. [[/math]]
Equivalently, we have

[[math]] \E_x[G\circ \Theta_n\mid \F_n]=\E_{X_n}[G]. [[/math]]
We say that the conditional distribution of [math]\Theta_n(\omega)[/math] given [math](X_0,X_1,...,X_n)[/math] is [math]\p_{X_n}[/math].

The theorem is also true if one replaces [math]\E_x[/math] with [math]\E_\mu[/math].


Proof

It is enough to prove the first statement. For this we can restrict ourselves to the case where [math]F=\one_{\{ X_0=x_0,X_1=x_1,...,X_n=x_n\}}[/math] for [math]x_0,x_1,...,x_n\in E[/math]. Let us first consider the case where [math]G=\one_{\{X_0=y_0,...,X_p=y_p\}}[/math] for [math]y_0,...,y_p\in E[/math]. In this case, if [math]y\in E[/math], then

[[math]] \E_y[G]=\one_{\{ y_0=y\}}\prod_{j=1}^pQ(y_{j-1},y_j) [[/math]]
and

[[math]] \begin{multline*} \E_x[FG\circ \Theta_n]=\p_x[X_0=x_0,...,X_{n}=x_{n},X_n=y_0,X_{n+1}=y_{1},...,X_{n+p}=y_p]\\ =\one_{\{x_0=x\}}\prod_{j=1}^nQ(x_{j-1},x_j)\one_{\{y_0=x_n\}}\prod_{j=1}^pQ(y_{j-1},y_j) \end{multline*} [[/math]]


The same computation shows that [math]\E_x[F\E_{X_n}[G]]=\p_x[X_0=x_0,...,X_n=x_n]\E_{x_n}[G][/math] equals the same product, which proves the identity for such [math]F[/math] and [math]G[/math]. It then follows from the monotone class theorem that the identity holds for [math]G=\one_A[/math] with [math]A\in\F[/math], and thus we can conclude.
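As a sanity check, one can verify the identity [math]\E_x[FG\circ\Theta_n]=\E_x[F\E_{X_n}[G]][/math] by Monte Carlo on the illustrative two-state chain, here with [math]n=1[/math], [math]F=\one_{\{X_1=a\}}[/math] and [math]G=\one_{\{X_0=a,X_1=a\}}[/math]; everything below, including simulate_chain, comes from the earlier sketches, not from the notes.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])
states = ["a", "b"]

lhs = rhs = 0.0
trials = 100_000
for _ in range(trials):
    path = simulate_chain(Q, states, "a", 2, rng)
    F = 1.0 if path[1] == "a" else 0.0                            # F_1-measurable
    G_shift = 1.0 if path[1] == "a" and path[2] == "a" else 0.0   # G o Theta_1
    E_G = Q[0, 0] if path[1] == "a" else 0.0                      # E_{X_1}[G] = 1_{X_1=a} Q(a,a)
    lhs += F * G_shift
    rhs += F * E_G
print(lhs / trials, rhs / trials)   # both approximate Q(a,a)^2 = 0.81
</syntaxhighlight>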

We would like to make sense of the simple Markov property with [math]n[/math] replaced by a stopping time [math]T[/math]. Suppose, for instance, that we want to know whether a Markov chain starting from [math]x[/math] returns to [math]x[/math]. In other words, let us write

[[math]] N_x=\sum_{n=0}^\infty \one_{\{X_n=x\}} [[/math]]
and ask whether we have [math]\p_x[N_x=\infty]=1[/math]. It is in fact enough to check that the chain comes back to [math]x[/math] at least once. Let [math]H_x=\inf\{n\geq 1\mid X_n=x\}[/math], with the convention [math]\inf\varnothing=\infty[/math]. The implication from [math]\p_x[N_x=\infty]=1[/math] to [math]\p_x[H_x \lt \infty]=1[/math] is trivial; for the converse, assume [math]\p_x[H_x \lt \infty]=1[/math] and that the simple Markov property also holds for stopping times. Then [math]\Theta_{H_x}(\omega)=\left(\omega_{H_x(\omega)+n}\right)_{n\geq 0}[/math] has the law [math]\p_x[/math]. But then, since [math]N_x(\omega)=1+N_x(\Theta_{H_x}(\omega))[/math], we see that [math]N_x[/math] has the same distribution as [math]1+N_x[/math] under [math]\p_x[/math]. This is only possible if [math]N_x=\infty[/math] a.s.
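A simulation of [math]H_x[/math] for the illustrative two-state chain supports this picture; the function first_return_time is hypothetical, reuses simulate_chain from above, and uses a finite horizon to stand in for [math]\{H_x=\infty\}[/math].

<syntaxhighlight lang="python">
import numpy as np

def first_return_time(Q, states, x, max_steps=10_000, rng=None):
    """H_x = inf{n >= 1 : X_n = x}; None stands in for H_x = infinity
    when x is not revisited within the simulation horizon."""
    rng = rng or np.random.default_rng()
    path = simulate_chain(Q, states, x, max_steps, rng)
    for n in range(1, len(path)):
        if path[n] == x:
            return n
    return None

Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])
returns = [first_return_time(Q, ["a", "b"], "a") for _ in range(1_000)]
print(sum(r is not None for r in returns) / 1_000)   # empirically P_a[H_a < infinity] = 1
</syntaxhighlight>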

Theorem (Strong Markov property)

Let [math]T[/math] be a stopping time for the filtration [math](\F_n)_{n\geq 0}[/math]. Let [math]F[/math] and [math]G[/math] be two measurable and positive functions on [math]\Omega[/math]. Assume further that [math]F[/math] is [math]\F_T[/math]-measurable. Then for all [math]x\in E[/math], we get

[[math]] \E_x[\one_{\{ T \lt \infty\}}FG\circ \Theta_T]=\E_x[\one_{\{ T \lt \infty\}}F\E_{X_T}[G]] [[/math]]
or equivalently

[[math]] \E_x[\one_{\{T \lt \infty\}}G\circ \Theta_T\mid \F_T]=\one_{\{ T \lt \infty\}}\E_{X_T}[G]. [[/math]]


Proof

For [math]n\geq 0[/math], we have

[[math]] \E_x[\one_{\{ T=n\}}FG\circ\Theta_T]=\E_x[\one_{\{ T=n\}}FG\circ \Theta_n]=\E_x[\one_{\{ T=n\}}F\E_{X_n}[G]]. [[/math]]
The last equality follows from the simple Markov property, after noting that [math]\one_{\{T=n\}}F[/math] is [math]\F_n[/math]-measurable. We then sum over [math]n[/math].

Corollary

Let [math]x\in E[/math] and let [math]T[/math] be a stopping time for the filtration [math](\F_n)_{n\geq 0}[/math] such that [math]\p_x[T \lt \infty]=1[/math]. Let us assume that there exists [math]y\in E[/math] such that [math]\p_x[X_T=y]=1[/math]. Then under [math]\p_x[/math], [math]\Theta_T(\omega)[/math] is independent of [math]\F_T[/math] and has the law [math]\p_y[/math].


Proof

Since [math]\p_x[T \lt \infty]=1[/math], the strong Markov property gives, for every [math]\F_T[/math]-measurable positive [math]F[/math] and every measurable positive [math]G[/math],

[[math]] \E_x[FG\circ \Theta_T]=\E_x[F\E_{X_T}[G]]=\E_x[F\E_y[G]]=\E_x[F]\E_y[G]. [[/math]]

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].