Definition and first properties

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

In this chapter [math]E[/math] will be a finite or countable set, endowed with the [math]\sigma[/math]-algebra [math]\mathcal{P}(E)[/math]. A stochastic matrix on [math]E[/math] is a family [math](Q(x,y))_{x,y\in E}[/math] of real numbers satisfying

  • [math]0\leq Q(x,y)\leq 1[/math] for all [math]x,y\in E[/math].
  • [math]\sum_{y\in E}Q(x,y)=1[/math] for all [math]x\in E[/math].

This is a transition probability from [math]E[/math] to [math]E[/math] in the following sense: if for [math]x\in E[/math] and [math]A\subset E[/math] we write

[[math]] \nu(x,A)=\sum_{y\in A}Q(x,y), [[/math]]

we see that [math]\nu[/math] is a probability transition kernel from [math]E[/math] to [math]E[/math]. Conversely, if we start with such a transition kernel [math]\nu[/math], the formula

[[math]] Q(x,y)=\nu(x,\{y\}) [[/math]]

defines a stochastic matrix on [math]E[/math]. For [math]n\geq 1[/math], we define [math]Q_n=Q^n[/math], the [math]n[/math]-th matrix power of [math]Q[/math]: [math]Q_1=Q[/math] and, by induction,

[[math]] Q_{n+1}(x,y)=\sum_{z\in E}Q_n(x,z)Q(z,y). [[/math]]

One can check that [math]Q_n[/math] is also a stochastic matrix on [math]E[/math]. For [math]n=0[/math] we take [math]Q_0(x,y)=\one_{\{x=y\}}[/math]. For a measurable map [math]f:E\to\R_+[/math] we write [math]Qf[/math] for the function defined by

[[math]] Qf(x)=\sum_{y\in E}Q(x,y)f(y). [[/math]]
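To make these definitions concrete, here is a minimal numerical sketch in Python (assuming NumPy; the matrix below is an arbitrary example, not taken from the text) checking the two defining properties, computing [math]Q_n=Q^n[/math], and evaluating [math]Qf[/math] as a matrix-vector product.

```python
import numpy as np

# An arbitrary example of a stochastic matrix on E = {0, 1, 2}:
# nonnegative entries, each row summing to 1.
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

# The two defining properties.
assert np.all((0 <= Q) & (Q <= 1))
assert np.allclose(Q.sum(axis=1), 1.0)

# Q_n = Q^n; the matrix power agrees with the inductive formula.
n = 4
Q_n = np.linalg.matrix_power(Q, n)
assert np.allclose(Q_n.sum(axis=1), 1.0)   # Q_n is again stochastic

# Qf(x) = sum_y Q(x, y) f(y) is just a matrix-vector product.
f = np.array([1.0, 2.0, 0.5])
Qf = Q @ f
```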

Definition (Markov chain)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Q[/math] be a stochastic matrix on [math]E[/math] and let [math](X_n)_{n\geq 0}[/math] be a stochastic process with values in [math]E[/math]. We say that [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix [math]Q[/math] if for all [math]n\geq 0[/math], the conditional distribution of [math]X_{n+1}[/math] given [math](X_0,X_1,...,X_n)[/math] is [math]Q(X_n,\cdot)[/math], or equivalently if

[[math]] \p[X_{n+1}=y\mid X_0=x_0,X_1=x_1,...,X_{n-1}=x_{n-1},X_n=x_n]=Q(x_n,y) [[/math]]

for all [math]x_0,x_1,...,x_n,y\in E[/math] such that [math]\p[X_0=x_0,...,X_n=x_n] \gt 0[/math].


In general, the conditional distribution of [math]X_{n+1}[/math] given [math]X_0,X_1,...,X_n[/math] depends on all the variables [math]X_0,X_1,...,X_n[/math]. The fact that this conditional distribution depends only on [math]X_n[/math] is called the Markov property.

Moreover, [math]Q(x,\cdot)[/math], the distribution of [math]X_{n+1}[/math] given [math]X_n=x[/math], does not depend on [math]n[/math]: this is the homogeneity of the Markov chain.
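The definition suggests a direct way to simulate such a chain: at each step, sample the next state from the row [math]Q(x,\cdot)[/math] of the current state. The sketch below (hypothetical helper simulate_chain, assuming NumPy and an arbitrary example matrix) illustrates both the Markov property (only the current state is used) and homogeneity (the same [math]Q[/math] at every step).

```python
import numpy as np

# An arbitrary example of a stochastic matrix on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

def simulate_chain(Q, x0, n_steps, rng):
    """Sample X_0, ..., X_{n_steps} of a Markov chain with transition matrix Q.

    The next state is drawn from the row Q(x, .) of the current state x only
    (Markov property), using the same Q at every step (homogeneity).
    """
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(Q), p=Q[path[-1]]))
    return path

print(simulate_chain(Q, x0=0, n_steps=10, rng=np.random.default_rng(0)))
```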

Proposition

Let [math](\Omega,\F,\p)[/math] be a probability space. A stochastic process [math](X_n)_{n\geq 0}[/math] with values in [math]E[/math] is a Markov chain with transition matrix [math]Q[/math] if and only if for all [math]n\geq 0[/math] and for all [math]x_0,x_1,...,x_n\in E[/math]

[[math]] \begin{equation} \label{markov} \p[X_0=x_0,X_1=x_1,...,X_n=x_n]=\p[X_0=x_0]\prod_{j=1}^nQ(x_{j-1},x_j). \end{equation} [[/math]]


In particular, if [math]\p[X_0=x_0] \gt 0[/math], then

[[math]] \p[X_n=x_n\mid X_0=x_0]=Q_n(x_0,x_n). [[/math]]


Proof

If [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix [math]Q[/math], we have

[[math]] \begin{align*} \p[X_0=x_0,...,X_n=x_n,X_{n+1}=x_{n+1}]&=\p[X_0=x_0,...,X_n=x_n]\p[X_{n+1}=x_{n+1}\mid X_0=x_0,...,X_n=x_n]\\ &=\p[X_0=x_0,...,X_n=x_n]\,Q(x_n,x_{n+1}) \end{align*} [[/math]]
whenever [math]\p[X_0=x_0,...,X_n=x_n] \gt 0[/math] (and both sides vanish otherwise).


Thus we can conclude by induction. Conversely, if (markov) is satisfied, then

[[math]] \p[X_{n+1}=y\mid X_0=x_0,...,X_n=x_n]=\frac{\p[X_0=x_0]\prod_{j=0}^{n-1}Q(x_j,x_{j+1})Q(x_n,y)}{\p[X_0=x_0]\prod_{j=0}^{n-1}Q(x_j,x_{j+1})}=Q(x_n,y). [[/math]]

To conclude, we sum (markov) over the intermediate states [math]x_1,...,x_{n-1}[/math] and note that

[[math]] Q_n(x_0,x_n)=\sum_{x_1,...,x_{n-1}\in E}\prod_{j=1}^nQ(x_{j-1},x_j). [[/math]]
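As a sanity check on the identity [math]\p[X_n=x_n\mid X_0=x_0]=Q_n(x_0,x_n)[/math], one can compare empirical frequencies of [math]X_n[/math] over many simulated paths with the corresponding row of the matrix power [math]Q^n[/math]. A rough Monte Carlo sketch (assuming NumPy, with an arbitrary example matrix):

```python
import numpy as np

# An arbitrary example of a stochastic matrix on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
n, x0, trials = 5, 0, 20_000
rng = np.random.default_rng(1)

# Empirical distribution of X_n over many paths started at X_0 = x0 ...
counts = np.zeros(len(Q))
for _ in range(trials):
    x = x0
    for _ in range(n):
        x = rng.choice(len(Q), p=Q[x])
    counts[x] += 1

# ... which should be close to row x0 of Q_n = Q^n.
print(counts / trials)                    # empirical frequencies of X_n
print(np.linalg.matrix_power(Q, n)[x0])   # Q_n(x0, .)
```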

Proposition 15.1 shows that for a Markov chain, the distribution of [math](X_0,X_1,...,X_n)[/math] is completely determined by the initial distribution (that of [math]X_0[/math]) and the transition matrix [math]Q[/math]. From now on we write [math]\p[A\mid Z][/math] for [math]\E[\one_A\mid Z][/math].

Proposition

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 0}[/math] be a Markov chain with transition matrix [math]Q[/math].

  • For all [math]n\geq 0[/math] and for all measurable maps [math]f:E\to \R_+[/math] we have
    [[math]] \E[f(X_{n+1})\mid X_0,X_1,...,X_n]=\E[f(X_{n+1})\mid X_n]=Qf(X_n). [[/math]]
    More generally, for all [math]\{ i_1,...,i_k\}\subset \{0,1,...,n-1\}[/math], we have
    [[math]] \E[f(X_{n+1})\mid X_{i_1},X_{i_2},...,X_{i_k},X_n]=Qf(X_n). [[/math]]
  • For all [math]n\geq 0, p\geq 1[/math] and for all [math]y_1,...,y_p\in E[/math] we have
    [[math]] \p[X_{n+1}=y_1,...,X_{n+p}=y_p\mid X_0,...,X_n]=Q(X_n,y_1)\prod_{j=1}^{p-1}Q(y_j,y_{j+1}) [[/math]]
    and hence
    [[math]] \p[X_{n+p}=y_p\mid X_n]=Q_p(X_n,y_p). [[/math]]
    If, for a fixed [math]n\geq 0[/math], we take [math]Y_p=X_{n+p}[/math], then [math](Y_p)_{p\geq 0}[/math] is also a Markov chain with transition matrix [math]Q[/math].


Proof

For [math](i)[/math], we have

[[math]] \E[f(X_{n+1})\mid X_0,X_1,...,X_n]=\sum_{y\in E}Q(X_n,y)f(y)=Qf(X_n). [[/math]]
Now if [math]\{i_1,...,i_k\}\subset \{ 0,1,...,n-1\}[/math], then[a]

[[math]] \begin{multline*} \E[f(X_{n+1})\mid X_{i_1},...,X_{i_k},X_n]=\E[\E[f(X_{n+1})\mid X_0,X_1,...,X_n]\mid X_{i_1},...,X_{i_k},X_n]\\=\E[Qf(X_n)\mid X_{i_1},...,X_{i_k},X_n]=Qf(X_n). \end{multline*} [[/math]]

For [math](ii)[/math], note that it follows immediately from (markov) that

[[math]] \p[X_{n+1}=y_1,...,X_{n+p}=y_p\mid X_0=x_0,...,X_n=x_n]=Q(x_n,y_1)\prod_{j=1}^{p-1}Q(y_j,y_{j+1}). [[/math]]

The formula for [math]\p[X_{n+p}=y_p\mid X_n][/math] follows by summing over [math]y_1,...,y_{p-1}\in E[/math]. Finally we note that

[[math]] \p[Y_0=y_0,Y_1=y_1,...,Y_p=y_p]=\p[X_n=y_0]\prod_{j=1}^pQ(y_{j-1},y_j) [[/math]]

and we can apply Proposition 15.1.
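Part [math](i)[/math] can likewise be checked numerically: the average of [math]f(X_{n+1})[/math] over many one-step transitions from a fixed state [math]x[/math] should approach [math]Qf(x)[/math]. A small sketch (assuming NumPy, with the same kind of arbitrary example matrix as above):

```python
import numpy as np

# An arbitrary example of a stochastic matrix on E = {0, 1, 2}.
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
f = np.array([1.0, 2.0, 0.5])
x = 1
rng = np.random.default_rng(2)

# Empirical estimate of E[f(X_{n+1}) | X_n = x]: by homogeneity this is
# the mean of f under one step of the chain started at x ...
samples = rng.choice(len(Q), size=100_000, p=Q[x])
print(f[samples].mean())

# ... and it should agree with Qf(x) = sum_y Q(x, y) f(y).
print((Q @ f)[x])
```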

Example

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 0}[/math] be a sequence of independent r.v.'s with values in [math]E[/math], with the same distribution [math]\mu[/math]. Then [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix

[[math]] Q(x,y)=\mu(y) [[/math]]

for all [math]x,y\in E[/math].
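In matrix terms, this [math]Q[/math] has identical rows, each equal to [math]\mu[/math]. A one-line check (assuming NumPy; the distribution [math]\mu[/math] below is arbitrary):

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])   # an arbitrary distribution on E = {0, 1, 2}
Q = np.tile(mu, (len(mu), 1))    # Q(x, y) = mu(y): every row equals mu
# Q is idempotent, so Q_n = Q for all n >= 1: the chain forgets its past entirely.
assert np.allclose(np.linalg.matrix_power(Q, 5), Q)
```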

Example

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\eta,\xi_1,...,\xi_n,...[/math] be independent r.v.'s with values in [math]\mathbb{Z}^d[/math]. We assume that [math]\xi_1,...,\xi_n,...[/math] have the same distribution [math]\mu[/math]. For [math]n\geq 0[/math] we write

[[math]] X_n=\eta+\xi_1+\dotsm+\xi_n. [[/math]]

Then [math](X_n)_{n\geq 0}[/math] is a Markov chain with transition matrix

[[math]] Q(x,y)=\mu(y-x) [[/math]]

for all [math]x,y\in \mathbb{Z}^d[/math]. Indeed,

[[math]] \begin{multline*} \p[X_{n+1}=y\mid X_0=x_0,...,X_n=x_n]=\p[\xi_{n+1}=y-x_n\mid X_0=x_0,...,X_n=x_n]\\=\p[\xi_{n+1}=y-x_n]=\mu(y-x_n). \end{multline*} [[/math]]

If [math](e_1,...,e_d)[/math] is the canonical basis of [math]\R^d[/math], and if [math]\mu(e_i)=\mu(-e_i)=\frac{1}{2d}[/math] for all [math]i\in\{1,...,d\}[/math], then the Markov chain is called a simple random walk on [math]\mathbb{Z}^d[/math].
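A simple random walk on [math]\mathbb{Z}^d[/math] is straightforward to simulate: at each step pick a coordinate uniformly and move by [math]\pm 1[/math] along it with equal probability. A minimal sketch (hypothetical helper simple_random_walk, assuming NumPy), with [math]\eta=0[/math]:

```python
import numpy as np

def simple_random_walk(d, n_steps, seed=0):
    """Simulate a simple random walk on Z^d started at the origin:
    each step adds +e_i or -e_i, each with probability 1/(2d)."""
    rng = np.random.default_rng(seed)
    steps = np.zeros((n_steps, d), dtype=int)
    axes = rng.integers(0, d, size=n_steps)     # which coordinate moves
    signs = rng.choice([-1, 1], size=n_steps)   # in which direction
    steps[np.arange(n_steps), axes] = signs
    return np.cumsum(steps, axis=0)             # X_1, ..., X_n (eta = 0)

print(simple_random_walk(d=2, n_steps=10))
```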


Example

Let [math]\mathcal{P}_2(E)[/math] be the set of all two-element subsets of [math]E[/math] and let [math]A\subset\mathcal{P}_2(E)[/math]. For [math]x\in E[/math], we write

[[math]] A_x=\{y\in E\mid \{x,y\}\in A\}. [[/math]]

We assume that [math]A_x\not=\varnothing[/math] for all [math]x\in E[/math]. We then define a stochastic matrix on [math]E[/math] by setting, for [math]x,y\in E[/math],

[[math]] Q(x,y)=\begin{cases}\frac{1}{\vert A_x\vert},&\text{if $\{x,y\}\in A$,}\\ 0,&\text{otherwise.}\end{cases} [[/math]]

A Markov chain with transition matrix [math]Q[/math] is called a simple random walk on the graph [math](E,A)[/math].
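Building [math]Q[/math] from an edge set is mechanical. The sketch below (assuming NumPy) does it for a small example graph; the edge set is arbitrary, chosen so that [math]A_x\neq\varnothing[/math] for every [math]x[/math], and the rows are verified to sum to [math]1[/math].

```python
import numpy as np

# A small graph (E, A): E = {0, 1, 2, 3}, A a set of two-element subsets
# (here a 4-cycle, an arbitrary choice with A_x nonempty for every x).
E = [0, 1, 2, 3]
A = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 0})}

# A_x = neighbors of x; Q(x, y) = 1/|A_x| if {x, y} in A, and 0 otherwise.
A_x = {x: [y for y in E if frozenset({x, y}) in A] for x in E}
Q = np.array([[1.0 / len(A_x[x]) if y in A_x[x] else 0.0 for y in E]
              for x in E])

assert np.allclose(Q.sum(axis=1), 1.0)   # Q is a stochastic matrix on E
print(Q)
```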

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].

Notes

  1. Recall that if [math]X\in(E,\mathcal{E})[/math], [math]Y\in(F,\F)[/math], we call the conditional distribution of [math]Y[/math] given [math]X[/math] any transition kernel [math]\nu[/math] from [math]E[/math] to [math]F[/math] such that for all measurable maps [math]h:(F,\F)\to \R_+[/math], [math]\E[h(Y)\mid X]=\int_F h(y)\nu(X,dy)[/math].