Convergence of Random Variables
Types of Convergence
We have already seen the notion of a.s. convergence. In probability theory there are several different types of convergence for r.v.'s. Let us first recall the notion of a.s. convergence.
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of r.v.'s and let [math]X[/math] be a r.v. with values in [math]\R^d[/math]. Then we say that [math](X_n)_{n\geq 1}[/math] converges a.s. to [math]X[/math], and write [math]\lim_{n\to\infty\atop a.s.}X_n=X[/math], if

[math]
\p\left[\lim_{n\to\infty}X_n=X\right]=1.
[/math]
Another very important convergence type is [math]L^p[/math]-convergence, as described in measure theory. Recall that convergence in [math]L^p[/math] for [math]p\in[1,\infty)[/math] in probabilistic language means

[math]
\lim_{n\to\infty}\E[\vert X_n-X\vert^p]=0.
[/math]
Let [math](\Omega,\A,\p)[/math] be a probability space. We say that the sequence [math](X_n)_{n\geq 1}[/math] converges in probability to [math]X[/math] if for all [math]\epsilon \gt 0[/math]

[math]
\lim_{n\to\infty}\p[\vert X_n-X\vert \gt \epsilon]=0.
[/math]

In this case we write [math]\lim_{n\to\infty\atop \p}X_n=X[/math].
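As a quick numerical illustration (not part of the notes; the distribution, tolerance, and sample sizes are all choices made for this example), one can watch [math]\p[\vert X_n-X\vert \gt \epsilon][/math] shrink when [math]X_n[/math] is the mean of [math]n[/math] i.i.d. uniform r.v.'s and [math]X=1/2[/math]:

```python
import numpy as np

def prob_deviation(n, eps=0.05, trials=5_000, seed=0):
    """Monte Carlo estimate of P[|X_n - 1/2| > eps], where X_n is the
    mean of n iid Uniform(0,1) r.v.'s (so X_n -> 1/2 in probability)."""
    rng = np.random.default_rng(seed)
    sample_means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    return float(np.mean(np.abs(sample_means - 0.5) > eps))

# The estimated probabilities decrease towards 0 as n grows,
# which is exactly what convergence in probability asserts.
estimates = [prob_deviation(n) for n in (10, 100, 1000)]
```

Here the decay of the estimates mirrors the definition: for the fixed [math]\epsilon[/math], the probability of a deviation larger than [math]\epsilon[/math] tends to [math]0[/math].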
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]\mathcal{L}^0_{\R^d}(\Omega,\A,\p)[/math] be the space of r.v.'s with values in [math]\R^d[/math] and let [math]L^0_{\R^d}(\Omega,\A,\p)[/math] be the quotient of [math]\mathcal{L}^0_{\R^d}(\Omega,\A,\p)[/math] by the equivalence relation [math]X\sim Y:\Longleftrightarrow X=Y[/math] a.s. Then the map

[math]
d(X,Y):=\E[\vert X-Y\vert\land 1]
[/math]

defines a metric on [math]L^0_{\R^d}(\Omega,\A,\p)[/math], and convergence with respect to [math]d[/math] is exactly convergence in probability.
It is easy to see that [math]d[/math] defines a distance. If [math]\lim_{n\to\infty\atop \p}X_n=X[/math], then for all [math]\epsilon \gt 0[/math] we get

[math]
d(X_n,X)=\E[\vert X_n-X\vert\land 1]\leq \epsilon+\p[\vert X_n-X\vert \gt \epsilon],
[/math]

and hence [math]\limsup_{n\to\infty}d(X_n,X)\leq\epsilon[/math] for every [math]\epsilon \gt 0[/math], i.e. [math]\lim_{n\to\infty}d(X_n,X)=0[/math].
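As a sanity check (our own illustration, using the distance [math]d(X,Y)=\E[\vert X-Y\vert\land 1][/math] that also appears in the proof of the next proposition), we can estimate [math]d(X_n,X)[/math] by Monte Carlo for the sample-mean example and verify the elementary bound [math]d(X_n,X)\leq \epsilon+\p[\vert X_n-X\vert \gt \epsilon][/math]:

```python
import numpy as np

def mc_distance(n, trials=5_000, seed=1):
    """Monte Carlo estimate of d(X_n, X) = E[min(|X_n - X|, 1)] together
    with the raw deviations, for X_n = mean of n iid Uniform(0,1) r.v.'s
    and X = 1/2 (its limit in probability)."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    deviations = np.abs(means - 0.5)
    return float(np.minimum(deviations, 1.0).mean()), deviations

eps = 0.05
d10, dev10 = mc_distance(10)
d1000, _ = mc_distance(1000)

# The bound d(X_n, X) <= eps + P[|X_n - X| > eps] holds pointwise
# (min(t,1) <= eps when t <= eps, and min(t,1) <= 1 otherwise),
# and d(X_n, X) shrinks as n grows.
bound10 = eps + float(np.mean(dev10 > eps))
```

The shrinking estimates illustrate why [math]d[/math] metrizes convergence in probability.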
Let [math](\Omega,\A,\p)[/math] be a probability space. If [math](X_n)_{n\geq 1}[/math] converges a.s. or in [math]L^p[/math] to [math]X[/math], it also converges in probability to [math]X[/math]. Conversely, if [math](X_n)_{n\geq 1}[/math] converges to [math]X[/math] in probability, then there exists a subsequence [math](X_{n_k})_{k\geq 1}[/math] of [math](X_n)_{n\geq 1}[/math] such that

[math]
\lim_{k\to\infty\atop a.s.}X_{n_k}=X.
[/math]
Consider [math]d(X_n,X)=\E[\vert X_n-X\vert\land 1][/math]. We need to prove that [math]\lim_{n\to\infty\atop a.s.}X_n=X[/math] or [math]\lim_{n\to\infty\atop L^p}X_n=X[/math] implies [math]\lim_{n\to\infty}d(X_n,X)=0[/math]. If [math]\lim_{n\to\infty\atop a.s.}X_n=X[/math], then we can apply Lebesgue's dominated convergence theorem (the domination holds because [math]\vert X_n-X\vert\land 1\leq 1[/math] and [math]\E[1] \lt \infty[/math]) to obtain [math]\lim_{n\to\infty}\E[\vert X_n-X\vert\land 1]=\E[\lim_{n\to\infty}(\vert X_n-X\vert\land 1)]=0[/math]. If [math]\lim_{n\to\infty\atop L^p}X_n=X[/math], we can use the fact that for all [math]p\geq 1[/math]

[math]
\E[\vert X_n-X\vert\land 1]\leq \E[\vert X_n-X\vert]\leq \E[\vert X_n-X\vert^p]^{1/p},
[/math]

where the second inequality is Jensen's inequality, so again [math]\lim_{n\to\infty}d(X_n,X)=0[/math].
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of r.v.'s with [math]\lim_{n\to\infty\atop\p}X_n=X[/math]. Assume there is some [math]r \gt 1[/math] such that [math](X_n)_{n\geq 1}[/math] is bounded in [math]L^r[/math], i.e.

[math]
\sup_{n\geq 1}\E[\vert X_n\vert^r] \lt \infty.
[/math]

Then [math](X_n)_{n\geq 1}[/math] converges to [math]X[/math] in [math]L^1[/math].
The fact that [math](X_n)_{n\geq 1}[/math] is bounded in [math]L^r[/math] means exactly that there is some [math]C \gt 0[/math] such that for all [math]n\geq 1[/math]

[math]
\E[\vert X_n\vert^r]\leq C.
[/math]
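To make the role of the constant [math]C[/math] concrete (an illustration of ours, not a step taken from the text): combined with Markov's inequality, [math]L^r[/math]-boundedness yields the uniform tail bound [math]\p[\vert X_n\vert \gt t]\leq C/t^r[/math]. A sketch with [math]r=2[/math] and [math]X_n[/math] the mean of [math]n[/math] i.i.d. Exp(1) r.v.'s, for which [math]\E[X_n^2]=1+1/n\leq 2=:C[/math]:

```python
import numpy as np

rng = np.random.default_rng(2)

# X_n = mean of n iid Exp(1) r.v.'s: E[X_n^2] = 1 + 1/n <= 2 =: C,
# so the sequence is bounded in L^2 (r = 2).
C, r, n = 2.0, 2.0, 20
samples = rng.exponential(1.0, size=(50_000, n)).mean(axis=1)  # draws of X_20

# Markov: P[|X_n| > t] <= E[|X_n|^r] / t^r <= C / t^r, uniformly in n.
t = 2.0
empirical_tail = float(np.mean(np.abs(samples) > t))
markov_bound = C / t**r  # = 0.5
```

The empirical tail probability is far below the Markov bound here; the point is only that the bound is uniform in [math]n[/math], which is what makes [math]L^r[/math]-boundedness useful.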
The strong law of large numbers
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent r.v.'s with values in arbitrary measure spaces. For [math]n\geq 1[/math], define the [math]\sigma[/math]-algebra

[math]
\B_n:=\sigma(X_k\mid k\geq n),\qquad \B_\infty:=\bigcap_{n\geq 1}\B_n.
[/math]

[math]\B_\infty[/math] is called the tail [math]\sigma[/math]-algebra of the sequence [math](X_n)_{n\geq 1}[/math].
We can easily see that a r.v. which is [math]\B_\infty[/math]-measurable is a.s. constant; indeed, its distribution function can only take the values 0 and 1.
Define [math]\mathcal{D}_n:=\sigma(X_k\mid k\leq n)[/math]. We have already observed that [math]\mathcal{D}_n[/math] and [math]\B_{n+1}[/math] are independent and hence, since [math]\B_\infty\subset \B_{n+1}[/math], we get that for all [math]n\geq 1[/math], [math]\mathcal{D}_{n}[/math] and [math]\B_{\infty}[/math] are independent as well. This implies that for all [math]A\in\bigcup_{n=1}^\infty\mathcal{D}_n[/math] and all [math]B\in\B_\infty[/math] we get

[math]
\p[A\cap B]=\p[A]\p[B].
[/math]

Since [math]\bigcup_{n=1}^\infty\mathcal{D}_n[/math] is a [math]\pi[/math]-system generating [math]\sigma(X_k\mid k\geq 1)\supset\B_\infty[/math], it follows that [math]\B_\infty[/math] is independent of itself, and hence [math]\p[B]=\p[B]^2\in\{0,1\}[/math] for all [math]B\in\B_\infty[/math].
If [math](X_n)_{n\geq 1}[/math] is a sequence of independent real-valued r.v.'s, then [math]\limsup_{n}\frac{X_1+...+X_n}{n}[/math] [math](\in[-\infty,\infty])[/math] is [math]\B_\infty[/math]-measurable and therefore a.s. constant. In particular, if [math]\frac{1}{n}(X_1+...+X_n)[/math] converges a.s., its limit is a.s. constant.
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent real-valued r.v.'s with the same distribution, which is not the Dirac mass at [math]0[/math], and set [math]S_0=0[/math] and [math]S_n=X_1+...+X_n[/math] for [math]n\geq 1[/math]. Then a.s. [math]\sup_n S_n=+\infty[/math] or [math]\inf_n S_n=-\infty[/math].
We first need to show that for [math]p\geq 1[/math] we get [math]\p[-p\leq \inf_n S_n\leq \sup_n S_n\leq p]=0[/math]. This is a good exercise [a]. Now take [math]p\to\infty[/math] to obtain

[math]
\p\left[\sup_n S_n \lt \infty\ \text{and}\ \inf_n S_n \gt -\infty\right]=0,
[/math]

i.e. a.s. [math]\sup_n S_n=+\infty[/math] or [math]\inf_n S_n=-\infty[/math].
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of iid r.v.'s such that [math]X_1\in L^1(\Omega,\A,\p)[/math]. Then

[math]
\lim_{n\to\infty\atop a.s.}\frac{X_1+...+X_n}{n}=\E[X_1].
[/math]
The assumption [math]\E[\vert X_1\vert] \lt \infty[/math] is important. However, if [math]X_1\geq 0[/math] and [math]\E[X_1]=\infty[/math], we can apply the theorem to [math]X_1\land k[/math] for [math]k \gt 0[/math] and let [math]k\to\infty[/math] to obtain that the conclusion also holds with [math]\E[X_1]=\infty[/math], i.e. [math]\frac{S_n}{n}\to\infty[/math] a.s.
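Both regimes, the theorem and the remark, can be simulated (our own sketch; the distributions are chosen purely for illustration): with Exp(1) steps the running mean settles at [math]\E[X_1]=1[/math], while for a Pareto distribution with infinite mean the running mean grows without bound.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Finite mean: iid Exp(1), E[X_1] = 1, so S_n/n -> 1 a.s. by the SLLN.
avg_exp = float(rng.exponential(1.0, size=n).mean())

# Infinite mean: Pareto with tail index 1 (density x^{-2} on [1, oo)),
# so E[X_1] = oo; by the remark, S_n/n -> oo a.s. (here it grows
# roughly like log n).
pareto_steps = 1.0 + rng.pareto(1.0, size=n)  # classical Pareto(alpha=1)
avg_pareto = float(pareto_steps.mean())
```

For this sample size the Pareto running mean is already an order of magnitude above the exponential one, and it keeps growing as [math]n[/math] increases.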
Let [math]S_n=X_1+...+X_n[/math] with [math]S_0=0[/math] and take [math]a \gt \E[X_1][/math]. Define [math]M=\sup_{n\geq 0}(S_n-na)[/math]. We shall show that [math]M \lt \infty[/math] a.s. Since we obviously have [math]S_n\leq na+M[/math], we get [math]\frac{S_n}{n}\leq a+\frac{M}{n}[/math], and since [math]M \lt \infty[/math] a.s. it follows that [math]\limsup_n\frac{S_n}{n}\leq a[/math] a.s. Letting [math]a\searrow \E[X_1][/math] along a sequence, we obtain [math]\limsup_n\frac{S_n}{n}\leq \E[X_1][/math] a.s. Replacing [math](X_n)_{n\geq 1}[/math] with [math](-X_n)_{n\geq 1}[/math], we also get [math]\liminf_n\frac{S_n}{n}\geq \E[X_1][/math] a.s. So it follows that

[math]
\lim_{n\to\infty\atop a.s.}\frac{S_n}{n}=\E[X_1].
[/math]
Hence we only need to show that [math]M \lt \infty[/math] a.s. We first note that [math]\{M \lt \infty\}\in\B_\infty[/math]. Indeed, for all [math]k\geq 0[/math] we get [math]\{M \lt \infty\}=\{\sup_{n\in\N}(S_n-an) \lt \infty\}=\{\sup_{n\geq k}(S_n-S_k-(n-k)a) \lt \infty\}[/math], and the last event only depends on [math](X_n)_{n \gt k}[/math]. So it follows from Kolmogorov's 0-1 law that [math]\p[M \lt \infty]\in\{0,1\}[/math]. Now we need to show that [math]\p[M \lt \infty]=1[/math], or equivalently [math]\p[M=\infty] \lt 1[/math]. We do it by contradiction. For [math]k\in\N[/math], set [math]M_k=\sup_{0\leq n\leq k}(S_n-na)[/math] and [math]M'_k=\sup_{0\leq n\leq k}(S_{n+1}-S_1-na)[/math]. Then [math]M_k[/math] and [math]M'_k[/math] have the same distribution. Indeed, [math](X_1,...,X_k)[/math] and [math](X_2,...,X_{k+1})[/math] have the same distribution, and [math]M_k=F_k(X_1,...,X_k)[/math] and [math]M'_k=F_k(X_2,...,X_{k+1})[/math] for some measurable map [math]F_k:\R^k\to\R[/math]. Moreover, [math]M=\lim_{k\to\infty}\uparrow M_k[/math], and we set [math]M':=\lim_{k\to\infty}\uparrow M'_k[/math]. Since [math]M_k[/math] and [math]M_k'[/math] have the same distribution, [math]M[/math] and [math]M'[/math] also have the same distribution. Indeed, [math]\p[M'\leq x]=\lim_{k\to\infty}\downarrow \p[M_k'\leq x]=\lim_{k\to\infty}\downarrow \p[M_k\leq x]=\p[M\leq x][/math], so [math]M[/math] and [math]M'[/math] have the same distribution function. Moreover, [math]M_{k+1}=\sup\{0,\sup_{1\leq n\leq k+1}(S_n-na)\}=\sup\{0,M_k'+X_1-a\}[/math], which implies that [math]M_{k+1}=M_k'-\inf\{a-X_1,M_k'\}[/math]. Now we can use the fact that [math]M_k[/math] and [math]M_k'[/math] are integrable (as suprema of finitely many integrable r.v.'s) to obtain

[math]
\E[\inf\{a-X_1,M_k'\}]=\E[M_k']-\E[M_{k+1}]=\E[M_k]-\E[M_{k+1}]\leq 0,
[/math]

since [math](M_k)_{k\geq 0}[/math] is increasing. Since [math]M_k'\geq 0[/math], we have [math]\vert \inf\{a-X_1,M_k'\}\vert\leq \vert a-X_1\vert\in L^1[/math], so by dominated convergence [math]\E[\inf\{a-X_1,M'\}]\leq 0[/math]. If now [math]\p[M=\infty]=1[/math], then also [math]\p[M'=\infty]=1[/math], hence [math]\inf\{a-X_1,M'\}=a-X_1[/math] a.s. and therefore [math]a-\E[X_1]\leq 0[/math], contradicting [math]a \gt \E[X_1][/math]. Thus [math]\p[M \lt \infty]=1[/math].
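The key object [math]M=\sup_{n\geq 0}(S_n-na)[/math] can also be probed numerically (our illustration; we pick Exp(1) steps so that [math]\E[X_1]=1[/math], and [math]a=1.5 \gt \E[X_1][/math]): the negative drift makes the supremum finite and attained very early on any long trajectory.

```python
import numpy as np

rng = np.random.default_rng(4)

# One long trajectory of S_n - n*a with iid Exp(1) steps and a = 1.5.
a, steps = 1.5, 1_000_000
x = rng.exponential(1.0, size=steps)
drifted = np.cumsum(x) - a * np.arange(1, steps + 1)  # S_n - n*a, n >= 1

# M = sup_{n >= 0} (S_n - n*a); the n = 0 term contributes the value 0.
M = max(0.0, float(drifted.max()))
argmax = int(drifted.argmax()) + 1  # index n where S_n - n*a peaks

# With drift E[X_1] - a = -0.5 per step, S_n - n*a tends to -infinity,
# so the supremum M is finite and the peak occurs near the start.
```

On this path [math]M[/math] is a small finite number and the maximizing index sits near the beginning, matching the proof's conclusion that [math]M \lt \infty[/math] a.s. whenever [math]a \gt \E[X_1][/math].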
General references
Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].