
Independence

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

Independent events

Let [math](\Omega,\A,\p)[/math] be a probability space. If [math]A,B\in\A[/math], we say that [math]A[/math] and [math]B[/math] are independent if

[[math]] \p[A\cap B]=\p[A]\p[B]. [[/math]]

Example

[Throw of a die] We take the state space [math]\Omega=\{1,2,3,4,5,6\}[/math] with [math]\p[\{\omega\}]=\frac{1}{6}[/math] for every [math]\omega\in\Omega[/math]. Now let [math]A=\{1,2\}[/math] and [math]B=\{1,3,5\}[/math]. Then

[[math]] \p[A\cap B]=\p[\{1\}]=\frac{1}{6}\quad\text{and}\quad\p[A]=\frac{1}{3},\quad\p[B]=\frac{1}{2}. [[/math]]

Therefore we get

[[math]] \p[A\cap B]=\p[A]\p[B]. [[/math]]

Hence we get that [math]A[/math] and [math]B[/math] are independent.

Definition (Independence of events)

We say that the [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if for every subset [math]\{j_1,...,j_l\}\subset\{1,...,n\}[/math] we have

[[math]] \p[A_{j_1}\cap A_{j_2}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm \p[A_{j_l}]. [[/math]]

It is not enough to have [math]\p[A_1\cap\dotsm\cap A_n]=\p[A_1]\dotsm\p[A_n][/math], and it is also not enough to check that [math]\p[A_i\cap A_j]=\p[A_i]\p[A_j][/math] for all pairs [math]\{i,j\}\subset\{1,...,n\}[/math]. For instance, consider two tosses of a fair coin and the events [math]A,B[/math] and [math]C[/math] given by

[[math]] A=\{\text{$H$ at the first toss}\},\quad B=\{\text{$T$ at the second toss}\},\quad C=\{\text{same outcome for both tosses}\} [[/math]]
The events [math]A,B[/math] and [math]C[/math] are pairwise independent, but [math]A,B[/math] and [math]C[/math] are not independent events.
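
A quick enumeration in Python (a sketch, not part of the original notes; the event definitions follow the example above) confirms pairwise but not mutual independence:

from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=2))   # the 4 equally likely outcomes of two tosses
P = lambda E: Fraction(sum(1 for w in omega if E(w)), len(omega))

A = lambda w: w[0] == "H"               # H at the first toss
B = lambda w: w[1] == "T"               # T at the second toss
C = lambda w: w[0] == w[1]              # same outcome for both tosses

# pairwise independence holds ...
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)
assert P(lambda w: A(w) and C(w)) == P(A) * P(C)
assert P(lambda w: B(w) and C(w)) == P(B) * P(C)
# ... but the triple intersection is empty, while P[A]P[B]P[C] = 1/8
assert P(lambda w: A(w) and B(w) and C(w)) == 0
assert P(A) * P(B) * P(C) == Fraction(1, 8)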

Proposition

The [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if and only if

[[math]] (*)\qquad \p[B_1\cap\dotsm \cap B_n]=\p[B_1]\dotsm\p[B_n] [[/math]]
for all [math]B_i\in\sigma(A_i)=\{\emptyset,A_i,A_i^C,\Omega\}[/math], [math]\forall i\in\{1,...,n\}[/math].


Show Proof

If the above is satisfied and if [math]\{j_1,...,j_l\}\subset\{1,...,n\}[/math], then for [math]i\in\{j_1,...,j_l\}[/math] take [math]B_i=A_i[/math] and for [math]i\not\in\{j_1,...,j_l\}[/math] take [math]B_i=\Omega[/math]. So it follows that

[[math]] \p[A_{j_1}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm\p[A_{j_l}]. [[/math]]
Conversely, assume that [math]A_1,...,A_n\in\A[/math] are independent and we want to deduce [math](*)[/math]. We can assume that [math]\forall i\in\{1,...,n\}[/math] we have [math]B_i\not=\emptyset[/math] (for otherwise the identity is trivially satisfied). If [math]\{j_1,...,j_l\}=\{i\mid B_i\not=\Omega\}[/math], we have to check that

[[math]] \p[B_{j_1}\cap\dotsm\cap B_{j_l}]=\p[B_{j_1}]\dotsm\p[B_{j_l}], [[/math]]
as soon as [math]B_{j_k}=A_{j_k}[/math] or [math]B_{j_k}=A_{j_k}^C[/math]. Hence it is enough to show that if [math]C_1,...,C_p[/math] are independent events, then

[[math]] C_1^C,C_2,...,C_p [[/math]]
are also independent. But if [math]\{i_1,...,i_q\}\subset\{1,...,p\}[/math] with [math]1\not\in\{i_1,...,i_q\}[/math], then from the definition of independence we have

[[math]] \p[C_{i_1}\cap\dotsm\cap C_{i_q}]=\p[C_{i_1}]\dotsm\p[C_{i_q}]. [[/math]]
If [math]1\in\{i_1,...,i_q\}[/math], say [math]1=i_1[/math], then

[[math]] \begin{align*} \p[C_{1}^C\cap C_{i_2}\cap\dotsm\cap C_{i_q}]&=\p[C_{i_2}\cap\dotsm\cap C_{i_q}]-\p[C_1\cap C_{i_2}\cap\dotsm\cap C_{i_q}]\\ &=\p[C_{i_2}]\dotsm\p[C_{i_q}]-\p[C_1]\p[C_{i_2}]\dotsm\p[C_{i_q}]\\ &=(1-\p[C_1])\p[C_{i_2}]\dotsm\p[C_{i_q}]=\p[C_1^C]\p[C_{i_2}]\dotsm\p[C_{i_q}]. \end{align*} [[/math]]

Definition (Conditional probability)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A,B\in\A[/math] such that [math]\p[B] \gt 0[/math]. The conditional probability of [math]A[/math] given [math]B[/math] is then defined as

[[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}. [[/math]]

Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A,B\in\A[/math] and suppose that [math]\p[B] \gt 0[/math].

  • [math]A[/math] and [math]B[/math] are independent if and only if
    [[math]] \p[A\mid B]=\p[A]. [[/math]]
  • The map
    [[math]] \A\to [0,1],A\mapsto \p[A\mid B] [[/math]]
    defines a new probability measure on [math]\A[/math] called the conditional probability given [math]B[/math].


Show Proof

We need to show both points.

  • If [math]A[/math] and [math]B[/math] are independent, then
    [[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A] [[/math]]
    and conversely if [math]\p[A\mid B]=\p[A][/math], we get that
    [[math]] \p[A\cap B]=\p[A]\p[B], [[/math]]
    and hence [math]A[/math] and [math]B[/math] are independent.
  • Let [math]\Q[A]=\p[A\mid B][/math]. We have
    [[math]] \Q[\Omega]=\p[\Omega\mid B]=\frac{\p[\Omega\cap B]}{\p[B]}=\frac{\p[B]}{\p[B]}=1. [[/math]]
    Take [math](A_n)_{n\geq 1}\subset \A[/math] as a disjoint family of events. Then
    [[math]] \begin{align*} \Q\left[\bigcup_{n\geq 1}A_n\right]&=\p\left[\bigcup_{n\geq 1}A_n\mid B\right]=\frac{\p\left[\left(\bigcup_{n\geq 1}A_n\right)\cap B\right]}{\p[B]}=\frac{\p\left[\bigcup_{n\geq 1}(A_n\cap B)\right]}{\p[B]}\\ &=\sum_{n\geq 1}\frac{\p[A_n\cap B]}{\p[B]}=\sum_{n\geq 1}\Q[A_n]. \end{align*} [[/math]]
Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A_1,...,A_n\in\A[/math] with [math]\p[A_1\cap\dotsm\cap A_n] \gt 0[/math]. Then

[[math]] \p[A_1\cap\dotsm\cap A_n]=\p[A_1]\p[A_2\mid A_1]\p[A_3\mid A_1\cap A_2]\dotsm\p[A_n\mid A_1\cap\dotsm\cap A_{n-1}]. [[/math]]


Show Proof

We prove this by induction. For [math]n=2[/math] it's just the definition of the conditional probability. Now we want to go from [math]n-1[/math] to [math]n[/math]. Set [math]B=A_1\cap \dotsm\cap A_{n-1}[/math] and note that [math]\p[B]\geq\p[A_1\cap\dotsm\cap A_n] \gt 0[/math], so all the conditional probabilities below are well defined. Then, by the induction hypothesis,

[[math]] \p[B\cap A_n]=\p[A_n\mid B]\p[B]=\p[A_n\mid B]\p[A_1]\p[A_2\mid A_1]\dotsm\p[A_{n-1}\mid A_1\cap\dotsm\cap A_{n-2}]. [[/math]]
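
As a small illustration of the chain rule (not from the notes; the numbers describe drawing three aces in a row from a standard 52-card deck), the three factors are exactly the conditional probabilities in the statement:

from fractions import Fraction

# P[A_1 ∩ A_2 ∩ A_3] = P[A_1] P[A_2 | A_1] P[A_3 | A_1 ∩ A_2]
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)  # 1/5525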

Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]\left(E_{n}\right)_{n\geq 1}[/math] be a finite or countable measurable partition of [math]\Omega[/math], such that [math]\p[E_n] \gt 0[/math] for all [math]n[/math]. If [math]A\in\A[/math], then

[[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]


Show Proof

Note that

[[math]] A=A\cap\Omega=A\cap\left(\bigcup_{n\geq 1}E_n\right)=\bigcup_{n\geq 1}(A\cap E_n). [[/math]]
Now since the [math](A\cap E_n)_{n\geq 1}[/math] are disjoint, we can write

[[math]] \p[A]=\sum_{n\geq 1}\p[A\cap E_n]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]

Theorem (Bayes)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](E_n)_{n\geq 1}[/math] be a finite or countable measurable partition of [math]\Omega[/math] with [math]\p[E_n] \gt 0[/math] for all [math]n[/math], and let [math]A\in\A[/math] with [math]\p[A] \gt 0.[/math] Then for every [math]n[/math]

[[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{k\geq 1}\p[A\mid E_k]\p[E_k]}. [[/math]]


Show Proof

By the previous theorem and the definition of conditional probability we know that

[[math]] \p[A]=\sum_{k\geq 1}\p[A\mid E_k]\p[E_k],\qquad \p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]},\qquad \p[A\mid E_n]=\frac{\p[A\cap E_n]}{\p[E_n]}. [[/math]]
Combining these, we get

[[math]] \p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]}=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{k\geq 1}\p[A\mid E_k]\p[E_k]}. [[/math]]
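
A worked numerical instance (a sketch with hypothetical numbers, not part of the notes): take the two-set partition [math]E_1[/math] ("has a condition") and [math]E_2[/math] ("does not"), and let [math]A[/math] be the event "the test is positive". The code computes the posterior [math]\p[E_1\mid A][/math] exactly as in the theorem above.

# hypothetical inputs: prevalence 0.01, sensitivity P[A | E_1] = 0.99,
# false-positive rate P[A | E_2] = 0.05
prior = {"E1": 0.01, "E2": 0.99}
likelihood = {"E1": 0.99, "E2": 0.05}

p_A = sum(likelihood[e] * prior[e] for e in prior)               # total probability
posterior = {e: likelihood[e] * prior[e] / p_A for e in prior}   # Bayes' formula
print(posterior["E1"])  # ≈ 0.1667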

Independent Random Variables and independent [math]\sigma[/math]-Algebras

Definition (Independence of [math]\sigma[/math]-Algebras)

Let [math](\Omega,\A,\p)[/math] be a probability space. We say that the sub [math]\sigma[/math]-Algebras [math]\B_1,...,\B_n[/math] of [math]\A[/math] are independent if for all [math] A_1\in\B_1,..., A_n\in\B_n[/math] we get

[[math]] \p[A_1\cap\dotsm \cap A_n]=\p[A_1]\dotsm\p[A_n]. [[/math]]
Let now [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s with values in measurable spaces [math](E_1,\mathcal{E}_1),...,(E_n,\mathcal{E}_n)[/math] respectively. We say that the r.v.'s [math]X_1,...,X_n[/math] are independent if the [math]\sigma[/math]-Algebras [math]\sigma(X_1),...,\sigma(X_n)[/math] are independent. This is equivalent to the fact that for all [math]F_1\in\mathcal{E}_1,...,F_n\in\mathcal{E}_n[/math] we have

[[math]] \p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm \p[X_n\in F_n]. [[/math]]
(This comes from the fact that for all [math]i\in\{1,...,n\}[/math] we have that [math]\sigma(X_i)=\{X_i^{-1}(F)\mid F\in\mathcal{E}_i\}[/math])

If [math]\B_1,...,\B_n[/math] are [math]n[/math] independent sub [math]\sigma[/math]-Algebras and if [math]X_1,...,X_n[/math] are r.v.'s such that [math]X_i[/math] is [math]\B_i[/math] measurable for all [math]i\in\{1,...,n\}[/math], then [math]X_1,...,X_n[/math] are independent r.v.'s (this comes from the fact that for all [math]i\in\{1,...,n\}[/math] we have [math]\sigma(X_i)\subset \B_i[/math]).

The [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if and only if [math]\sigma(A_1),...,\sigma(A_n)[/math] are independent.

Theorem (Independence of Random Variables)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s. Then [math]X_1,...,X_n[/math] are independent if and only if the law of the vector [math](X_1,...,X_n)[/math] is the product of the laws of [math]X_1,...,X_n[/math], i.e.

[[math]] \p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes \p_{X_n}. [[/math]]
Moreover, if [math]X_1,...,X_n[/math] are independent r.v.'s with values in measurable spaces [math](E_1,\mathcal{E}_1),...,(E_n,\mathcal{E}_n)[/math], then for all measurable maps [math]f_i:(E_i,\mathcal{E}_i)\to\R_+[/math], [math]i\in\{1,...,n\}[/math], we have

[[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)]. [[/math]]


Show Proof

Let [math]F_i\in\mathcal{E}_i[/math] for all [math]i\in\{1,...,n\}[/math]. Then we have

[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}] [[/math]]
and on the other hand

[[math]] \left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n)=\p_{X_1}[F_1]\dotsm\p_{X_n}[F_n]=\prod_{i=1}^n\p_{X_i}[F_i]=\prod_{i=1}^n\p[X_i\in F_i]. [[/math]]
If [math]X_1,...,X_n[/math] are independent, then
[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\prod_{i=1}^n\p[X_i\in F_i]=\left(\p_{X_1}\otimes\dotsm\otimes \p_{X_n}\right)(F_1\times\dotsm\times F_n), [[/math]]
which implies that [math] \p_{(X_1,...,X_n)}[/math] and [math]\p_{X_1}\otimes\dotsm\otimes \p_{X_n}[/math] are equal on rectangles. Hence the monotone class theorem implies that

[[math]] \p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}. [[/math]]
Conversely, if [math]\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}[/math], then for all [math]F_i\in\mathcal{E}_i[/math], with [math]i\in\{1,...,n\}[/math], we get that

[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm\times F_n)=\left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n) [[/math]]
and therefore

[[math]] \p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm\p[X_n\in F_n]. [[/math]]
This implies that [math]X_1,...,X_n[/math] are independent. For the second assertion we get

[[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\int_{E_1\times\dotsm\times E_n}\prod_{i=1}^nf_i(x_i)\underbrace{\p_{X_1}(dx_1)\dotsm \p_{X_n}(dx_n)}_{\p_{(X_1,...,X_n)}(dx_1\dotsm dx_n)}=\prod_{i=1}^n\int_{E_i}f_i(x_i)\p_{X_i}(dx_i)=\prod_{i=1}^n\E[f_i(X_i)], [[/math]]
where we have used the first part and Fubini's theorem.

We see from the proof above that as soon as for all [math]i\in\{1,...,n\}[/math] we have [math]\E[\vert f_i(X_i)\vert] \lt \infty[/math], it follows that

[[math]] \E\left [\prod_{i=1}^n f_i(X_i)\right]=\prod_{i=1}^n\E[ f_i(X_i) ]. [[/math]]
Indeed, the previous result shows that

[[math]] \E\left[\prod_{i=1}^n\vert f_i(X_i)\vert\right]=\prod_{i=1}^n\E[\vert f_i(X_i)\vert] \lt \infty [[/math]]
and thus we can apply Fubini's theorem. In particular, if [math]X_1,...,X_n\in L^1(\Omega,\A,\p)[/math] are independent, we get that

[[math]] \E\left[\prod_{i=1}^nX_i\right]=\prod_{i=1}^n\E[X_i]. [[/math]]
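
A quick Monte Carlo sketch (assuming numpy is available; not part of the notes) illustrates the product formula for two independent r.v.'s in [math]L^1[/math]:

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.exponential(scale=1.0, size=n)   # Exp(1), so E[X] = 1
Y = rng.uniform(0.0, 1.0, size=n)        # U[0,1] independent of X, so E[Y] = 1/2

print(np.mean(X * Y))            # ≈ E[XY]
print(np.mean(X) * np.mean(Y))   # ≈ E[X]E[Y] = 0.5, matching the line above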

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1[/math] and [math]X_2[/math] be two independent r.v.'s in [math]L^2(\Omega,\A,\p)[/math]. Then we get

[[math]] Cov(X_1,X_2)=0. [[/math]]


Show Proof

Recall that if [math]X\in L^2(\Omega,\A,\p)[/math], we also have that [math]X\in L^1(\Omega,\A,\p)[/math]. Thus

[[math]] Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[X_1]\E[X_2]-\E[X_1]\E[X_2]=0. [[/math]]

Note that the converse is not true! Let [math]X_1\sim\mathcal{N}(0,1)[/math]. (More generally, one can take for [math]X_1[/math] any r.v. in [math]L^2(\Omega,\A,\p)[/math] with a symmetric density [math]P(x)[/math], i.e. [math]P(-x)=P(x)[/math].) Recall that being in [math]L^2(\Omega,\A,\p)[/math] means

[[math]] \E[X_1^2]=\int_\R x^2 P(x)dx \lt \infty, [[/math]]
and the symmetry of [math]P[/math] implies that [math]\E[X_1]=\int_\R xP(x)dx=0.[/math] Now consider a r.v. [math]Y[/math] with values in [math]\{-1,+1\}[/math], with [math]\p[Y=1]=\p[Y=-1]=\frac{1}{2}[/math], and assume that [math]Y[/math] is independent of [math]X_1[/math]. Define [math]X_2:=YX_1[/math] and observe that

[[math]] Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[YX_1^2]-\E[YX_1]\E[X_1] [[/math]]
and hence, using the independence of [math]Y[/math] and [math]X_1[/math] together with [math]\E[Y]=0[/math],

[[math]] Cov(X_1,X_2)=\E[Y]\E[X_1^2]-\E[Y]\E[X_1]^2=0-0=0. [[/math]]
If [math]X_1[/math] and [math]X_2[/math] were independent, then [math]\vert X_1\vert[/math] and [math]\vert X_2\vert[/math] would also be independent. But [math]\vert X_2\vert =\vert Y\vert \vert X_1\vert=\vert X_1\vert[/math], so [math]\vert X_1\vert[/math] would be independent of itself and hence a.s. equal to a constant. Indeed, set [math]c=\E[\vert X_1\vert][/math]; since [math]\vert X_1\vert -c[/math] is independent of itself, we get

[[math]] \E[(\vert X_1\vert-c)^2]=\E[\vert X_1\vert-c]\,\E[\vert X_1\vert-c]=0\Longrightarrow \vert X_1\vert=c\text{ a.s.} [[/math]]
This cannot happen, since [math]\vert X_1\vert[/math] is the absolute value of a standard Gaussian r.v., whose density is given by

[[math]] P(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}, [[/math]]
so [math]\vert X_1\vert[/math] is not a.s. constant. Hence [math]X_1[/math] and [math]X_2[/math] are uncorrelated but not independent.
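
The construction above can be checked numerically (a sketch assuming numpy; [math]X_2=YX_1[/math] is exactly the construction in the text):

import numpy as np

rng = np.random.default_rng(1)
n = 10**6
X1 = rng.standard_normal(n)             # N(0,1)
Y = rng.choice([-1.0, 1.0], size=n)     # independent random sign
X2 = Y * X1

print(np.mean(X1 * X2) - np.mean(X1) * np.mean(X2))   # ≈ 0: empirical Cov(X1, X2)
print(np.corrcoef(np.abs(X1), np.abs(X2))[0, 1])      # = 1: |X1| and |X2| coincide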

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s with values in [math]\R[/math].

  • Assume that for [math]i\in \{1,...,n\}[/math], [math]\p_{X_i}[/math] has density [math]P_i[/math] and that the r.v.'s [math]X_1,...,X_n[/math] are independent. Then the law of [math](X_1,...,X_n)[/math] also has density given by [math]P(x_1,...,x_n)=\prod_{i=1}^nP_i(x_i)[/math].
  • Conversely assume that the law of [math](X_1,...,X_n)[/math] has density [math]P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)[/math], where [math]q_i[/math] is Borel measurable and positive. Then the r.v.'s [math]X_1,...,X_n[/math] are independent and the law of [math]X_i[/math] has density [math]P_i=c_iq_i[/math], with [math]c_i \gt 0[/math] for [math]i\in\{1,...,n\}[/math].


Show Proof

Part [math](i)[/math] follows directly from the previous theorem, since the product measure [math]\p_{X_1}\otimes\dotsm\otimes\p_{X_n}[/math] has density [math]\prod_{i=1}^nP_i(x_i)[/math]. So we only need to show [math](ii)[/math]. From Fubini we get

[[math]] \prod_{i=1}^n\int_\R q_i(x_i)dx_i=\int_{\R^{n}}\prod_{i=1}^nq_i(x_i)dx_1\dotsm dx_n=\int_{\R^{n}}P(x_1,...,x_n)dx_1\dotsm dx_n=1, [[/math]]
which implies that [math]K_i:=\int_\R q_i(x_i)dx_i\in(0,\infty)[/math], for all [math]i\in\{1,...,n\}[/math]. Now we know that the law of [math]X_i[/math] has density [math]P_i[/math] given by

[[math]] P_i(x_i)=\int_{\R^{n-1}}P(x_1,...,x_{i-1},x_i,x_{i+1},...,x_n)dx_1\dotsm dx_{i-1}dx_{i+1}\dotsm dx_n=\left(\prod_{j\not=i}K_j\right)q_i(x_i)=\frac{1}{K_i}q_i(x_i). [[/math]]
We can rewrite

[[math]] P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)=\prod_{i=1}^nP_i(x_i). [[/math]]
Hence we get [math]\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm \otimes \p_{X_n}[/math] and therefore [math]X_1,...,X_n[/math] are independent.

Example

Let [math]U[/math] be a r.v. with exponential distribution of parameter [math]1[/math]. Let [math]V[/math] be a uniform r.v. on [math][0,1][/math]. We assume that [math]U[/math] and [math]V[/math] are independent. Define the r.v.'s [math]X=\sqrt{U}\cos(2\pi V)[/math] and [math]Y=\sqrt{U}\sin(2\pi V)[/math]. Then [math]X[/math] and [math]Y[/math] are independent. Indeed, for a measurable function [math]\varphi:\R^2\to \R_+[/math] we get

[[math]] \E[\varphi(X,Y)]=\int_0^\infty\int_{0}^1\varphi(\sqrt{u}\cos(2\pi v),\sqrt{u}\sin(2\pi v))e^{-u}dudv [[/math]]

[[math]] =\frac{1}{\pi}\int_{0}^\infty\int_0^{2\pi}\varphi(r\cos(\theta),r\sin(\theta))re^{-r^2}drd\theta, [[/math]]

where we have substituted [math]r=\sqrt{u}[/math] and [math]\theta=2\pi v[/math]. This implies that [math](X,Y)[/math] has density [math]\frac{e^{-x^2}e^{-y^2}}{\pi}[/math] on [math]\R\times\R[/math]. With the previous corollary we get that [math]X[/math] and [math]Y[/math] are independent, each with density [math]P(x)=\frac{1}{\sqrt{\pi}}e^{-x^2}[/math].

We write [math]X\stackrel{law}{=}Y[/math] to say that [math]\p_X=\p_Y[/math]. Thus in the example above we would have

[[math]] X\stackrel{law}{=}Y\sim\mathcal{N}(0,\frac{1}{2}). [[/math]]
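
A simulation sketch (assuming numpy; not part of the notes) of the transformation above shows the expected behaviour: both coordinates have variance [math]\frac{1}{2}[/math] and the empirical dependence between them vanishes.

import numpy as np

rng = np.random.default_rng(2)
n = 10**6
U = rng.exponential(scale=1.0, size=n)   # Exp(1)
V = rng.uniform(0.0, 1.0, size=n)        # U[0,1], independent of U
X = np.sqrt(U) * np.cos(2 * np.pi * V)
Y = np.sqrt(U) * np.sin(2 * np.pi * V)

print(np.var(X), np.var(Y))        # both ≈ 0.5, as for N(0, 1/2)
print(np.corrcoef(X, Y)[0, 1])     # ≈ 0
print(np.mean((X > 0) & (Y > 0)))  # ≈ 0.25 = P[X > 0] P[Y > 0]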

Important facts

Let [math]X_1,...,X_n[/math] be [math]n[/math] real valued r.v.'s. Then the following are equivalent

  • [math]X_1,...,X_n[/math] are independent.
  • For [math]X=(X_1,...,X_n)[/math] and all [math](\xi_1,...,\xi_n)\in\R^n[/math] we have
    [[math]] \Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i). [[/math]]
  • For all [math]a_1,...,a_n\in\R[/math], we have
    [[math]] \p[X_1\leq a_1,...,X_n\leq a_n]=\prod_{i=1}^n\p[X_i\leq a_i]. [[/math]]
  • If [math]f_1,...,f_n:\R\to\R_+[/math] are continuous maps with compact support, then
    [[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)]. [[/math]]

Show Proof

First we show [math](i)\Longrightarrow (ii)[/math]. By the definition of the characteristic function and independence, we get

[[math]] \Phi_X(\xi_1,...,\xi_n)=\E\left[e^{i(\xi_1X_1+...+\xi_nX_n)}\right]=\E\left[e^{i\xi_1X_1}\dotsm e^{i\xi_nX_n}\right]=\prod_{i=1}^n\E[e^{i\xi_i X_i}]=\prod_{i=1}^n\Phi_{X_i}(\xi_i), [[/math]]

where we used that the maps [math]t\mapsto e^{i\xi_i t}[/math] are measurable and bounded. Next we show [math](ii)\Longrightarrow (i)[/math]. Recall that, by the injectivity of the characteristic function, we have [math]\p_X=\p_Y[/math] whenever

[[math]] \Phi_X(\xi_1,...,\xi_n)=\Phi_Y(\xi_1,...,\xi_n). [[/math]]

Now if [math]\Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i)[/math], we note that [math]\prod_{i=1}^n\Phi_{X_i}(\xi_i)[/math] is the characteristic function of the probability measure [math]\p_{X_1}\otimes\dotsm \otimes\p_{X_n}[/math]. From injectivity it follows that [math]\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}[/math], which implies that [math]X_1,...,X_n[/math] are independent.

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]\B_1,...,\B_n\subset\A[/math] be sub [math]\sigma[/math]-Algebras of [math]\A[/math]. For every [math]i\in\{1,...,n\}[/math], let [math]\mathcal{C}_i\subset\B_i[/math] be a family of subsets of [math]\Omega[/math] such that [math]\mathcal{C}_i[/math] is stable under finite intersection and [math]\sigma(\mathcal{C}_i)=\B_i[/math]. Assume that for all [math]C_i\in\mathcal{C}_i[/math] with [math]i\in\{1,...,n\}[/math] we have

[[math]] \p\left[\bigcap_{i=1}^nC_i\right]=\prod_{i=1}^n\p[C_i]. [[/math]]
Then [math]\B_1,...,\B_n[/math] are independent [math]\sigma[/math]-Algebras.


Show Proof

Let us fix [math]C_2\in \mathcal{C}_2,...,C_n\in\mathcal{C}_n[/math] and define

[[math]] M_1:=\left\{B_1\in\B_1\mid \p[B_1\cap C_2\cap\dotsm\cap C_n]=\p[B_1]\p[C_2]\dotsm \p[C_n]\right\}. [[/math]]
Now since [math]\mathcal{C}_1\subset M_1[/math] and [math]M_1[/math] is a monotone class, we get [math]\sigma(\mathcal{C}_1)=\B_1\subset M_1[/math] and thus [math]\B_1=M_1[/math]. Let now [math]B_1\in\B_1,[/math] [math]C_3\in\mathcal{C}_3,...,C_n\in\mathcal{C}_n[/math] and define

[[math]] M_2:=\{B_2\in\B_2\mid \p[B_2\cap B_1\cap C_3\cap\dotsm\cap C_n]=\p[B_2]\p[B_1]\p[C_3]\dotsm\p[C_n]\}. [[/math]]
Again, since [math]\mathcal{C}_2\subset M_2[/math] and [math]M_2[/math] is a monotone class, we get [math]\sigma(\mathcal{C}_2)=\B_2\subset M_2[/math] and thus [math]\B_2=M_2[/math]. By induction we complete the proof.

[math]Consequence:[/math] Let [math]\B_1,...,\B_n[/math] be [math]n[/math] independent [math]\sigma[/math]-Algebras and let [math]n_0=0 \lt n_1 \lt ... \lt n_p=n[/math]. Then the [math]\sigma[/math]-Algebras

[[math]] \begin{align*} \mathcal{D}_1&=\B_1\lor\dotsm\lor\B_{n_1}=\sigma(\B_1,...,\B_{n_1})=\sigma\left(\bigcup_{k=1}^{n_1}\B_k\right)\\ \mathcal{D}_2&=\B_{n_1+1}\lor\dotsm\lor\B_{n_2}\\ \vdots\\ \mathcal{D}_p&=\B_{n_{p-1}+1}\lor\dotsm\lor\B_{n_p} \end{align*} [[/math]]

are also independent. Indeed, we can apply the previous proposition to the classes of sets

[[math]] \mathcal{C}_j=\{B_{n_{j-1}+1}\cap\dotsm\cap B_{n_j}\mid B_i\in\B_i,\ i\in\{n_{j-1}+1,...,n_j\}\}. [[/math]]

In particular if [math]X_1,...,X_n[/math] are independent r.v.'s, then

[[math]] \begin{align*} Y_1&=(X_1,...,X_{n_1})\\ \vdots\\ Y_p&=(X_{n_{p-1}+1},...,X_{n_p}) \end{align*} [[/math]]

are also independent.

Example

Let [math]X_1,...,X_4[/math] be real valued independent r.v.'s. Then [math]Z_1=X_1X_3[/math] and [math]Z_2=X_2^3+X_4[/math] are independent. Indeed, [math]Z_1[/math] is [math]\sigma(X_1,X_3)[/math] measurable and [math]Z_2[/math] is [math]\sigma(X_2,X_4)[/math] measurable, and from the above [math]\sigma(X_1,X_3)[/math] and [math]\sigma(X_2,X_4)[/math] are independent. Recall here that for [math]X:\Omega\to\R[/math], a r.v. [math]Y[/math] is [math]\sigma(X)[/math] measurable if and only if [math]Y=f(X)[/math] with [math]f[/math] a measurable map; more generally, if [math]Y[/math] is [math]\sigma(X_1,...,X_n)[/math] measurable, then [math]Y=f(X_1,...,X_n)[/math] for some measurable [math]f[/math].

Proposition (Independence for an infinite family)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](\B_i)_{i\in I}[/math] be an infinite family of sub [math]\sigma[/math]-Algebras of [math]\A[/math]. We say that the family [math](\B_i)_{i\in I}[/math] is independent if for every finite subset [math]\{i_1,...,i_p\}\subset I[/math] the [math]\sigma[/math]-Algebras [math]\B_{i_1},...,\B_{i_p}[/math] are independent. If [math](X_i)_{i\in I}[/math] is a family of r.v.'s, we say that they are independent if [math](\sigma(X_i))_{i\in I}[/math] is independent.

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent r.v.'s. Then for all [math]p\in\N[/math] we get that [math]\B_1=\sigma(X_1,...,X_p)[/math] and [math]\B_2=\sigma(X_{p+1},X_{p+2},...)[/math] are independent.


Show Proof

Apply Proposition 5.9 to [math]\mathcal{C}_1=\sigma(X_1,...,X_p)[/math] and [math]\mathcal{C}_2=\bigcup_{n\geq p+1}\sigma(X_{p+1},...,X_n)[/math], which is stable under finite intersection and generates [math]\B_2[/math].

The Borel-Cantelli Lemma

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](A_n)_{n\in\N}[/math] be a sequence of events in [math]\A[/math]. Recall that we can write

[[math]] \limsup_{n\to \infty} A_n=\bigcap_{n=0}^\infty\left(\bigcup_{k=n}^\infty A_k\right)\quad\text{and}\quad\liminf_{n\to \infty} A_n=\bigcup_{n=0}^\infty\left(\bigcap_{k=n}^\infty A_k\right). [[/math]]

Moreover, both are again measurable sets. An element [math]\omega[/math] lies in [math]\limsup_n A_n[/math] if and only if for every [math]n\geq 0[/math] there exists a [math]k\geq n[/math] with [math]\omega\in A_k[/math], i.e. if and only if [math]\omega[/math] belongs to infinitely many of the [math]A_k[/math]'s. Similarly, [math]\omega[/math] lies in [math]\liminf_n A_n[/math] if and only if there exists an [math]n\geq 0[/math] such that [math]\omega\in A_k[/math] for all [math]k\geq n[/math], i.e. if and only if [math]\omega[/math] belongs to all but finitely many of the [math]A_k[/math]'s. In particular [math]\liminf_nA_n\subset \limsup_nA_n[/math].

Lemma (Borel-Cantelli)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](A_n)_{n\in\N}\in\A[/math] be a family of measurable sets.

  • If [math]\sum_{n\geq 1}\p[A_n] \lt \infty[/math], then
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=0, [[/math]]
    which means that the set [math]\{n\in\N\mid \omega\in A_n\}[/math] is a.s. finite.
  • If [math]\sum_{n\geq 1}\p[A_n]=\infty[/math], and if the events [math](A_n)_{n\in\N}[/math] are independent, then
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=1, [[/math]]
    which means that the set [math]\{n\in\N\mid \omega\in A_n\}[/math] is a.s. infinite.


Show Proof

We need to show both points.

  • If [math]\sum_{n\geq 1}\p[A_n] \lt \infty,[/math] then, by Fubini, we get
    [[math]] \E\left[\sum_{n\geq 1}\one_{A_n}\right]=\sum_{n\geq 1}\p[A_n], [[/math]]
    which implies that [math]\sum_{n\geq 1}\one_{A_n} \lt \infty[/math] a.s., i.e. a.s. we have [math]\one_{A_n}\not=0[/math] for only finitely many [math]n[/math].
  • Fix [math]n_0\in\N[/math] and note that, by independence, for all [math]n\geq n_0[/math] we have
    [[math]] \p\left[\bigcap_{k=n_0}^nA_k^C\right]=\prod_{k=n_0}^n\p[A_k^C]=\prod_{k=n_0}^n(1-\p[A_k])\leq \exp\left(-\sum_{k=n_0}^n\p[A_k]\right), [[/math]]
    where we used [math]1-x\leq e^{-x}[/math]. Since
    [[math]] \sum_{n\geq 1}\p[A_n]=\infty, [[/math]]
    letting [math]n\to\infty[/math] gives
    [[math]] \p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0. [[/math]]
    Since this is true for every [math]n_0[/math] we have that
    [[math]] \p\left[\bigcup_{n_0=0}^\infty\bigcap_{k=n_0}^\infty A_k^C\right]\leq \sum_{n_0\geq 0}\p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0. [[/math]]
    Hence we get
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=\p\left[\bigcap_{n_0=0}^\infty\bigcup_{k=n_0}^\infty A_k\right]=1-\p\left[\bigcup_{n_0=0}^\infty\bigcap_{k=n_0}^\infty A_k^C\right]=1. [[/math]]
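
A small simulation (a sketch in Python, assuming numpy; not part of the original notes) contrasts the two regimes of the lemma along one sample path: with [math]\p[A_n]=n^{-2}[/math] (summable) only a handful of the [math]A_n[/math] occur, while with independent [math]A_n[/math] and [math]\p[A_n]=n^{-1}[/math] (divergent series) the number of occurrences keeps growing, roughly like [math]\log N[/math].

import numpy as np

rng = np.random.default_rng(3)
N = 10**6
n = np.arange(1, N + 1)

hits_summable = rng.random(N) < 1.0 / n**2   # independent A_n with P[A_n] = 1/n^2
hits_divergent = rng.random(N) < 1.0 / n     # independent A_n with P[A_n] = 1/n

print(hits_summable.sum())    # small (typically only 1 or 2 occurrences in total)
print(hits_divergent.sum())   # keeps growing with N, here around log(N) ≈ 14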

Application 1

Let [math](\Omega,\A,\p)[/math] be a probability space. There does not exist a probability measure on [math]\N[/math] such that the probability of the set of multiples of an integer [math]n[/math] is [math]\frac{1}{n}[/math] for every [math]n\geq 1[/math]. Let us assume that such a probability measure exists. Let [math]\tilde{p}[/math] denote the set of prime numbers. For [math]p\in\tilde{p}[/math] we set [math]A_p=p\N[/math], i.e. the set of all multiples of [math]p[/math]. We first show that the sets [math](A_p)_{p\in\tilde{p}}[/math] are independent. Indeed let [math]p_1,...,p_n\in\tilde{p}[/math] be distinct. Then we have

[[math]] \p[p_1\N\cap\dotsm\cap p_n\N]=\p[p_1\dotsm p_n\N]=\frac{1}{p_1\dotsm p_n}=\p[p_1\N]\dotsm\p[p_n\N]. [[/math]]

Moreover it is known that

[[math]] \sum_{p\in\tilde{p}}\p[p\N]=\sum_{p\in\tilde{p}}\frac{1}{p}=\infty. [[/math]]

The second part of the Borel-Cantelli lemma then implies that almost every integer [math]n[/math] belongs to infinitely many [math]A_p[/math]'s, i.e. is divisible by infinitely many distinct prime numbers. This is absurd, so no such probability measure can exist.

Application 2

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] be an exponential r.v. with parameter [math]\lambda=1[/math]. Thus we know that [math]X[/math] has density [math]e^{-x}\one_{\R_+}(x)[/math]. Now consider a sequence [math](X_n)_{n\geq 1}[/math] of independent r.v.'s with the same distribution as [math]X[/math], i.e. for all [math]n\geq 1[/math], we have [math]X_n\sim X[/math]. Then [math]\limsup_n \frac{X_n}{\log(n)}=1[/math] a.s., i.e. there exists an [math]N\in\A[/math] such that [math]\p[N]=0[/math] and for [math]\omega\not\in N[/math] we get

[[math]] \limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1. [[/math]]

First we compute, for [math]t\geq 0[/math], the probability

[[math]] \p[X \gt t]=\int_t^\infty e^{-x}dx=e^{-t}. [[/math]]

Now let [math]\epsilon \gt 0[/math] and consider the sets [math]A_n=\{X_n \gt (1+\epsilon)\log(n)\}[/math] and [math]B_n=\{X_n \gt \log(n)\}[/math]. Then

[[math]] \p[A_n]=\p[X_n \gt (1+\epsilon)\log(n)]=\p[X \gt (1+\epsilon)\log(n)]=e^{-(1+\epsilon)\log(n)}=\frac{1}{n^{1+\epsilon}}. [[/math]]

This implies that

[[math]] \sum_{n\geq 1}\p[A_n] \lt \infty. [[/math]]

With the Borel-Cantelli lemma we get that [math]\p\left[\limsup_{n\to\infty} A_n\right]=0[/math]. Let us define

[[math]] N_{\epsilon}=\limsup_{n\to\infty} A_n. [[/math]]

Then we have [math]\p[N_\epsilon]=0[/math], and for [math]\omega\not\in N_{\epsilon}[/math] there exists an [math]n_0(\omega)[/math] such that for all [math]n\geq n_0(\omega)[/math] we have

[[math]] X_n(\omega)\leq (1+\epsilon)\log(n) [[/math]]

and thus for [math]\omega\not\in N_{\epsilon}[/math], we get [math]\limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1+\epsilon[/math]. Moreover, let

[[math]] N'=\bigcup_{\epsilon\in \Q_+}N_{\epsilon}. [[/math]]

Therefore we get [math]\p[N']\leq \sum_{\epsilon\in\Q_+}\p[N_{\epsilon}]=0[/math], and hence for [math]\omega\not\in N'[/math] we get

[[math]] \limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1. [[/math]]

Now we note that the [math]B_n[/math]'s are independent, since [math]B_n\in\sigma(X_n)[/math] and the [math]X_n[/math]'s are independent. Moreover,

[[math]] \p[B_n]=\p[X_n \gt \log(n)]=\p[X \gt \log(n)]=\frac{1}{n}, [[/math]]

which gives that

[[math]] \sum_{n\geq 1}\p[B_n]=\infty. [[/math]]

Now we can use Borel-Cantelli to get

[[math]] \p\left[\limsup_{n\to\infty} B_n\right]=1. [[/math]]

If we denote [math]N''=\left(\limsup_{n\to\infty} B_n\right)^C[/math], then for [math]\omega\not\in N''[/math] we get that [math]X_n(\omega) \gt \log(n)[/math] for infinitely many [math]n[/math]. So it follows that for [math]\omega\not\in N''[/math] we have

[[math]] \limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\geq 1. [[/math]]

Finally, take [math]N=N'\cup N''[/math] to obtain [math]\p[N]=0[/math]. Thus for [math]\omega\not\in N[/math] we get

[[math]] \limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1. [[/math]]
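
The statement of Application 2 can be illustrated numerically (a sketch assuming numpy; not part of the notes): the ratios [math]X_n/\log(n)[/math] exceed [math]1[/math] for infinitely many [math]n[/math], but only barely, so their maximum over large [math]n[/math] stays close to [math]1[/math].

import numpy as np

rng = np.random.default_rng(4)
N = 10**6
X = rng.exponential(scale=1.0, size=N)   # i.i.d. Exp(1) samples X_1, ..., X_N
n = np.arange(2, N + 1)                  # start at n = 2 to avoid log(1) = 0
ratios = X[1:] / np.log(n)

print(ratios[n > 10**5].max())   # close to 1, illustrating limsup X_n / log(n) = 1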

Sums of independent Random Variables

Let us first define the convolution of two probability measures. If [math]\mu[/math] and [math]\nu[/math] are two probability measures on [math]\R^d[/math], we denote by [math]\mu*\nu[/math] the image of the measure [math]\mu\otimes\nu[/math] under the map

[[math]] \R^d\times\R^d\to\R^d,(x,y)\mapsto x+y. [[/math]]

Moreover, for all measurable maps [math]\varphi:\R^d\to \R_+[/math], we have

[[math]] \int_{\R^d}\varphi(z)(\mu*\nu)(dz)=\iint_{\R^d\times\R^d}\varphi(x+y)\mu(dx)\nu(dy). [[/math]]
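
For instance (a sketch in Python, assuming numpy; not part of the notes), the convolution of two [math]U[0,1][/math] densities is the triangular density [math](f*g)(z)=z[/math] on [math][0,1][/math] and [math]2-z[/math] on [math][1,2][/math], and the empirical distribution of a sum of two independent uniforms matches it:

import numpy as np

rng = np.random.default_rng(5)
n = 10**6
Z = rng.uniform(size=n) + rng.uniform(size=n)   # X + Y with X, Y ~ U[0,1] independent

z = np.linspace(0.0, 2.0, 201)
triangle = np.where(z <= 1.0, z, 2.0 - z)       # the convolution (f*g)(z)

hist, edges = np.histogram(Z, bins=50, range=(0.0, 2.0), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - np.interp(centers, z, triangle))))  # small (≈ 0.01)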

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two independent r.v.'s with values in [math]\R^d[/math]. Then the following hold.

  • The law of [math]X+Y[/math] is given by [math]\p_X*\p_Y[/math]. In particular if [math]X[/math] has density [math]f[/math] and [math]Y[/math] has density [math]g[/math], then [math]X+Y[/math] has density [math]f*g[/math], where [math]*[/math] denotes the convolution product, which is given by
    [[math]] f*g(\xi)=\int_{\R^d} f(x)g(\xi-x)dx. [[/math]]
  • [math]\Phi_{X+Y}(\xi)=\Phi_X(\xi)\Phi_Y(\xi).[/math]
  • If [math]X[/math] and [math]Y[/math] are in [math]L^2(\Omega,\A,\p)[/math], we get
    [[math]] K_{X+Y}=K_X+K_Y. [[/math]]
    In particular when [math]d=1[/math], we obtain
    [[math]] Var(X+Y)=Var(X)+Var(Y). [[/math]]


Show Proof

We need to show all three points.

  • If [math]X[/math] and [math]Y[/math] are independent r.v.'s, then [math]\p_{(X,Y)}=\p_X\otimes\p_Y[/math]. Consequently, for all measurable maps [math]\varphi:\R^d\to\R_+[/math], we have
    [[math]] \begin{multline*} \E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)\p_{(X,Y)}(dxdy)=\iint_{\R^d\times\R^d}\varphi(x+y)\p_X(dx)\p_{Y}(dy)\\=\int_{\R^d}\varphi(\xi)(\p_X*\p_Y)(d\xi). \end{multline*} [[/math]]
    Now since [math]X[/math] and [math]Y[/math] have densities [math]f[/math] and [math]g[/math] respectively, we get
    [[math]] \E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)f(x)g(y)dxdy=\int_{\R^d}\varphi(\xi)\left(\int_{\R^d} f(x)g(\xi-x)dx\right)d\xi, [[/math]]
    where we substituted [math]\xi=x+y[/math]. Since this identity is true for all measurable maps [math]\varphi:\R^d\to\R_+[/math], the r.v. [math]Z:=X+Y[/math] has density
    [[math]] h(\xi)=(f*g)(\xi)=\int_{\R^d}f(x)g(\xi-x)dx. [[/math]]
  • By definition of the characteristic function and the independence property, we get
    [[math]] \Phi_{X+Y}(\xi)=\E\left[e^{i\xi(X+Y)}\right]=\E\left[e^{i\xi X}e^{i\xi Y}\right]=\E\left[e^{i\xi X}\right]\E\left[e^{i\xi Y}\right]=\Phi_X(\xi)\Phi_Y(\xi). [[/math]]
  • If [math]X=(X_1,...,X_d)[/math] and [math]Y=(Y_1,...,Y_d)[/math] are independent r.v.'s on [math]\R^d[/math], we get that [math]Cov(X_i,Y_j)=0[/math], for all [math]1\leq i,j\leq d[/math]. By using the bilinearity of the covariance we get that
    [[math]] Cov(X_i+Y_i,X_j+Y_j)=Cov(X_i,X_j)+Cov(Y_i,Y_j), [[/math]]
    and hence [math]K_{X+Y}=K_X+K_Y[/math]. For [math]d=1[/math] we get
    [[math]] \begin{align*} Var(X+Y)&=\E[((X+Y)-\E[X+Y])^2]=\E[((X-\E[X])+(Y-\E[Y]))^2]\\ &=\underbrace{\E[(X-\E[X])^2]}_{Var(X)}+\underbrace{\E[(Y-\E[Y])^2]}_{Var(Y)}+\underbrace{2\E[(X-\E[X])(Y-\E[Y])]}_{2Cov(X,Y)} \end{align*} [[/math]]
    Now since [math]Cov(X,Y)=0[/math], we get the result.
Theorem (Weak law of large numbers)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent r.v.'s. Moreover, write [math]\mu=\E[X_n][/math] for all [math]n\geq1[/math] and assume [math]\E[(X_n-\mu)^2]\leq C[/math] for all [math]n\geq1[/math] and for some constant [math]C \lt \infty[/math]. We also write [math]S_n=\sum_{j=1}^nX_j[/math] and [math]\tilde X_n=\frac{S_n}{n}[/math] for all [math]n\geq 1[/math]. Then for all [math]\epsilon \gt 0[/math]

[[math]] \p[\vert \tilde X_n-\mu\vert \gt \epsilon]\xrightarrow{n\to\infty}0. [[/math]]
Note also that

[[math]] \E[\tilde X_n]=\frac{1}{n}\E\left[\sum_{j=1}^nX_j\right]=\frac{1}{n}\,n\mu=\mu. [[/math]]


Show Proof

We note that, by independence (the cross terms have zero expectation),

[[math]] \E[(S_n-n\mu)^2]=\sum_{j=1}^n\E[(X_j-\mu)^2]\leq nC. [[/math]]
Hence for [math]\epsilon \gt 0[/math] we get by Markov's inequality

[[math]] \p[\vert \tilde X_n-\mu\vert \gt \epsilon]=\p[(S_n-n\mu)^2 \gt (n\epsilon)^2]\leq \frac{\E[(S_n-n\mu)^2]}{n^2\epsilon^2}\leq \frac{C}{n\epsilon^2}\xrightarrow{n\to\infty}0. [[/math]]

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](A_n)_{n\geq 1}\in \A[/math] be a sequence of independent events with the same probabilities, i.e. [math]\p[A_n]=\p[A_m][/math], for all [math]n,m\geq 1[/math]. Then

[[math]] \frac{1}{n}\sum_{i=1}^n \one_{A_i}\xrightarrow{n\to\infty}\p[A_1]\quad\text{in probability}. [[/math]]


Show Proof

Apply the weak law of large numbers to [math]X_j=\one_{A_j}[/math]: these r.v.'s are independent, have the common expectation [math]\E[\one_{A_j}]=\p[A_j]=\p[A_1][/math], and satisfy [math]\E[(\one_{A_j}-\p[A_1])^2]\leq 1[/math]. Hence [math]\frac{1}{n}\sum_{j=1}^n\one_{A_j}\to\p[A_1][/math] in probability.
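
A frequency simulation (a sketch assuming numpy; the value [math]p=0.3[/math] is a hypothetical choice, not from the notes) shows the empirical frequency approaching [math]\p[A_1][/math] as [math]n[/math] grows:

import numpy as np

rng = np.random.default_rng(6)
p = 0.3                                   # hypothetical common value of P[A_n]
for n in (10**2, 10**4, 10**6):
    indicators = rng.random(n) < p        # simulate the independent indicators 1_{A_i}
    print(n, indicators.mean())           # → 0.3 as n grows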

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].