<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
===Independent events===
Let <math>(\Omega,\A,\p)</math> be a probability space. If <math>A,B\in\A</math>, we say that <math>A</math> and <math>B</math> are independent if


<math display="block">
\p[A\cap B]=\p[A]\p[B].
</math>
'''Example'''
(Throw of a fair die) We have the state space <math>\Omega=\{1,2,3,4,5,6\}</math> and <math>\p[\{\omega\}]=\frac{1}{6}</math> for every <math>\omega\in\Omega</math>. Now let <math>A=\{1,2\}</math> and <math>B=\{1,3,5\}</math>. Then
<math display="block">
\p[A\cap B]=\p[\{1\}]=\frac{1}{6}\quad\text{and}\quad\p[A]=\frac{1}{3},\quad\p[B]=\frac{1}{2}.
</math>
Therefore we get
<math display="block">
\p[A\cap B]=\p[A]\p[B].
</math>
Hence we get that <math>A</math> and <math>B</math> are independent.
{{definitioncard|Independence of events|We say that the <math>n</math> events <math>A_1,...,A_n\in\A</math> are independent if <math>\forall \{j_1,...,j_l\}\subset\{1,...,n\}</math> we have
<math display="block">
\p[A_{j_1}\cap A_{j_2}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm \p[A_{j_l}].
</math>
}}
{{alert-info |
It is not enough to have <math>\p[A_1\cap\dotsm\cap A_n]=\p[A_1]\dotsm\p[A_n]</math>. It is also not enough to check that <math>\forall \{i,j\}\subset\{1,...,n\}</math>, <math>\p[A_i\cap A_j]=\p[A_i]\p[A_j]</math>. For instance, consider two tosses of a fair coin and the events <math>A,B</math> and <math>C</math> given by
<math display="block">
A=\{\text{$H$ at the first toss}\},\quad B=\{\text{$H$ at the second toss}\},\quad C=\{\text{same outcome for both tosses}\}.
</math>
The events <math>A,B</math> and <math>C</math> are pairwise independent, but <math>A,B</math> and <math>C</math> are not independent events (see the computation below).
}}
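For concreteness, on <math>\Omega=\{H,T\}^2</math> with the uniform measure each of these three events has probability <math>\frac{1}{2}</math>, and
<math display="block">
\p[A\cap B]=\p[A\cap C]=\p[B\cap C]=\frac{1}{4},\qquad\text{while}\qquad \p[A\cap B\cap C]=\p[\{HH\}]=\frac{1}{4}\not=\frac{1}{8}=\p[A]\p[B]\p[C].
</math>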
{{proofcard|Proposition|prop-1|The <math>n</math> events <math>A_1,...,A_n\in\A</math> are independent if and only if
<math display="block">
(*)\qquad\p[B_1\cap\dotsm \cap B_n]=\p[B_1]\dotsm\p[B_n]
</math>
for all <math>B_i\in\sigma(A_i)=\{\emptyset,A_i,A_i^C,\Omega\}</math>, <math>\forall i\in\{1,...,n\}</math>.
|If the above is satisfied and if <math>\{j_1,...,j_l\}\subset\{1,...,n\}</math>, then for <math>i\in\{j_1,...,j_l\}</math> take <math>B_i=A_i</math> and for <math>i\not\in\{j_1,...,j_l\}</math> take <math>B_i=\Omega</math>. So it follows that
<math display="block">
\p[A_{j_1}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm\p[A_{j_l}].
</math>
Conversely, assume that <math>A_1,...,A_n\in\A</math> are independent; we want to deduce <math>(*)</math>. We can assume that <math>B_i\not=\emptyset</math> for all <math>i\in\{1,...,n\}</math> (for otherwise the identity is trivially satisfied). If <math>\{j_1,...,j_l\}=\{i\mid B_i\not=\Omega\}</math>, we have to check that
<math display="block">
\p[B_{j_1}\cap\dotsm\cap B_{j_l}]=\p[B_{j_1}]\dotsm\p[B_{j_l}],
</math>
as soon as <math>B_{j_k}=A_{j_k}</math> or <math>B_{j_k}=A_{j_k}^C</math>. Finally it's enough to show that if <math>C_1,...,C_p</math> are independent events, then
<math display="block">
C_1^C,C_2,...,C_p
</math>
are also independent. Take any <math>\{i_1,...,i_q\}\subset\{1,...,p\}</math>. If <math>1\not\in\{i_1,...,i_q\}</math>, then from the definition of independence we have
<math display="block">
\p[C_{i_1}\cap\dotsm\cap C_{i_q}]=\p[C_{i_1}]\dotsm\p[C_{i_q}].
</math>
If <math>1\in\{i_1,...,i_q\}</math>, say <math>1=i_1</math>, then
<math display="block">
\begin{align*}
\p[C_{i_1}^C\cap C_{i_2}\cap\dotsm\cap C_{i_q}]&=\p[C_{i_2}\cap\dotsm\cap C_{i_q}]-\p[C_1\cap C_{i_2}\cap\dotsm\cap C_{i_q}]\\
&=\p[C_{i_2}]\dotsm\p[C_{i_q}]-\p[C_1]\p[C_{i_2}]\dotsm\p[C_{i_q}]\\
&=(1-\p[C_1])\p[C_{i_2}]\dotsm\p[C_{i_q}]=\p[C_1^C]\p[C_{i_2}]\dotsm\p[C_{i_q}]
\end{align*}
</math>}}
{{definitioncard|Conditional probability|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>A,B\in\A</math> such that <math>\p[B] > 0</math>. The conditional probability of <math>A</math> given <math>B</math> is then defined as
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}.
</math>
}}
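'''Example'''
In the die example above, with <math>A=\{1,2\}</math> and <math>B=\{1,3,5\}</math>, we get
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{1/6}{1/2}=\frac{1}{3}=\p[A],
</math>
which is consistent with the independence of <math>A</math> and <math>B</math> established above.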
{{proofcard|Theorem|thm-1|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>A,B\in\A</math> and suppose that <math>\p[B] > 0</math>.
<ul style{{=}}"list-style-type:lower-roman"><li><math>A</math> and <math>B</math> are independent if and only if
<math display="block">
\p[A\mid B]=\p[A].
</math>
</li>
<li>The map
<math display="block">
\A\to [0,1],A\mapsto \p[A\mid B]
</math>
defines a new probability measure on <math>\A</math> called the conditional probability given <math>B</math>.
</li>
</ul>
|We need to show both points.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>A</math> and <math>B</math> are independent, then
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A]
</math>
and conversely if <math>\p[A\mid B]=\p[A]</math>, we get that
<math display="block">
\p[A\cap B]=\p[A]\p[B],
</math>
and hence <math>A</math> and <math>B</math> are independent.
</li>
<li>Let <math>\Q[A]=\p[A\mid B]</math>. We have
<math display="block">
\Q[\Omega]=\p[\Omega\mid B]=\frac{\p[\Omega\cap B]}{\p[B]}=\frac{\p[B]}{\p[B]}=1.
</math>
Take <math>(A_n)_{n\geq 1}\subset \A</math> as a disjoint family of events. Then
<math display="block">
\begin{align*}
\Q\left[\bigcup_{n\geq 1}A_n\right]&=\p\left[\bigcup_{n\geq 1}A_n\mid B\right]=\frac{\p\left[\left(\bigcup_{n\geq 1}A_n\right)\cap B\right]}{\p[B]}=\frac{\p\left[\bigcup_{n\geq 1}(A_n\cap B)\right]}{\p[B]}\\
&=\sum_{n\geq 1}\frac{\p[A_n\cap B]}{\p[B]}=\sum_{n\geq 1}\Q[A_n].
\end{align*}
</math>
</li>
</ul>}}
{{proofcard|Theorem|thm-2|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>A_1,...,A_n\in\A</math> with <math>\p[A_1\cap\dotsm\cap A_n] > 0</math>. Then
<math display="block">
\p[A_1\cap\dotsm\cap A_n]=\p[A_1]\p[A_2\mid A_1]\p[A_3\mid A_1\cap A_2]\dotsm\p[A_n\mid A_1\cap\dotsm\cap A_{n-1}].
</math>
|We prove this by induction. For <math>n=2</math> it's just the definition of the conditional probability. Now we want to go from <math>n-1</math> to <math>n</math>. Therefore set <math>B=A_1\cap \dotsm\cap A_{n-1}</math> and note that <math>\p[B]\geq \p[A_1\cap\dotsm\cap A_n] > 0</math>. Using the induction hypothesis for <math>\p[B]</math>, we get
<math display="block">
\p[B\cap A_n]=\p[A_n\mid B]\p[B]=\p[A_n\mid B]\p[A_1]\p[A_2\mid A_1]\dotsm\p[A_{n-1}\mid A_1\cap\dotsm\cap A_{n-2}].
</math>}}
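'''Example'''
For instance, draw three cards without replacement from a standard deck of <math>52</math> cards and let <math>A_i</math> be the event that the <math>i</math>-th card drawn is an ace. Then
<math display="block">
\p[A_1\cap A_2\cap A_3]=\p[A_1]\p[A_2\mid A_1]\p[A_3\mid A_1\cap A_2]=\frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}=\frac{1}{5525}.
</math>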
{{proofcard|Theorem|thm-3|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>\left(E_{n}\right)_{n\geq 1}</math> be a finite or countable measurable partition of <math>\Omega</math>, such that <math>\p[E_n] > 0</math> for all <math>n</math>. If <math>A\in\A</math>, then
<math display="block">
\p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n].
</math>
|Note that
<math display="block">
A=A\cap\Omega=A\cap\left(\bigcup_{n\geq 1}E_n\right)=\bigcup_{n\geq 1}(A\cap E_n).
</math>
Now since the <math>(A\cap E_n)_{n\geq 1}</math> are disjoint, we can write
<math display="block">
\p[A]=\sum_{n\geq 1}\p[A\cap E_n]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n].
</math>}}
{{proofcard|Theorem (Bayes)|thm-4|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(E_n)_{n\geq 1}</math> be a finite or countable measurable partition of <math>\Omega</math> with <math>\p[E_n] > 0</math> for all <math>n</math>, and let <math>A\in\A</math> with <math>\p[A] > 0</math>. Then for every <math>n</math>
<math display="block">
\p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{k\geq 1}\p[A\mid E_k]\p[E_k]}.
</math>
|By the previous theorem we know that
<math display="block">
\p[A]=\sum_{k\geq 1}\p[A\mid E_k]\p[E_k],\quad\p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]},\quad\p[A\mid E_n]=\frac{\p[A\cap E_n]}{\p[E_n]}.
</math>
Therefore, combining things, we get
<math display="block">
\p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]}=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{k\geq 1}\p[A\mid E_k]\p[E_k]}.
</math>}}
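'''Example'''
Suppose, for illustration, that a disease affects <math>1\%</math> of a population, that a test detects it with probability <math>0.99</math>, and that it gives a false positive with probability <math>0.05</math>. Writing <math>E_1=\{\text{disease}\}</math>, <math>E_2=E_1^C</math> and <math>A=\{\text{positive test}\}</math>, Bayes' formula gives
<math display="block">
\p[E_1\mid A]=\frac{\p[A\mid E_1]\p[E_1]}{\p[A\mid E_1]\p[E_1]+\p[A\mid E_2]\p[E_2]}=\frac{0.99\cdot 0.01}{0.99\cdot 0.01+0.05\cdot 0.99}=\frac{1}{6}\approx 0.17.
</math>
The numbers here are of course only chosen for illustration.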
===Independent Random Variables and independent <math>\sigma</math>-Algebras===
{{definitioncard|Independence of <math>\sigma</math>-Algebras|Let <math>(\Omega,\A,\p)</math> be a probability space. We say that the sub <math>\sigma</math>-Algebras <math>\B_1,...,\B_n</math> of <math>\A</math> are independent if for all <math> A_1\in\B_1,..., A_n\in\B_n</math> we get
<math display="block">
\p[A_1\cap\dotsm \cap A_n]=\p[A_1]\dotsm\p[A_n].
</math>
Let now <math>X_1,...,X_n</math> be <math>n</math> r.v.'s with values in measurable spaces <math>(E_1,\mathcal{E}_1),...,(E_n,\mathcal{E}_n)</math> respectively. We say that the r.v.'s <math>X_1,...,X_n</math> are independent if the <math>\sigma</math>-Algebras <math>\sigma(X_1),...,\sigma(X_n)</math> are independent. This is equivalent to the fact that for all <math>F_1\in\mathcal{E}_1,...,F_n\in\mathcal{E}_n</math> we have
<math display="block">
\p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm \p[X_n\in F_n].
</math>
(This comes from the fact that for all <math>i\in\{1,...,n\}</math> we have that <math>\sigma(X_i)=\{X_i^{-1}(F)\mid F\in\mathcal{E}_i\}</math>)}}
{{alert-info |
If <math>\B_1,...,\B_n</math> are <math>n</math> independent sub <math>\sigma</math>-Algebras and if <math>X_1,...,X_n</math> are r.v.'s such that <math>X_i</math> is <math>\B_i</math> measurable for all <math>i\in\{1,...,n\}</math>, then <math>X_1,...,X_n</math> are independent r.v.'s (this comes from the fact that for all <math>i\in\{1,...,n\}</math> we have <math>\sigma(X_i)\subset \B_i</math>).
}}
{{alert-info |
The <math>n</math> events <math>A_1,...,A_n\in\A</math> are independent if and only if <math>\sigma(A_1),...,\sigma(A_n)</math> are independent.
}}
{{proofcard|Theorem (Independence of Random Variables)|thm7|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X_1,...,X_n</math> be <math>n</math> r.v.'s. Then <math>X_1,...,X_n</math> are independent if and only if the law of the vector <math>(X_1,...,X_n)</math> is the product of the laws of <math>X_1,...,X_n</math>, i.e.
<math display="block">
\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes \p_{X_n}.
</math>
Moreover, for every measurable map <math>f_i:(E_i,\mathcal{E}_i)\to\R_+</math> defined on a measurable space <math>(E_i,\mathcal{E}_i)</math> for all <math>i\in\{1,...,n\}</math>, we have
<math display="block">
\E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)].
</math>
|Let <math>F_i\in\mathcal{E}_i</math> for all <math>i\in\{1,...,n\}</math>. Thus we have
<math display="block">
\p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]
</math>
and on the other hand
<math display="block">
\left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n)=\p_{X_1}[F_1]\dotsm\p_{X_n}[F_n]=\prod_{i=1}^n\p_{X_i}[F_i]=\prod_{i=1}^n\p[X_i\in F_i].
</math>
If <math>X_1,...,X_n</math> are independent, then
<math display="block">
\p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\prod_{i=1}^n\p[X_i\in F_i]=\left(\p_{X_1}\otimes\dotsm\otimes \p_{X_n}\right)(F_1\times\dotsm\times F_n),
</math>
which implies that <math> \p_{(X_1,...,X_n)}</math> and <math>\p_{X_1}\otimes\dotsm\otimes \p_{X_n}</math> are equal on rectangles. Hence the monotone class theorem implies that
<math display="block">
\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}.
</math>
Conversely, if <math>\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}</math>, then for all <math>F_i\in\mathcal{E}_i</math>, with <math>i\in\{1,...,n\}</math>, we get that
<math display="block">
\p_{(X_1,...,X_n)}(F_1\times\dotsm\times F_n)=\left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n)
</math>
and therefore
<math display="block">
\p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm\p[X_n\in F_n].
</math>
This implies that <math>X_1,...,X_n</math> are independent. For the second assertion we get
<math display="block">
\E\left[\prod_{i=1}^nf_i(X_i)\right]=\int_{E_1\times\dotsm\times E_n}\prod_{i=1}^nf_i(x_i)\underbrace{\p_{X_1}(dx_1)\dotsm \p_{X_n}(dx_n)}_{\p_{(X_1,...,X_n)}(dx_1\dotsm dx_n)}=\prod_{i=1}^n\int_{E_i}f_i(x_i)\p_{X_i}(dx_i)=\prod_{i=1}^n\E[f_i(X_i)],
</math>
where we have used the first part and Fubini's theorem.}}
{{alert-info |
We see from the proof above that as soon as for all <math>i\in\{1,...,n\}</math> we have <math>\E[\vert f_i(X_i)\vert] < \infty</math>, it follows that
<math display="block">
\E\left [\prod_{i=1}^n f_i(X_i)\right]=\prod_{i=1}^n\E[ f_i(X_i) ].
</math>
Indeed, the previous result shows that
<math display="block">
\E\left[\prod_{i=1}^n\vert f_i(X_i)\vert\right]=\prod_{i=1}^n\E[\vert f_i(X_i)\vert] < \infty
</math>
and thus we can apply Fubini's theorem. In particular, if <math>X_1,...,X_n\in L^1(\Omega,\A,\p)</math> are independent, we get that
<math display="block">
\E\left[\prod_{i=1}^nX_i\right]=\prod_{i=1}^n\E[X_i].
</math>
}}
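'''Example'''
For instance, if <math>X_1,...,X_n</math> are independent and each <math>X_i</math> is uniform on <math>[0,1]</math>, then
<math display="block">
\E\left[\prod_{i=1}^nX_i\right]=\prod_{i=1}^n\E[X_i]=\left(\frac{1}{2}\right)^n=\frac{1}{2^n}.
</math>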
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X_1</math> and <math>X_2</math> be two independent r.v.'s in <math>L^2(\Omega,\A,\p)</math>. Then we get
<math display="block">
Cov(X_1,X_2)=0.
</math>
|Recall that if <math>X\in L^2(\Omega,\A,\p)</math>, we also have that <math>X\in L^1(\Omega,\A,\p)</math>. Thus
<math display="block">
Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[X_1]\E[X_2]-\E[X_1]\E[X_2]=0.
</math>}}
{{alert-info |
Note that the converse is not true! Let <math>X_1\sim\mathcal{N}(0,1)</math>. (We could also take for <math>X_1</math> any symmetric r.v. in <math>L^2(\Omega,\A,\p)</math> with a density <math>P(x)</math> satisfying <math>P(-x)=P(x)</math>.) Recall that being in <math>L^2(\Omega,\A,\p)</math> simply means
<math display="block">
\E[X_1^2]=\int_\R x^2 P(x)dx < \infty,
</math>
and since <math>P(x)=P(-x)</math>, we get <math>\E[X_1]=\int_\R xP(x)dx=0.</math>
Now let <math>Y</math> be a r.v. with values in <math>\{-1,+1\}</math> such that <math>\p[Y=1]=\p[Y=-1]=\frac{1}{2}</math> and such that <math>Y</math> is independent of <math>X_1</math>. Define <math>X_2:=YX_1</math> and observe that
<math display="block">
Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[YX_1^2]-\E[YX_1]\E[X_1]
</math>
and hence, using the independence of <math>Y</math> and <math>X_1</math>,
<math display="block">
Cov(X_1,X_2)=\E[Y]\E[X_1^2]-\E[Y]\E[X_1]^2=0-0=0.
</math>
If <math>X_1</math> and <math>X_2</math> were independent, then <math>\vert X_1\vert</math> and <math>\vert X_2\vert</math> would also be independent. But <math>\vert X_2\vert =\vert Y\vert \vert X_1\vert=\vert X_1\vert</math>, so <math>\vert X_1\vert</math> would be independent of itself, and hence a.s. equal to a constant. Indeed, set <math>c=\E[\vert X_1\vert]</math>; since <math>\vert X_1\vert -c</math> is independent of itself, we get
<math display="block">
\E[(\vert X_1\vert-c)^2]=\E[\vert X_1\vert-c]\,\E[\vert X_1\vert-c]=0\Longrightarrow \vert X_1\vert=c\ \text{a.s.}
</math>
This cannot happen, since <math>\vert X_1\vert</math> is the absolute value of a standard Gaussian r.v., whose density is given by
<math display="block">
P(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}.
</math>
}}
{{proofcard|Corollary|cor-2|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X_1,...,X_n</math> be <math>n</math> r.v.'s with values in <math>\R</math>.
<ul style{{=}}"list-style-type:lower-roman"><li>Assume that for <math>i\in \{1,...,n\}</math>, <math>\p_{X_i}</math> has density <math>P_i</math> and that the r.v.'s <math>X_1,...,X_n</math> are independent. Then the law of <math>(X_1,...,X_n)</math> also has density given by <math>P(x_1,...,x_n)=\prod_{i=1}^nP_i(x_i)</math>.
</li>
<li>Conversely assume that the law of <math>(X_1,...,X_n)</math> has density <math>P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)</math>, where <math>q_i</math> is Borel measurable and positive. Then the r.v.'s <math>X_1,...,X_n</math> are independent and the law of <math>X_i</math> has density <math>P_i=c_iq_i</math>, with <math>c_i > 0</math> for <math>i\in\{1,...,n\}</math>.
</li>
</ul>
|We only need to show <math>(ii)</math>. From Fubini we get
<math display="block">
\prod_{i=1}^n\int_\R q_i(x_i)dx_i=\int_{\R^{n}}\prod_{i=1}^nq_i(x_i)dx_1\dotsm dx_n=\int_{\R^{n}}P(x_1,...,x_n)dx_1\dotsm dx_n=1,
</math>
which implies that <math>K_i:=\int_\R q_i(x_i)dx_i\in(0,\infty)</math>, for all <math>i\in\{1,...,n\}</math>. Now we know that the law of <math>X_i</math> has density <math>P_i</math> given by
<math display="block">
P_i(x_i)=\int_{\R^{n-1}}P(x_1,...,x_{i-1},x_i,x_{i+1},...,x_n)dx_1\dotsm dx_{i-1}dx_{i+1}\dotsm dx_n=\left(\prod_{j\not=i}K_j\right)q_i(x_i)=\frac{1}{K_i}q_i(x_i).
</math>
Since <math>\prod_{i=1}^nK_i=1</math>, we can rewrite
<math display="block">
P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)=\prod_{i=1}^nP_i(x_i).
</math>
Hence we get <math>\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm \otimes \p_{X_n}</math> and therefore <math>X_1,...,X_n</math> are independent.}}
'''Example'''
Let <math>U</math> be an exponential r.v. with parameter <math>1</math>, i.e. with density <math>e^{-u}\one_{\R_+}(u)</math>. Let <math>V</math> be a uniform r.v. on <math>[0,1]</math>. We assume that <math>U</math> and <math>V</math> are independent. Define the r.v.'s <math>X=\sqrt{U}\cos(2\pi V)</math> and <math>Y=\sqrt{U}\sin(2\pi V)</math>. Then <math>X</math> and <math>Y</math> are independent. Indeed, for a measurable function <math>\varphi:\R^2\to \R_+</math> we get
<math display="block">
\E[\varphi(X,Y)]=\int_0^\infty\int_{0}^1\varphi(\sqrt{u}\cos(2\pi v),\sqrt{u}\sin(2\pi v))e^{-u}dudv
</math>
<math display="block">
=\frac{1}{\pi}\int_{0}^\infty\int_0^{2\pi}\varphi(r\cos(\theta),r\sin(\theta))re^{-r^2}drd\theta,
</math>
where we substituted <math>u=r^2</math> and <math>v=\frac{\theta}{2\pi}</math>. This implies that <math>(X,Y)</math> has density <math>\frac{e^{-x^2}e^{-y^2}}{\pi}</math> on <math>\R\times\R</math>. With the previous corollary we get that <math>X</math> and <math>Y</math> are independent, and both have the same density <math>P(x)=\frac{1}{\sqrt{\pi}}e^{-x^2}</math>.
{{alert-info |
We write <math>X\stackrel{law}{=}Y</math> to say that <math>\p_X=\p_Y</math>. Thus in the example above we would have
<math display="block">
X\stackrel{law}{=}Y\sim\mathcal{N}(0,\frac{1}{2}).
</math>
}}
====Important facts====
Let <math>X_1,...,X_n</math> be <math>n</math> real valued r.v.'s. Then the following are equivalent
<ul style{{=}}"list-style-type:lower-roman"><li><math>X_1,...,X_n</math> are independent.
</li>
<li>For <math>X=(X_1,...,X_n)\in\R^n</math> we have
<math display="block">
\Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i).
</math>
</li>
<li>For all <math>a_1,...,a_n\in\R</math>, we have
<math display="block">
\p[X_1\leq a_1,...,X_n\leq a_n]=\prod_{i=1}^n\p[X_i\leq a_i].
</math>
</li>
<li>If <math>f_1,...,f_n:\R\to\R_+</math> are continuous, measurable maps with compact support, then
<math display="block">
\E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)].
</math>
</li>
</ul>
'''Proof.''' First we show <math>(i)\Longrightarrow (ii)</math>. By definition and independence, we get
<math display="block">
\Phi_X(\xi_1,..,\xi_n)=\E\left[e^{i(\xi_1X_1+...+\xi_nX_n)}\right]=\E\left[e^{i\xi_1X_1}\dotsm e^{i\xi_nX_n}\right]=\prod_{i=1}^n\E[e^{i\xi_i X_i}]=\prod_{i=1}^n\Phi_{X_i}(\xi_i),
</math>
where we used that the maps <math>t\mapsto e^{i\xi_i t}</math> are measurable and bounded, so the product formula for independent r.v.'s applies.
Next we show <math>(ii)\Longrightarrow (i)</math>. Recall that, by injectivity of the characteristic function, we have <math>\p_X=\p_Y</math> if and only if
<math display="block">
\Phi_X(\xi_1,...,\xi_n)=\Phi_Y(\xi_1,...,\xi_n).
</math>
Now if <math>\Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i)</math>, we note that <math>\prod_{i=1}^n\Phi_{X_i}(\xi_i)</math>
is the characteristic function of the probability measure <math>\p_{X_1}\otimes\dotsm \otimes\p_{X_n}</math>. From injectivity it follows that <math>\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}</math>, which implies that <math>X_1,...,X_n</math> are independent.
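For instance, if <math>X_1,...,X_n</math> are independent with <math>X_i\sim\mathcal{N}(0,1)</math>, then <math>\Phi_{X_i}(\xi_i)=e^{-\xi_i^2/2}</math> and criterion <math>(ii)</math> reads
<math display="block">
\Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^ne^{-\xi_i^2/2}=e^{-\frac{1}{2}(\xi_1^2+\dotsb+\xi_n^2)},
</math>
which is indeed the characteristic function of the standard Gaussian vector on <math>\R^n</math>.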
{{proofcard|Proposition|prop-2|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>\B_1,...,\B_n\subset\A</math> be sub <math>\sigma</math>-Algebras of <math>\A</math>. For every <math>i\in\{1,...,n\}</math>, let <math>\mathcal{C}_i\subset\B_i</math> be a family of subsets of <math>\Omega</math> such that  <math>\mathcal{C}_i</math> is stable under finite intersection and <math>\sigma(\mathcal{C}_i)=\B_i</math>. Assume that for all <math>C_i\in\mathcal{C}_i</math> with <math>i\in\{1,...,n\}</math> we have
<math display="block">
\p\left[\bigcap_{i=1}^nC_i\right]=\prod_{i=1}^n\p[C_i].
</math>
Then <math>\B_1,...,\B_n</math> are independent <math>\sigma</math>-Algebras.
|Let us fix <math>C_2\in \mathcal{C}_2,...,C_n\in\mathcal{C}_n</math> and define
<math display="block">
M_1:=\left\{B_1\in\B_1\mid \p[B_1\cap C_2\cap\dotsm\cap C_n]=\p[B_1]\p[C_2]\dotsm \p[C_n]\right\}.
</math>
Now since <math>\mathcal{C}_1\subset M_1</math> and <math>M_1</math> is a monotone class, we get <math>\sigma(\mathcal{C}_1)=\B_1\subset M_1</math> and thus <math>\B_1=M_1</math>. Let now <math>B_1\in\B_1,</math> <math>C_3\in\mathcal{C}_3,...,C_n\in\mathcal{C}_n</math> and define
<math display="block">
M_2:=\{B_2\in\B_2\mid \p[B_2\cap B_1\cap C_3\cap\dotsm\cap C_n]=\p[B_2]\p[B_1]\p[C_3]\dotsm\p[C_n]\}.
</math>
Again, since <math>\mathcal{C}_2\subset M_2</math> and <math>M_2</math> is a monotone class, we get <math>\sigma(\mathcal{C}_2)=\B_2\subset M_2</math> and thus <math>\B_2=M_2</math>. By induction we complete the proof.}}
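For example, for real valued r.v.'s <math>X_1,...,X_n</math> one can take <math>\mathcal{C}_i=\{\{X_i\leq a\}\mid a\in\R\}\cup\{\Omega\}</math>, which is stable under finite intersection and satisfies <math>\sigma(\mathcal{C}_i)=\sigma(X_i)</math>. The proposition then shows that
<math display="block">
\p[X_1\leq a_1,...,X_n\leq a_n]=\prod_{i=1}^n\p[X_i\leq a_i]\quad\text{for all }a_1,...,a_n\in\R
</math>
implies the independence of <math>X_1,...,X_n</math>, i.e. criterion <math>(iii)</math> of the important facts above is indeed sufficient (the cases where some <math>C_i=\Omega</math> follow by letting the corresponding <math>a_i\to\infty</math>).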
'''Consequence:''' Let <math>\B_1,...,\B_n</math> be <math>n</math> independent <math>\sigma</math>-Algebras and let <math>m_0=0 < m_1 < ... < m_p=n</math>. Then the <math>\sigma</math>-Algebras
<math display="block">
\begin{align*}
\mathcal{D}_1&=\B_1\lor\dotsm\lor\B_{m_1}=\sigma(\B_1,...,\B_{m_1})=\sigma\left(\bigcup_{k=1}^{m_1}\B_k\right)\\
\mathcal{D}_2&=\B_{m_1+1}\lor\dotsm\lor\B_{m_2}\\
\vdots\\
\mathcal{D}_p&=\B_{m_{p-1}+1}\lor\dotsm\lor\B_{m_p}
\end{align*}
</math>
are also independent. Indeed, we can apply the previous proposition to the classes of sets
<math display="block">
C_j=\{B_{m_{j-1}+1}\cap\dotsm\cap B_{m_j}\mid B_i\in\B_i,\ i\in\{m_{j-1}+1,...,m_j\}\}.
</math>
In particular if <math>X_1,...,X_n</math> are independent r.v.'s, then
<math display="block">
\begin{align*}
Y_1&=(X_1,...,X_{m_1})\\
\vdots\\
Y_p&=(X_{m_{p-1}+1},...,X_{m_p})
\end{align*}
</math>
are also independent.
'''Example'''
Let <math>X_1,...,X_4</math> be real valued independent r.v.'s. Then <math>Z_1=X_1X_3</math> and <math>Z_2=X_2^3+X_4</math> are independent. Indeed, <math>Z_1</math> is <math>\sigma(X_1,X_3)</math> measurable, <math>Z_2</math> is <math>\sigma(X_2,X_4)</math> measurable, and from the above <math>\sigma(X_1,X_3)</math> and <math>\sigma(X_2,X_4)</math> are independent.
Recall here that for a r.v. <math>X:\Omega\to\R</math>, a r.v. <math>Y</math> is <math>\sigma(X)</math> measurable if and only if <math>Y=f(X)</math> with <math>f</math> a measurable map; more generally, if <math>Y</math> is <math>\sigma(X_1,...,X_n)</math> measurable, then <math>Y=f(X_1,...,X_n)</math> for some measurable <math>f</math>.
{{proofcard|Proposition (Independence for an infinite family)|prop-3|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(\B_i)_{i\in I}</math> be an infinite family of sub <math>\sigma</math>-Algebras of <math>\A</math>. We say that the family <math>(\B_i)_{i\in I}</math> is independent if for every finite subset <math>\{i_1,...,i_p\}\subset I</math>, the <math>\sigma</math>-Algebras <math>\B_{i_1},...,\B_{i_p}</math> are independent. If <math>(X_i)_{i\in I}</math> is a family of r.v.'s, we say that they are independent if <math>(\sigma(X_i))_{i\in I}</math> is independent.|}}
{{proofcard|Proposition|prop-4|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(X_n)_{n\geq 1}</math> be a sequence of independent r.v.'s. Then for all <math>p\in\N</math> the <math>\sigma</math>-Algebras <math>\B_1=\sigma(X_1,...,X_p)</math> and <math>\B_2=\sigma(X_{p+1},X_{p+2},...)</math> are independent.
|Apply [[#prop-2 |the proposition on generating families]] to <math>\mathcal{C}_1=\sigma(X_1,...,X_p)</math> and
<math>\mathcal{C}_2=\bigcup_{k\geq p+1}\sigma(X_{p+1},...,X_k)</math>, which is stable under finite intersection and generates <math>\B_2</math>; the product formula on <math>\mathcal{C}_1\times\mathcal{C}_2</math> holds by the consequence above (grouping of independent r.v.'s).}}
===The Borel-Cantelli Lemma===
Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(A_n)_{n\in\N}</math> be a sequence of events in <math>\A</math>. Recall that we can write
<math display="block">
\limsup_{n\to \infty} A_n=\bigcap_{n=0}^\infty\left(\bigcup_{k=n}^\infty A_k\right)\quad\text{and}\quad\liminf_{n\to \infty} A_n=\bigcup_{n=0}^\infty\left(\bigcap_{k=n}^\infty A_k\right).
</math>
Moreover, both are again measurable sets. We have <math>\omega\in\limsup_n A_n</math> if and only if <math>\omega\in\bigcup_{k=n}^\infty A_k</math> for all <math>n\geq 0</math>, i.e. if and only if for all <math>n\geq 0</math> there exists a <math>k\geq n</math> such that <math>\omega\in A_k</math>; in other words, <math>\omega</math> lies in infinitely many <math>A_k</math>'s. Similarly, <math>\omega\in\liminf_n A_n</math> if and only if there exists an <math>n\geq 0</math> such that <math>\omega\in A_k</math> for all <math>k\geq n</math>, i.e. <math>\omega</math> lies in all but finitely many <math>A_k</math>'s. In particular <math>\liminf_nA_n\subset \limsup_nA_n</math>.
{{proofcard|Lemma (Borel-Cantelli)|lem-1|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(A_n)_{n\in\N}\in\A</math> be a family of measurable sets.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>\sum_{n\geq 1}\p[A_n] < \infty</math>, then
<math display="block">
\p\left[\limsup_{n\to\infty} A_n\right]=0,
</math>
which means that the set <math>\{n\in\N\mid \omega\in A_n\}</math> is a.s. finite.
</li>
<li>If <math>\sum_{n\geq 1}\p[A_n]=\infty</math>, and if the events <math>(A_n)_{n\in\N}</math> are independent, then
<math display="block">
\p\left[\limsup_{n\to\infty} A_n\right]=1,
</math>
which means that the set <math>\{n\in\N\mid \omega\in A_n\}</math> is a.s. infinite.
</li>
</ul>
|We need to show both points.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>\sum_{n\geq 1}\p[A_n] < \infty,</math> then, by Fubini, we get
<math display="block">
\E\left[\sum_{n\geq 1}\one_{A_n}\right]=\sum_{n\geq 1}\p[A_n],
</math>
which implies that <math>\sum_{n\geq 1}\one_{A_n} < \infty</math> a.s., i.e. a.s. we have <math>\one_{A_n}\not=0</math> for only finitely many <math>n</math>.
</li>
<li>Fix <math>n_0\in\N</math> and note that, by independence, for all <math>n\geq n_0</math> we have
<math display="block">
\p\left[\bigcap_{k=n_0}^nA_k^C\right]=\prod_{k=n_0}^n\p[A_k^C]=\prod_{k=n_0}^n(1-\p[A_k])\leq \exp\left(-\sum_{k=n_0}^n\p[A_k]\right),
</math>
where we used <math>1-x\leq e^{-x}</math>. Since
<math display="block">
\sum_{n\geq 1}\p[A_n]=\infty,
</math>
letting <math>n\to\infty</math> gives
<math display="block">
\p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0.
</math>
Since this is true for every <math>n_0</math>, we have
<math display="block">
\p\left[\bigcup_{n_0=0}^\infty\bigcap_{k=n_0}^\infty A_k^C\right]\leq \sum_{n_0\geq 0}\p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0.
</math>
Hence we get
<math display="block">
\p\left[\limsup_{n\to\infty} A_n\right]=\p\left[\bigcap_{n=0}^\infty\bigcup_{k=n}^\infty A_k\right]=1-\p\left[\bigcup_{n=0}^\infty\bigcap_{k=n}^\infty A_k^C\right]=1.
</math>
</li>
</ul>}}
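'''Example'''
For instance, for an infinite sequence of independent tosses of a fair coin, let <math>A_n</math> be the event that the <math>n</math>-th toss shows heads. The <math>A_n</math> are independent with <math>\p[A_n]=\frac{1}{2}</math>, so <math>\sum_{n\geq 1}\p[A_n]=\infty</math> and the second part of the lemma gives
<math display="block">
\p\left[\limsup_{n\to\infty}A_n\right]=1,
</math>
i.e. almost surely infinitely many tosses show heads.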
====Application 1====
There does not exist a probability measure <math>\p</math> on <math>\N</math> such that the probability of the set of multiples of an integer <math>n</math> is <math>\frac{1}{n}</math> for every <math>n\geq 1</math>. Let us assume that such a probability measure <math>\p</math> exists. Let <math>\tilde{p}</math> denote the set of prime numbers. For <math>p\in\tilde{p}</math> we write <math>A_p=p\N</math> for the set of all multiples of <math>p</math>. We first show that the sets <math>(A_p)_{p\in\tilde{p}}</math> are independent. Indeed, let <math>p_1,...,p_n\in\tilde{p}</math> be distinct. Then we have
<math display="block">
\p[p_1\N\cap\dotsm\cap p_n\N]=\p[p_1\dotsm p_n\N]=\frac{1}{p_1\dotsm p_n}=\p[p_1\N]\dotsm\p[p_n\N].
</math>
Moreover it is known that
<math display="block">
\sum_{p\in\tilde{p}}\p[p\N]=\sum_{p\in\tilde{p}}\frac{1}{p}=\infty.
</math>
The second part of the Borel-Cantelli lemma then implies that <math>\p</math>-almost every integer belongs to infinitely many <math>A_p</math>'s, i.e. is divisible by infinitely many distinct prime numbers. Since no integer has this property, this is a contradiction, and no such probability measure exists.
====Application 2====
Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X</math> be an exponential r.v. with parameter <math>\lambda=1</math>. Thus we know that <math>X</math> has density <math>e^{-x}\one_{\R_+}(x)</math>. Now consider a sequence <math>(X_n)_{n\geq 1}</math> of independent r.v.'s with the same distribution as <math>X</math>, i.e. for all <math>n\geq 1</math> we have <math>X_n\sim X</math>. Then <math>\limsup_n \frac{X_n}{\log(n)}=1</math> a.s., i.e. there exists an <math>N\in\A</math> such that <math>\p[N]=0</math> and for <math>\omega\not\in N</math> we get
<math display="block">
\limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1.
</math>
To prove this, we first compute, for <math>t\geq 0</math>, the probability
<math display="block">
\p[X > t]=\int_t^\infty e^{-x}dx=e^{-t}.
</math>
Now let <math>\epsilon > 0</math> and consider the sets <math>A_n=\{X_n > (1+\epsilon)\log(n)\}</math> and <math>B_n=\{X_n > \log(n)\}</math>. Then
<math display="block">
\p[A_n]=\p[X_n > (1+\epsilon)\log(n)]=\p[X > (1+\epsilon)\log(n)]=e^{-(1+\epsilon)\log(n)}=\frac{1}{n^{1+\epsilon}}.
</math>
This implies that
<math display="block">
\sum_{n\geq 1}\p[A_n] < \infty.
</math>
With the Borel-Cantelli lemma we get that <math>\p\left[\limsup_{n\to\infty} A_n\right]=0</math>. Let us define
<math display="block">
N_{\epsilon}=\limsup_{n\to\infty} A_n.
</math>
Then we have <math>\p[N_\epsilon]=0</math>, and for <math>\omega\not\in N_{\epsilon}</math> there exists an <math>n_0(\omega)</math> such that for all <math>n\geq n_0(\omega)</math> we have
<math display="block">
X_n(\omega)\leq (1+\epsilon)\log(n)
</math>
and thus for <math>\omega\not\in N_{\epsilon}</math>, we get <math>\limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1+\epsilon</math>. Moreover, let
<math display="block">
N'=\bigcup_{\epsilon\in \Q_+}N_{\epsilon}.
</math>
Therefore we get <math>\p[N']\leq \sum_{\epsilon\in\Q_+}\p[N_{\epsilon}]=0</math>, and for <math>\omega\not\in N'</math> we get
<math display="block">
\limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1.
</math>
Now we note that the <math>B_n</math>'s are independent, since <math>B_n\in\sigma(X_n)</math> and the <math>X_n</math>'s are independent. Moreover,
<math display="block">
\p[B_n]=\p[X_n > \log(n)]=\p[X > \log(n)]=\frac{1}{n},
</math>
which gives that
<math display="block">
\sum_{n\geq 1}\p[B_n]=\infty.
</math>
Now we can use Borel-Cantelli to get
<math display="block">
\p\left[\limsup_{n\to\infty} B_n\right]=1.
</math>
If we denote <math>N''=\left(\limsup_{n\to\infty} B_n\right)^C</math>, then for <math>\omega\not\in N''</math> we get that <math>X_n(\omega) > \log(n)</math> for infinitely many <math>n</math>. So it follows that for <math>\omega\not\in N''</math> we have
<math display="block">
\limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\geq 1.
</math>
Finally, take <math>N=N'\cup N''</math> to obtain <math>\p[N]=0</math>. Thus for <math>\omega\not\in N</math> we get
<math display="block">
\limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1.
</math>
===Sums of independent Random Variables===
Let us first define the convolution of two probability measures. If <math>\mu</math> and <math>\nu</math> are two probability measures on <math>\R^d</math>, we denote by <math>\mu*\nu</math> the image of the measure <math>\mu\otimes\nu</math> under the map
<math display="block">
\R^d\times\R^d\to\R^d,(x,y)\mapsto x+y.
</math>
Moreover, for all measurable maps <math>\varphi:\R^d\to \R_+</math>, we have
<math display="block">
\int_{\R^d}\varphi(z)(\mu*\nu)(dz)=\iint_{\R^d\times\R^d}\varphi(x+y)\mu(dx)\nu(dy).
</math>
{{proofcard|Proposition|prop-5|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X</math> and <math>Y</math> be two independent r.v.'s with values in <math>\R^d</math>. Then the following hold.
<ul style{{=}}"list-style-type:lower-roman"><li>The law of <math>X+Y</math> is given by <math>\p_X*\p_Y</math>. In particular if <math>X</math> has density <math>f</math> and <math>Y</math> has density <math>g</math>, then <math>X+Y</math> has density <math>f*g</math>, where <math>*</math> denotes the convolution product, which is given by
<math display="block">
f*g(\xi)=\int_{\R^d} f(x)g(\xi-x)dx.
</math>
</li>
<li>
<math>\Phi_{X+Y}(\xi)=\Phi_X(\xi)\Phi_Y(\xi).</math>
</li>
<li>If <math>X</math> and <math>Y</math> are in <math>L^2(\Omega,\A,\p)</math>, we get
<math display="block">
K_{X+Y}=K_X+K_Y.
</math>
In particular when <math>d=1</math>, we obtain
<math display="block">
Var(X+Y)=Var(X)+Var(Y).
</math>
</li>
</ul>
|We need to show all three points.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>X</math> and <math>Y</math> are independent r.v.'s, then <math>\p_{(X,Y)}=\p_X\otimes\p_Y</math>. Consequently, for all measurable maps <math>\varphi:\R^d\to\R_+</math>, we have
<math display="block">
\begin{multline*}
\E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)\p_{(X,Y)}(dx\,dy)=\iint_{\R^d\times\R^d}\varphi(x+y)\p_X(dx)\p_{Y}(dy)\\=\int_{\R^d}\varphi(\xi)(\p_X*\p_Y)(d\xi).
\end{multline*}
</math>
If moreover <math>X</math> and <math>Y</math> have densities <math>f</math> and <math>g</math> respectively, we get
<math display="block">
\E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)f(x)g(y)dxdy=\int_{\R^d}\varphi(\xi)\left(\int_{\R^d} f(x)g(\xi-x)dx\right)d\xi,
</math>
where we used the change of variables <math>\xi=x+y</math> (for fixed <math>x</math>) and Fubini's theorem. Since this identity holds for all measurable maps <math>\varphi:\R^d\to\R_+</math>, the r.v. <math>Z:=X+Y</math> has density
<math display="block">
h(\xi)=(f*g)(\xi)=\int_{\R^d}f(x)g(\xi-x)dx.
</math>
</li>
<li>By definition of the characteristic function and the independence property, we get
<math display="block">
\Phi_{X+Y}(\xi)=\E\left[e^{i\xi(X+Y)}\right]=\E\left[e^{i\xi X}e^{i\xi Y}\right]=\E\left[e^{i\xi X}\right]\E\left[e^{i\xi Y}\right]=\Phi_X(\xi)\Phi_Y(\xi).
</math>
</li>
<li>If <math>X=(X_1,...,X_d)</math> and <math>Y=(Y_1,...,Y_d)</math> are independent r.v.'s on <math>\R^d</math>, we get that <math>Cov(X_i,Y_j)=0</math> for all <math>1\leq  i,j\leq  d</math>. By using the bilinearity of the covariance we get that
<math display="block">
Cov(X_i+Y_i,X_j+Y_j)=Cov(X_i,X_j)+Cov(Y_i,Y_j),
</math>
and hence <math>K_{X+Y}=K_X+K_Y</math>. For <math>d=1</math> we get
<math display="block">
\begin{align*}
Var(X+Y)&=\E[((X+Y)-\E[X+Y])^2]=\E[((X-\E[X])+(Y-\E[Y]))^2]\\
&=\underbrace{\E[(X-\E[X])^2]}_{Var(X)}+\underbrace{\E[(Y-\E[Y])^2]}_{Var(Y)}+\underbrace{2\E[(X-\E[X])(Y-\E[Y])]}_{2Cov(X,Y)}
\end{align*}
</math>
Now since <math>Cov(X,Y)=0</math>, we get the result.
</li>
</ul>}}
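'''Example'''
As a simple illustration, let <math>X</math> and <math>Y</math> be independent exponential r.v.'s with parameter <math>1</math>, i.e. with density <math>f(x)=e^{-x}\one_{\R_+}(x)</math>. Then <math>X+Y</math> has density
<math display="block">
(f*f)(\xi)=\int_\R f(x)f(\xi-x)dx=\int_0^\xi e^{-x}e^{-(\xi-x)}dx=\xi e^{-\xi}\quad\text{for }\xi\geq 0
</math>
(and <math>0</math> otherwise), and since <math>Var(X)=Var(Y)=1</math> for the exponential distribution with parameter <math>1</math>, part (iii) gives <math>Var(X+Y)=2</math>.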
{{proofcard|Theorem (Weak law of large numbers)|thm-5|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(X_n)_{n\geq 1}</math> be a sequence of independent r.v.'s. Moreover, write <math>\mu=\E[X_n]</math> for all <math>n\geq1</math> and assume <math>\E[(X_n-\mu)^2]\leq  C</math> for all <math>n\geq1</math> and for some constant <math>C < \infty</math>. We also write <math>S_n=\sum_{j=1}^nX_j</math> and <math>\tilde X_n=\frac{S_n}{n}</math> for all <math>n\geq 1</math>. Then for all <math>\epsilon > 0</math>
<math display="block">
\p[\vert \tilde X_n-\mu\vert  > \epsilon]\xrightarrow{n\to\infty}0.
</math>
In particular, we have
<math display="block">
\E[\tilde X_n]=\frac{1}{n}\E\left[\sum_{j=1}^nX_j\right]=\frac{1}{n}\cdot n\mu=\mu.
</math>
|By independence, the cross terms <math>\E[(X_j-\mu)(X_k-\mu)]</math> with <math>j\not=k</math> vanish, so we get
<math display="block">
\E[(S_n-n\mu)^2]=\sum_{j=1}^n\E[(X_j-\mu)^2]\leq  nC.
</math>
Hence for <math>\epsilon > 0</math> we get by Markov's inequality
<math display="block">
\p[\vert \tilde X_n-\mu\vert > \epsilon]=\p[(S_n-n\mu)^2 > (n\epsilon)^2]\leq  \frac{\E[(S_n-n\mu)^2]}{n^2\epsilon^2}\leq  \frac{C}{n\epsilon^2}\xrightarrow{n\to\infty}0.
</math>}}
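For instance, for a sequence of independent tosses of a fair coin with <math>X_j=\one_{\{\text{$j$-th toss is $H$}\}}</math>, we have <math>\mu=\frac{1}{2}</math> and we can take <math>C=\frac{1}{4}</math>, so the bound in the proof above gives
<math display="block">
\p\left[\left\vert\tilde X_n-\tfrac{1}{2}\right\vert > 0.01\right]\leq \frac{1/4}{n\cdot (0.01)^2}=\frac{2500}{n},
</math>
which is at most <math>5\%</math> as soon as <math>n\geq 50000</math>.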
{{proofcard|Corollary|cor-3|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>(A_n)_{n\geq 1}\in \A</math> be a sequence of independent events with the same probability, i.e. <math>\p[A_n]=\p[A_m]</math> for all <math>n,m\geq 1</math>. Then for all <math>\epsilon > 0</math>
<math display="block">
\p\left[\left\vert\frac{1}{n}\sum_{i=1}^n \one_{A_i}-\p[A_1]\right\vert > \epsilon\right]\xrightarrow{n\to\infty}0.
</math>
|Apply the weak law of large numbers to <math>X_j=\one_{A_j}</math>. The <math>X_j</math> are independent, <math>\E[X_j]=\p[A_j]=\p[A_1]</math> for all <math>j\geq 1</math>, and <math>\E[(X_j-\p[A_1])^2]\leq 1</math>, so the assumptions of the theorem are satisfied.}}
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}

Latest revision as of 01:53, 8 May 2024

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

Independent events

Let [math](\Omega,\A,\p)[/math] be a probability space. If [math]A,B\in\A[/math], we say that [math]A[/math] and [math]B[/math] are independent if

[[math]] \p[A\cap B]=\p[A]\p[B]. [[/math]]

Example

[Throw of a die] We have the state space [math]\Omega=\{1,2,3,4,5,6\}[/math], [math]\omega\in\Omega[/math]. Hence we have [math]\p[\{\omega\}]=\frac{1}{6}[/math]. Now let [math]A=\{1,2\}[/math] and [math]B=\{1,3,5\}[/math]. Then

[[math]] \p[A\cap B]=\p[\{1\}]=\frac{1}{6}\text{and}\p[A]=\frac{1}{3},\p[B]=\frac{1}{2} [[/math]]

Therefore we get

[[math]] \p[A\cap B]=\p[A]\p[B]. [[/math]]

Hence we get that [math]A[/math] and [math]B[/math] are independent.

Definition (Independence of events)

We say that the [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if [math]\forall \{j_1,...,j_l\}\subset\{1,...,n\}[/math] we have

[[math]] \p[A_{j_1}\cap A_{j_2}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm \p[A_{j_l}]. [[/math]]

It is not enough to have [math]\p[A_1\cap\dotsm\cap A_n]=\p[A_1]\dotsm\p[A_n][/math]. It is also not enough to check that [math]\forall \{i,j\}\subset\{1,...,n\}[/math], [math]\p[A_i\cap A_j]=\p[A_i]\p[A_j][/math]. For instance, let us consider two tosses of a coin and consider events [math]A,B[/math] and [math]C[/math] given by

[[math]] A=\{\text{$H$ at the first throw}\},B=\{\text{$T$ at the first throw}\},C=\{\text{same outcome for both tosses}\} [[/math]]
The events [math]A,B[/math] and [math]C[/math] are two by two independent but [math]A,B[/math] and [math]C[/math] are not independent events.

Proposition

The [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if and only if

[[math]] (*)\p[B_1\cap\dotsm \cap B_n]=\p[B_1]\dotsm\p[B_n] [[/math]]
for all [math]B_i\in\sigma(A_i)=\{\emptyset,A_i,A_i^C,\Omega\}[/math], [math]\forall i\in\{1,...,n\}[/math].


Show Proof

If the above is satisfied and if [math]\{j_1,...,j_l\}\subset\{1,...,n\}[/math], then for [math]i\in\{j_1,...,j_l\}[/math] take [math]B_i=A_i[/math] and for [math]i\not\in\{j_1,...,j_l\}[/math] take [math]B_i=\Omega[/math]. So it follows that

[[math]] \p[A_{j_1}\cap\dotsm \cap A_{j_l}]=\p[A_{j_1}]\dotsm\p[A_{j_l}]. [[/math]]
Conversely, assume that [math]A_1,...,A_n\in\A[/math] are independent and we want to deduce [math](*)[/math]. We can assume that [math]\forall i\in\{1,...,n\}[/math] we have [math]B_i\not=\emptyset[/math] (for otherwise the identity is trivially satisfied). If [math]\{j_1,...,j_l\}=\{i\mid B_i\not=\Omega\}[/math], we have to check that

[[math]] \p[B_{j_1}\cap\dotsm\cap B_{j_l}]=\p[B_{j_1}]\dotsm\p[B_{j_l}], [[/math]]
as soon as [math]B_{j_k}=A_{j_k}[/math] or [math]B_{j_k}=A_{j_k}^C[/math]. Finally it's enough to show that if [math]C_1,...,C_p[/math] are independent events, then

[[math]] C_1^C,C_2,...,C_p [[/math]]
are also independent. But if [math]1\not\in\{i_1,...,i_q\}[/math], for all [math]\{i_1,...,i_q\}\subset\{1,...,p\}[/math], then from the definition of independence we have

[[math]] \p[C_{i_1}\cap\dotsm\cap C_{i_q}]=\p[C_{i_1}]\dotsm\p[C_{i_q}]. [[/math]]
If [math]1\in\{i_1,...,i_q\}[/math], say [math]1=i_1[/math], then

[[math]] \begin{align*} \p[C_{i_1}^C\cap C_{i_2}\cap\dotsm\cap C_{i_q}]&=\p[C_{i_1}\cap\dotsm\cap C_{i_q}]-\p[C_1\cap C_{i_2}\cap\dotsm\cap C_{i_q}]\\ &=\p[C_{i_2}]\dotsm\p[C_{i_q}]-\p[C_1]\p[C_{i_2}]\dotsm\p[C_{i_q}]\\ &=(1-\p[C_1])\p[C_{i_2}]\dotsm\p[C_{i_q}]=\p[C_1^C]\p[C_{i_2}]\dotsm\p[C_{i_q}] \end{align*} [[/math]]

Definition (Conditional probability)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A,B\in\A[/math] such that [math]\p[B] \gt 0[/math]. The conditional probability of [math]A[/math] given [math]B[/math] is then defined as

[[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}. [[/math]]

Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A,B\in\A[/math] and suppose that [math]\p[B] \gt 0[/math].

  • [math]A[/math] and [math]B[/math] are independent if and only if
    [[math]] \p[A\mid B]=\p[A]. [[/math]]
  • The map
    [[math]] \A\to [0,1],A\mapsto \p[A\mid B] [[/math]]
    defines a new probability measure on [math]\A[/math] called the conditional probability given [math]B[/math].


Show Proof

We need to show both points.

  • If [math]A[/math] and [math]B[/math] are independent, then
    [[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A] [[/math]]
    and conversely if [math]\p[A\mid B]=\p[A][/math], we get that
    [[math]] \p[A\cap B]=\p[A]\p[B], [[/math]]
    and hence [math]A[/math] and [math]B[/math] are independent.
  • Let [math]\Q[A]=\p[A\mid B][/math]. We have
    [[math]] \Q[\Omega]=\p[\omega\mid B]=\frac{\p[\omega\cap B]}{\p[B]}=\frac{\p[B]}{\p[B]}=1. [[/math]]
    Take [math](A_n)_{n\geq 1}\subset \A[/math] as a disjoint family of events. Then
    [[math]] \begin{align*} \Q\left[\bigcup_{n\geq 1}A_n\right]&=\p\left[\bigcup_{n\geq 1}A_n\mid B\right]=\frac{\p\left[\left(\bigcup_{n\geq 1}A_n\right)\cap B\right]}{\p[B]}=\p\left[\bigcup_{n\geq 1}(A_n\cap B)\right]\\ &=\sum_{n\geq 1}\frac{\p[A_n\cap B]}{\p[B]}=\sum_{n\geq 1}\Q[A_n]. \end{align*} [[/math]]
Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]A_1,...,A_n\in\A[/math] with [math]\p[A_1\cap\dotsm\cap A_n] \gt 0[/math]. Then

[[math]] \p[A_1\cap\dotsm\cap A_n]=\p[A_1]\p[A_2\mid A_1]\p[A_3\mid A_1\cap A_2]\dotsm\p[A_n\mid A_1\cap\dotsm\cap A_{n-1}]. [[/math]]


Show Proof

We prove this by induction. For [math]n=2[/math] it's just the definition of the conditional probability. Now we want to go from [math]n-1[/math] to [math]n[/math]. Therefore set [math]B=A_1\cap \dotsm\cap A_{n-1}[/math]. Then

[[math]] \p[B\cap A_n]=\p[A_n\mid B]\p[B]=\p[A_n\mid B]\p[A_1]\p[A_\mid A_1]\dotsm\p[A_{n-1}\mid A_1\cap\dotsm\cap A_{n-2}]. [[/math]]

Theorem

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]\left(E_{n}\right)_{n\geq 1}[/math] be a finite or countable measurable partition of [math]\Omega[/math], such that [math]\p[E_n] \gt 0[/math] for all [math]n[/math]. If [math]A\in\A[/math], then

[[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]


Show Proof

Note that

[[math]] A=A\cap\Omega=A\cap\left(\bigcup_{n\geq 1}E_n\right)=\bigcup_{n\geq 1}(A_n\cap E_n). [[/math]]
Now since the [math](A\cap E_n)_{n\geq 1}[/math] are disjoint, we can write

[[math]] \p[A]=\sum_{n\geq 1}\p[A\cap E_n]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]

Theorem (Baye)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](E_n)_{n\geq 1}[/math] be a finite or countable partition of [math]\Omega[/math] and assume that [math]\p[A] \gt 0.[/math] Then

[[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]}. [[/math]]


Show Proof

By the previous theorem we know that

[[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n],\p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]},\p[A\mid E_n]=\frac{\p[A\cap E_n]}{\p[E_n]} [[/math]]
Therefore, combining things, we get

[[math]] \p[E_n\mid A]=\frac{\p[E_n\cap A]}{\p[A]}=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]}. [[/math]]

Independent Random Variables and independent [math]\sigma[/math]-Algebras

Definition (Independence of [math]\sigma[/math]-Algebras)

Let [math](\Omega,\A,\p)[/math] be a probability space. We say that the sub [math]\sigma[/math]-Algebras [math]\B_1,...,\B_n[/math] of [math]\A[/math] are independent if for all [math] A_1\in\B_1,..., A_n\in\B_n[/math] we get

[[math]] \p[A_1\cap\dotsm \cap A_n]=\p[A_1]\dotsm\p[A_n]. [[/math]]
Let now [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s with values in measureable spaces [math](E_1,\mathcal{E}_1),...,(E_n,\mathcal{E}_n)[/math] respectively. We say that the r.v.'s [math]X_1,...,X_n[/math] are independent if the [math]\sigma[/math]-Algebras [math]\sigma(X_1),...,\sigma(X_n)[/math] are independent. This is equivalent to the fact that for all [math]F_1\in\mathcal{E}_1,...,F_n\in\mathcal{E}_n[/math] we have

[[math]] \p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm \p[X_n\in F_n]. [[/math]]
(This comes from the fact that for all [math]i\in\{1,...,n\}[/math] we have that [math]\sigma(X_i)=\{X_i^{-1}(F)\mid F\in\mathcal{E}_i\}[/math])

If [math]\B_1,...,\B_n[/math] are [math]n[/math] independent sub [math]\sigma[/math]-Algebras and if [math]X_1,...,X_n[/math] are independent r.v.'s such that [math]X_i[/math] is [math]\B_i[/math] measurable for all [math]i\in\{1,...,n\}[/math], then [math]X_1,...,X_n[/math] are independent r.v.'s (This comes from the fact that for all [math]i\in\{1,...,n\}[/math] we have that [math]\sigma(X_i)\subset \B_i[/math]).

The [math]n[/math] events [math]A_1,...,A_n\in\A[/math] are independent if and only if [math]\sigma(A_1),...,\sigma(A_n)[/math] are independent.

Theorem (Independence of Random Variables)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s. Then [math]X_1,...,X_n[/math] are independent if and only if the law of the vector [math](X_1,...,X_n)[/math] is the product of the laws of [math]X_1,...,X_n[/math], i.e.

[[math]] \p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes \p_{X_n}. [[/math]]
Moreover, for every measurable map [math]f_i:(E_i,\mathcal{E}_i)\to\R_+[/math] defined on a measurable space [math](E_i,\mathcal{E}_i)[/math] for all [math]i\in\{1,...,n\}[/math], we have

[[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)]. [[/math]]


Show Proof

Let [math]F_i\in\mathcal{E}_i[/math] for all [math]i\in\{1,...,n\}[/math]. Thus we have

[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}] [[/math]]
and on the other hand

[[math]] \left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n)=\p_{X_1}[F_1]\dotsm\p_{X_n}[F_n]=\prod_{i=1}^n\p_{X_i}[F_i]=\prod_{i=1}^n\p[X_i\in F_i]. [[/math]]
If [math]X_1,...,X_n[/math] are independent, then
[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm \times F_n)=\prod_{i=1}^n\p[X_i\in F_i]=\left(\p_{X_1}\otimes\dotsm\otimes \p_{X_n}\right)(F_1\times\dotsm\times F_n), [[/math]]
which implies that [math] \p_{(X_1,...,X_n)}[/math] and [math]\p_{X_1}\otimes\dotsm\otimes \p_{X_n}[/math] are equal on rectangles. Hence the monotone class theorem implies that

[[math]] \p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}. [[/math]]
Conversely, if [math]\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}[/math], then for all [math]F_i\in\mathcal{E}_i[/math], with [math]i\in\{1,...,n\}[/math], we get that

[[math]] \p_{(X_1,...,X_n)}(F_1\times\dotsm\times F_n)=\left(\p_{X_1}\otimes\dotsm\otimes\p_{X_n}\right)(F_1\times\dotsm\times F_n) [[/math]]
and therefore

[[math]] \p[\{X_1\in F_1\}\cap\dotsm\cap\{X_n\in F_n\}]=\p[X_1\in F_1]\dotsm\p[X_n\in F_n]. [[/math]]
This implies that [math]X_1,...,X_n[/math] are independent. For the second assumption we get

[[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\int_{E_1\times\dotsm\times E_n}\prod_{i=1}^nf_i(X_i)\underbrace{P_{X_1}dx_1\dotsm P_{X_n}dx_n}_{\p_{X_1,...,X_n}(dx_1\dotsm dx_n)}=\prod_{i=1}^n\int_{E_i}f_i(x_i)P_{X_i}dx_i=\prod_{i=1}^n\E[f_i(X_i)], [[/math]]
where we have used the first part and Fubini's theorem.

We see from the proof above that as soon as for all [math]i\in\{1,...,n\}[/math] we have [math]\E[\vert f_i(X_i)\vert] \lt \infty[/math], it follows that

[[math]] \E\left [\prod_{i=1}^n f_i(X_i)\right]=\prod_{i=1}^n\E[ f_i(X_i) ]. [[/math]]
Indeed, the previous result shows that

[[math]] \E\left[\prod_{i=1}^n\vert f_i(X_i)\vert\right]=\prod_{i=1}^n\E[\vert f_i(X_i)\vert] \lt \infty [[/math]]
and thus we can apply Fubini's theorem. In particular if [math]X_1,...,X_n\in L^1(\Omega,\A,\p)[/math] and independent, we get that

[[math]] \E\left[\prod_{i=1}^nX_i\right]=\prod_{i=1}^n\E[X_i]. [[/math]]

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1[/math] and [math]X_2[/math] be two independent r.v.'s in [math]L^2(\Omega,\A,\p)[/math]. Then we get

[[math]] Cov(X_1,X_2)=0. [[/math]]


Show Proof

Recall that if [math]X\in L^2(\Omega,\A,\p)[/math], we also have that [math]X\in L^1(\Omega,\A,\p)[/math]. Thus

[[math]] Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[X_1]\E[X_2]-\E[X_1]\E[X_2]=0. [[/math]]

Note that the converse is not true! Let [math]X_1\sim\mathcal{N}(0,1)[/math]. We can also take for [math]X_1[/math] any symmetric r.v. in [math]L^2(\Omega,\A,\p)[/math] with density [math]P(x)[/math], such that [math]P(-x)=P(x)[/math]. Recall that being in [math]L^2(\Omega,\A,\p)[/math] simply means

[[math]] \E[X^2]=\int_\R x^2 P(x)dx \lt \infty, [[/math]]
which implies that [math]P(x)=P(-x)[/math] and thus [math]\E[X^2]=\int_\R x^2P(x)dx=0.[/math] Now consider a r.v. [math]Y[/math] with values in [math]\{-1,+1\}[/math]. Then we get [math]\p[Y=1]=\p[Y=-1]=\frac{1}{2}[/math] and thus [math]Y[/math] is independent of [math]X_1[/math]. Define [math]X_2:=YX_1[/math] and observe then

[[math]] Cov(X_1,X_2)=\E[X_1X_2]-\E[X_1]\E[X_2]=\E[YX_1^2]-\E[YX_1]\E[X_1] [[/math]]
and hence

[[math]] \E[Y]\E[X_1^2]-\E[Y]\E^2[X_1]=0-0=0. [[/math]]
If [math]X_1[/math] and [math]X_2[/math] are independent, we note that [math]\vert X_1\vert[/math] and [math]\vert X_2\vert[/math] would also be independent. But [math]\vert X_2\vert =\vert Y\vert \vert X_1\vert=\vert X_1\vert[/math]. This would mean that [math]\vert X_1\vert[/math] is independent of itself. So it follows that [math]\vert X_1\vert[/math] is equal to a constant a.s. If [math]c=\E[\vert X_1\vert][/math], and we want to look at [math]\E[(\vert X_1\vert-c)^2][/math], we now know that [math]\vert X_1\vert -c[/math] is independent of itself. Therefore we get

[[math]] \E[(\vert X_1\vert-c)^2]=\E[\vert X_1\vert-c]\E[\vert X_1\vert-c]=0\Longrightarrow \vert X_1\vert=c\text{a.s.} [[/math]]
This cannot happen since [math]\vert X_1\vert[/math] is the absolute value of a standard Gaussian distribution, which has a density given by

[[math]] P(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}. [[/math]]

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X_1,...,X_n[/math] be [math]n[/math] r.v.'s with values in [math]\R[/math].

  • Assume that for [math]i\in \{1,...,n\}[/math], [math]\p_{X_i}[/math] has density [math]P_i[/math] and that the r.v.'s [math]X_1,...,X_n[/math] are independent. Then the law of [math](X_1,...,X_n)[/math] also has density given by [math]P(x_1,...,x_n)=\prod_{i=1}^nP_i(x_i)[/math].
  • Conversely assume that the law of [math](X_1,...,X_n)[/math] has density [math]P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)[/math], where [math]q_i[/math] is Borel measurable and positive. Then the r.v.'s [math]X_1,...,X_n[/math] are independent and the law of [math]X_i[/math] has density [math]P_i=c_iq_i[/math], with [math]c_i \gt 0[/math] for [math]i\in\{1,...,n\}[/math].


Show Proof

We only need to show [math](ii)[/math]. From Fubini we get

[[math]] \int_\R\prod_{i=1}^nq_i(x_i)dx_i=\prod_{i=1}^n\int_\R q_i(x_i)dx_i=\int_{\R^{n}}P(x_1,...,x_n)dx_1\dotsm dx_n=1. [[/math]]
which implies that [math]K_i:=\int_\R q_i(x_i)dx_i\in(0,\infty)[/math], for all [math]i\in\{1,...,n\}[/math]. Now we know that the law of [math]X_i[/math] has density [math]P_i[/math] given by

[[math]] P_i(x_i)=\int_{\R^{n-1}}P(x_1,...,x_{i-1},x_i,x_{i+1},...,x_n)dx_1\dotsm dx_{i-1}dx_{i+1}\dotsm dx_n=\left(\prod_{j\not=i}K_j\right)q_i(x_i)=\frac{1}{K_i}q_i(x_i). [[/math]]
We can rewrite

[[math]] P(x_1,...,x_n)=\prod_{i=1}^nq_i(x_i)=\prod_{i=1}^nP_i(x_i). [[/math]]
Hence we get [math]P(x_1,...,x_n)=\p_{X_1}\otimes\dotsm \otimes \p_{X_n}[/math] and therefore [math]X_1,...,X_n[/math] are independent.

Example

Let [math]U[/math] be a r.v. with exponential distribution. Let [math]V[/math] be a uniform r.v. on [math][0,1][/math]. We assume that [math]U[/math] and [math]V[/math] are independent. Define the r.v.'s [math]X=\sqrt{U}\cos(2\pi V)[/math] and [math]Y=\sqrt{U}\sin(2\pi V)[/math]. Then [math]X[/math] and [math]Y[/math] are independent. Indeed, for a measurable function [math]\varphi:\R^2\to \R_+[/math] we get

[[math]] \E[\varphi(X,Y)]=\int_0^\infty\int_{0}^1\varphi(\sqrt{u}\cos(2\pi v),\sqrt{u}\sin(2\pi v))e^{u}dudv [[/math]]

[[math]] =\frac{1}{\sqrt{\pi}}\int_{0}^\infty\int_0^{2\pi}\varphi(r\cos(\theta),r\sin(\theta))re^{-r^2}drd\theta, [[/math]]

which implies that [math](X,Y)[/math] has density [math]\frac{e^{-x^2}e^{-y^2}}{\pi}[/math] on [math]\R\times\R[/math]. With the previous corollary we get that [math]X[/math] and [math]Y[/math] are independent and [math]X[/math] and [math]Y[/math] have the same density [math]P(x)=\frac{1}{\sqrt{\pi}}e^{-x^2}[/math]. This means that [math]X[/math] and [math]Y[/math] are independent.

We write [math]X\stackrel{law}{=}Y[/math] to say that [math]\p_X=\p_Y[/math]. Thus in the example above we would have

[[math]] X\stackrel{law}{=}Y\sim\mathcal{N}(0,\frac{1}{2}). [[/math]]

Important facts

Let [math]X_1,...,X_n[/math] be [math]n[/math] real valued r.v.'s. Then the following are equivalent

  • [math]X_1,...,X_n[/math] are independent.
  • For [math]X=(X_1,...,X_n)\in\R^n[/math] we have
    [[math]] \Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i). [[/math]]
  • For all [math]a_1,..,a_n\in\R[/math], we have
    [[math]] \p[X_1\leq a_1,..,X_n\leq a_n]=\prod_{i=1}^n\p[X_i\leq a_i] [[/math]]
  • If [math]f_1,...,f_n:\R\to\R_+[/math] are continuous, measurable maps with compact support, then
    [[math]] \E\left[\prod_{i=1}^nf_i(X_i)\right]=\prod_{i=1}^n\E[f_i(X_i)]. [[/math]]

\begin{proof} First we show [math](i)\Longrightarrow (ii)[/math]. By definition and the iid property, we get

[[math]] \Phi_X(\xi_1,..,\xi_n)=\E\left[e^{i(\xi_1X_1+...+\xi_nX_n)}\right]=\E\left[e^{i\xi_1X_1}\dotsm e^{i\xi_nX_n}\right]=\prod_{i=1}^n\E[e^{i\xi X_1}]=\prod_{i=1}^n\Phi_{X_i}(\xi_i), [[/math]]

where the map [math]t\mapsto e^{it}[/math] is measurable and bounded. Next we show [math](ii)\Longrightarrow (i)[/math]. Note that by theorem we have [math]\p_X=\p_Y[/math] if

[[math]] \Phi_X(\xi_1,...,\xi_n)=\Phi_Y(\xi_1,...,\xi_n). [[/math]]

Now if [math]\Phi_X(\xi_1,...,\xi_n)=\prod_{i=1}^n\Phi_{X_i}(\xi_i)[/math], we note that [math]\prod_{i=1}^n\Phi_{X_i}(\xi_i)[/math] is the characteristic function of the probability distribution if the probability distribution is [math]\p_{X_1}\otimes\dotsm \otimes\p_{X_n}[/math]. Now from injectivity it follows that [math]\p_{(X_1,...,X_n)}=\p_{X_1}\otimes\dotsm\otimes\p_{X_n}[/math], which implies that [math]X_1,...,X_n[/math] are independent. \end{proof}

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]\B_1,...,\B_n\subset\A[/math] be sub [math]\sigma[/math]-Algebras of [math]\A[/math]. For every [math]i\in\{1,...,n\}[/math], let [math]\mathcal{C}_i\subset\B_i[/math] be a family of subsets of [math]\Omega[/math] such that [math]\mathcal{C}_i[/math] is stable under finite intersection and [math]\sigma(\mathcal{C}_i)=\B_i[/math]. Assume that for all [math]C_i\in\mathcal{C}_i[/math] with [math]i\in\{1,...,n\}[/math] we have

[[math]] \p\left[\prod_{i=1}^nC_i\right]=\prod_{i=1}^n\p[C_i]. [[/math]]
Then [math]\B_1,...,\B_n[/math] are independent [math]\sigma[/math]-Algebras.


Show Proof

Let us fix [math]C_2\in \mathcal{C}_2,...,C_n\in\mathcal{C}_n[/math] and define

[[math]] M_1:=\left\{B_1\in\B_1\mid \p[B_1\cap C_2\cap\dotsm\cap C_2]=\p[B_1]\p[C_2]\dotsm \p[C_n]\right\}. [[/math]]
Now since [math]\mathcal{C}_1\subset M_1[/math] and [math]M_1[/math] is a monotone class, we get [math]\sigma(\mathcal{C}_1)=\B_1\subset M_1[/math] and thus [math]\B_1=M_1[/math]. Let now [math]B_1\in\B_1,[/math] [math]C_3\in\mathcal{C}_3,...,C_n\in\mathcal{C}_n[/math] and define

[[math]] M_2:=\{B_2\in\B_2\mid \p[B_2\cap B_1\cap C_3\cap\dotsm\cap C_n]=\p[B_2]\p[B_1]\p[C_3]\dotsm\p[C_n]\}. [[/math]]
Again, since [math]\mathcal{C}_2\subset M_2[/math] and [math]M_2[/math] is a monotone class, we get [math]\sigma(\mathcal{C}_2)=\B_2\subset M_2[/math] and thus [math]\B_2=M_2[/math]. Continuing by induction completes the proof.

'''Consequence:''' Let [math]\B_1,...,\B_n[/math] be [math]n[/math] independent [math]\sigma[/math]-Algebras and let [math]m_0=0 \lt m_1 \lt ... \lt m_p=n[/math]. Then the [math]\sigma[/math]-Algebras

[[math]] \begin{align*} \mathcal{D}_1&=\B_1\lor\dotsm\lor\B_{m_1}=\sigma(\B_1,...,\B_{m_1})=\sigma\left(\bigcup_{k=1}^{m_1}\B_k\right)\\ \mathcal{D}_2&=\B_{m_1+1}\lor\dotsm\lor\B_{m_2}\\ \vdots\\ \mathcal{D}_p&=\B_{m_{p-1}+1}\lor\dotsm\lor\B_{m_p} \end{align*} [[/math]]

are also independent. Indeed, we can apply the previous proposition to the classes of sets

[[math]] \mathcal{C}_j=\{B_{m_{j-1}+1}\cap\dotsm\cap B_{m_j}\mid B_i\in\B_i,\ i\in\{m_{j-1}+1,...,m_j\}\}. [[/math]]

In particular if [math]X_1,...,X_n[/math] are independent r.v.'s, then

[[math]] \begin{align*} Y_1&=(X_1,...,X_{m_1})\\ \vdots\\ Y_p&=(X_{m_{p-1}+1},...,X_{m_p}) \end{align*} [[/math]]

are also independent.

Example

Let [math]X_1,...,X_4[/math] be real valued independent r.v.'s. Then [math]Z_1=X_1X_3[/math] and [math]Z_2=X_2^3+X_4[/math] are independent. Indeed, [math]Z_1[/math] is [math]\sigma(X_1,X_3)[/math]-measurable and [math]Z_2[/math] is [math]\sigma(X_2,X_4)[/math]-measurable, and from the consequence above [math]\sigma(X_1,X_3)[/math] and [math]\sigma(X_2,X_4)[/math] are independent. Here we use that a r.v. [math]Y[/math] is [math]\sigma(X)[/math]-measurable for [math]X:\Omega\to\R[/math] if and only if [math]Y=f(X)[/math] with [math]f[/math] a measurable map; more generally, if [math]Y[/math] is [math]\sigma(X_1,...,X_n)[/math]-measurable, then [math]Y=f(X_1,...,X_n)[/math] for some measurable [math]f[/math].
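Below is a minimal NumPy sketch of this example (the uniform distribution and the test functions are arbitrary choices): for independent [math]Z_1[/math] and [math]Z_2[/math] we expect [math]\E[f(Z_1)g(Z_2)]\approx\E[f(Z_1)]\,\E[g(Z_2)][/math] up to Monte Carlo error.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 10**6

# Four independent r.v.'s; Unif(-1,1) is an arbitrary choice.
x1, x2, x3, x4 = rng.uniform(-1.0, 1.0, size=(4, n))

z1 = x1 * x3        # sigma(X1, X3)-measurable
z2 = x2**3 + x4     # sigma(X2, X4)-measurable

# For independent Z1, Z2 and bounded f, g: E[f(Z1) g(Z2)] = E[f(Z1)] E[g(Z2)].
f = np.cos(z1)
g = np.sin(z2) ** 2
print(np.mean(f * g) - np.mean(f) * np.mean(g))  # close to 0, up to Monte Carlo error
</syntaxhighlight>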

Definition (Independence for an infinite family)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](\B_i)_{i\in I}[/math] be an infinite family of sub [math]\sigma[/math]-Algebras of [math]\A[/math]. We say that the family [math](\B_i)_{i\in I}[/math] is independent if for every finite subset [math]\{i_1,...,i_p\}\subset I[/math], the [math]\sigma[/math]-Algebras [math]\B_{i_1},...,\B_{i_p}[/math] are independent. If [math](X_i)_{i\in I}[/math] is a family of r.v.'s, we say that they are independent if the family [math](\sigma(X_i))_{i\in I}[/math] is independent.

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent r.v.'s. Then for all [math]p\in\N[/math] we get that [math]\B_1=\sigma(X_1,...,X_p)[/math] and [math]\B_2=\sigma(X_{p+1},X_{p+2},...)[/math] are independent.


Show Proof

Apply the proposition on generating classes stable under finite intersection to [math]\mathcal{C}_1=\sigma(X_1,...,X_p)[/math] and [math]\mathcal{C}_2=\bigcup_{k\geq p+1}\sigma(X_{p+1},...,X_k)\subset\B_2[/math]. Both classes are stable under finite intersection, [math]\sigma(\mathcal{C}_1)=\B_1[/math] and [math]\sigma(\mathcal{C}_2)=\B_2[/math], and the factorization hypothesis holds because [math]\sigma(X_1,...,X_p)[/math] and [math]\sigma(X_{p+1},...,X_k)[/math] are independent for every [math]k\geq p+1[/math].

===The Borel-Cantelli Lemma===

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](A_n)_{n\in\N}[/math] be a sequence of events in [math]\A[/math]. Recall that we can write

[[math]] \limsup_{n\to \infty} A_n=\bigcap_{n=0}^\infty\left(\bigcup_{k=n}^\infty A_k\right)\text{and}\liminf_{n\to \infty} A_n=\bigcup_{n=0}^\infty\left(\bigcap_{k=n}^\infty A_k\right). [[/math]]

Moreover, both are again measurable sets. For [math]\omega\in\limsup_n A_n[/math] we have [math]\omega\in\bigcup_{k=n}^\infty A_k[/math] for all [math]n\geq 0[/math], i.e. for every [math]n\geq 0[/math] there exists [math]k\geq n[/math] such that [math]\omega\in A_k[/math]; equivalently, [math]\omega[/math] lies in infinitely many [math]A_k[/math]'s. For [math]\omega\in\liminf_n A_n[/math], there exists [math]n\geq 0[/math] such that [math]\omega\in\bigcap_{k=n}^\infty A_k[/math], i.e. [math]\omega\in A_k[/math] for all [math]k\geq n[/math]; equivalently, [math]\omega[/math] lies in all but finitely many [math]A_k[/math]'s. In particular, [math]\liminf_nA_n\subset \limsup_nA_n[/math].

Lemma (Borel-Cantelli)

Let [math](\Omega,\A,\p)[/math] be a probability space and let [math](A_n)_{n\in\N}\subset\A[/math] be a sequence of events.

  • If [math]\sum_{n\geq 1}\p[A_n] \lt \infty[/math], then
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=0, [[/math]]
    which means that the set [math]\{n\in\N\mid \omega\in A_n\}[/math] is a.s. finite.
  • If [math]\sum_{n\geq 1}\p[A_n]=\infty[/math], and if the events [math](A_n)_{n\in\N}[/math] are independent, then
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=1, [[/math]]
    which means that the set [math]\{n\in\N\mid \omega\in A_n\}[/math] is a.s. infinite.


Show Proof

We need to show both points.

  • If [math]\sum_{n\geq 1}\p[A_n] \lt \infty,[/math] then, by Fubini, we get
    [[math]] \E\left[\sum_{n\geq 1}\one_{A_n}\right]=\sum_{n\geq 1}\p[A_n] \lt \infty, [[/math]]
    which implies that [math]\sum_{n\geq 1}\one_{A_n} \lt \infty[/math] a.s., i.e. [math]\one_{A_n}\not=0[/math] for only finitely many [math]n[/math], a.s. This means exactly that [math]\p\left[\limsup_{n\to\infty} A_n\right]=0[/math].
  • Fix [math]n_0\in\N[/math] and note that, by independence, for all [math]n\geq n_0[/math] we have
    [[math]] \p\left[\bigcap_{k=n_0}^nA_k^C\right]=\prod_{k=n_0}^n\p[A_k^C]=\prod_{k=n_0}^n(1-\p[A_k]). [[/math]]
    Using [math]1-x\leq e^{-x}[/math], we obtain
    [[math]] \p\left[\bigcap_{k=n_0}^nA_k^C\right]\leq \exp\left(-\sum_{k=n_0}^n\p[A_k]\right). [[/math]]
    Since
    [[math]] \sum_{n\geq 1}\p[A_n]=\infty, [[/math]]
    letting [math]n\to\infty[/math] gives
    [[math]] \p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0. [[/math]]
    Since this is true for every [math]n_0[/math], we have
    [[math]] \p\left[\bigcup_{n_0=0}^\infty\bigcap_{k=n_0}^\infty A_k^C\right]\leq \sum_{n_0\geq 0}\p\left[\bigcap_{k=n_0}^\infty A_k^C\right]=0. [[/math]]
    Hence, taking complements, we get
    [[math]] \p\left[\limsup_{n\to\infty} A_n\right]=\p\left[\bigcap_{n=0}^\infty\bigcup_{k=n}^\infty A_k\right]=1-\p\left[\bigcup_{n=0}^\infty\bigcap_{k=n}^\infty A_k^C\right]=1. [[/math]]
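Both parts of the lemma can be illustrated with a small simulation. In the sketch below (sample size and seed are arbitrary), the events [math]A_n=\{U_n \lt \frac{1}{n^2}\}[/math] have summable probabilities and only a handful of them occur, while the independent events [math]B_n=\{U_n \lt \frac{1}{n}\}[/math] have a divergent probability sum and keep occurring as [math]n[/math] grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n_max = 10**6
u = rng.uniform(size=n_max)          # independent Unif(0,1) variables U_1, ..., U_{n_max}
n = np.arange(1, n_max + 1)

# A_n = {U_n < 1/n^2}: probabilities are summable, so a.s. only finitely many occur.
# B_n = {U_n < 1/n}:   probabilities sum to infinity and the events are independent,
#                      so a.s. infinitely many occur.
print(np.sum(u < 1.0 / n**2))   # a small number that stabilises as n_max grows
print(np.sum(u < 1.0 / n))      # grows roughly like log(n_max)
</syntaxhighlight>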

Application 1

Let [math](\Omega,\A,\p)[/math] be a probability space. There does not exist a probability measure on [math]\N[/math] such that the probability of the set of multiples of an integer [math]n[/math] is [math]\frac{1}{n}[/math] for every [math]n\geq 1[/math]. Let us assume that such a probability measure exists. Let [math]\tilde{p}[/math] denote the set of prime numbers. For [math]p\in\tilde{p}[/math] we set [math]A_p=p\N[/math], i.e. the set of all multiples of [math]p[/math]. We first show that the sets [math](A_p)_{p\in\tilde{p}}[/math] are independent. Indeed, let [math]p_1,...,p_n\in\tilde{p}[/math] be distinct. Then we have

[[math]] \p[p_1\N\cap\dotsm\cap p_n\N]=\p[p_1\dotsm p_n\N]=\frac{1}{p_1\dotsm p_n}=\p[p_1\N]\dotsm\p[p_n\N]. [[/math]]

Moreover it is known that

[[math]] \sum_{p\in\tilde{p}}\p[p\N]=\sum_{p\in\tilde{p}}\frac{1}{p}=\infty. [[/math]]

The second part of the Borel-Cantelli lemma then implies that almost every integer [math]n[/math] belongs to infinitely many [math]A_p[/math]'s, i.e. is divisible by infinitely many distinct prime numbers. This is absurd, since every integer [math]n\geq 1[/math] has only finitely many prime factors, so no such probability measure can exist.

Application 2

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] be an exponential r.v. with parameter [math]\lambda=1[/math]. Thus we know that [math]X[/math] has density [math]e^{-x}\one_{\R_+}(x)[/math]. Now consider a sequence [math](X_n)_{n\geq 1}[/math] of independent r.v.'s with the same distribution as [math]X[/math], i.e. for all [math]n\geq 1[/math], we have [math]X_n\sim X[/math]. Then [math]\limsup_n \frac{X_n}{\log(n)}=1[/math] a.s., i.e. there exists an [math]N\in\A[/math] with [math]\p[N]=0[/math] such that for [math]\omega\not\in N[/math] we get

[[math]] \limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1. [[/math]]

First note that, for [math]t\geq 0[/math],

[[math]] \p[X \gt t]=\int_t^\infty e^{-x}dx=e^{-t}. [[/math]]

Now let [math]\epsilon \gt 0[/math] and consider the sets [math]A_n=\{X_n \gt (1+\epsilon)\log(n)\}[/math] and [math]B_n=\{X_n \gt \log(n)\}[/math]. Then

[[math]] \p[A_n]=\p[X_n \gt (1+\epsilon)\log(n)]=\p[X \gt (1+\epsilon)\log(n)]=e^{-(1+\epsilon)\log(n)}=\frac{1}{n^{1+\epsilon}}. [[/math]]

This implies that

[[math]] \sum_{n\geq 1}\p[A_n] \lt \infty. [[/math]]

With the Borel-Cantelli lemma we get that [math]\p\left[\limsup_{n\to\infty} A_n\right]=0[/math]. Let us define

[[math]] N_{\epsilon}=\limsup_{n\to\infty} A_n. [[/math]]

Then we have [math]\p[N_\epsilon]=0[/math], and for [math]\omega\not\in N_{\epsilon}[/math] there exists an [math]n_0(\omega)[/math] such that for all [math]n\geq n_0(\omega)[/math] we have

[[math]] X_n(\omega)\leq (1+\epsilon)\log(n) [[/math]]

and thus for [math]\omega\not\in N_{\epsilon}[/math], we get [math]\limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1+\epsilon[/math]. Moreover, let

[[math]] N'=\bigcup_{\epsilon\in \Q_+}N_{\epsilon}. [[/math]]

Therefore we get [math]\p[N']\leq \sum_{\epsilon\in\Q_+}\p[N_{\epsilon}]=0[/math], and for [math]\omega\not\in N'[/math] we get

[[math]] \limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\leq 1. [[/math]]

Now we note that the [math]B_n[/math]'s are independent, since [math]B_n\in\sigma(X_n)[/math] and the fact that the [math]X_n[/math]'s are independent. Moreover,

[[math]] \p[B_n]=\p[X_n \gt \log(n)]=\p[X \gt \log(n)]=\frac{1}{n}, [[/math]]

which gives that

[[math]] \sum_{n\geq 1}\p[B_n]=\infty. [[/math]]

Now we can use Borel-Cantelli to get

[[math]] \p\left[\limsup_{n\to\infty} B_n\right]=1. [[/math]]

If we denote [math]N''=\left(\limsup_{n\to\infty} B_n\right)^C[/math], then for [math]\omega\not\in N''[/math] we get that [math]X_n(\omega) \gt \log(n)[/math] for infinitely many [math]n[/math]. So it follows that for [math]\omega\not\in N''[/math] we have

[[math]] \limsup_{n\to\infty}\frac{X_n(\omega)}{\log(n)}\geq 1. [[/math]]

Finally, take [math]N=N'\cup N''[/math] to obtain [math]\p[N]=0[/math]. Thus for [math]\omega\not\in N[/math] we get

[[math]] \limsup_{n\to\infty} \frac{X_n(\omega)}{\log(n)}=1. [[/math]]
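A quick simulation (again with arbitrary sample size and seed) makes the two halves of the argument visible: exceedances of [math](1+\epsilon)\log(n)[/math] are rare, exceedances of [math]\log(n)[/math] keep occurring, and the ratio [math]X_n/\log(n)[/math] over a late block of indices is close to [math]1[/math].

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
N = 10**6
x = rng.exponential(scale=1.0, size=N)    # X_1, ..., X_N i.i.d. Exp(1)
n = np.arange(1, N + 1)
logn = np.log(np.maximum(n, 2))           # avoid dividing by log(1) = 0

eps = 0.1
print(np.sum(x > (1 + eps) * logn))        # few exceedances (first Borel-Cantelli part)
print(np.sum(x > logn))                    # many exceedances, grows with N (second part)
print(np.max(x[N // 2:] / logn[N // 2:]))  # typically close to 1 for large N
</syntaxhighlight>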

===Sums of independent random variables===

Let us first define the convolution of two probability measures. If [math]\mu[/math] and [math]\nu[/math] are two probability measures on [math]\R^d[/math], we denote by [math]\mu*\nu[/math] the image of the measure [math]\mu\otimes\nu[/math] under the map

[[math]] \R^d\times\R^d\to\R^d,(x,y)\mapsto x+y. [[/math]]

Equivalently, for all measurable maps [math]\varphi:\R^d\to \R_+[/math], we have

[[math]] \int_{\R^d}\varphi(z)(\mu*\nu)(dz)=\iint_{\R^d\times\R^d}\varphi(x+y)\mu(dx)\nu(dy). [[/math]]
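As a numerical sanity check of the convolution formula (and of the proposition below), the following sketch approximates [math]f*g[/math] on a grid for two [math]\exp(1)[/math] densities; the exact convolution in this case is the Gamma density [math]t e^{-t}[/math]. The grid step is an arbitrary choice.

<syntaxhighlight lang="python">
import numpy as np

# Densities of two independent Exp(1) variables on a grid (step size is arbitrary).
dx = 0.01
t = np.arange(0.0, 20.0, dx)
f = np.exp(-t)   # density of X
g = np.exp(-t)   # density of Y

# Discrete approximation of (f*g)(t) = \int f(s) g(t - s) ds on the same grid.
conv = np.convolve(f, g)[: len(t)] * dx

# Exact convolution for Exp(1)+Exp(1): the Gamma(2,1) density t * exp(-t).
print(np.max(np.abs(conv - t * np.exp(-t))))  # small discretisation error
</syntaxhighlight>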

Proposition

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two independent r.v.'s with values in [math]\R^d[/math]. Then the following hold.

  • The law of [math]X+Y[/math] is given by [math]\p_X*\p_Y[/math]. In particular if [math]X[/math] has density [math]f[/math] and [math]Y[/math] has density [math]g[/math], then [math]X+Y[/math] has density [math]f*g[/math], where [math]*[/math] denotes the convolution product, which is given by
    [[math]] f*g(\xi)=\int_{\R^d} f(x)g(\xi-x)dx. [[/math]]
  • [math]\Phi_{X+Y}(\xi)=\Phi_X(\xi)\Phi_Y(\xi).[/math]
  • If [math]X[/math] and [math]Y[/math] are in [math]L^2(\Omega,\A,\p)[/math], we get
    [[math]] K_{X+Y}=K_X+K_Y. [[/math]]
    In particular when [math]d=1[/math], we obtain
    [[math]] Var(X+Y)=Var(X)+Var(Y). [[/math]]


Show Proof

We need to show all three points.

  • If [math]X[/math] and [math]Y[/math] are independent r.v.'s, then [math]\p_{(X,Y)}=\p_X\otimes\p_Y[/math]. Consequently, for all measurable maps [math]\varphi:\R^d\to\R_+[/math], we have
    [[math]] \begin{multline*} \E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)\p_{(X,Y)}(dxdy)=\iint_{\R^d\times\R^d}\varphi(x+y)\p_X(dx)\p_{Y}(dy)\\=\int_{\R^d}\varphi(\xi)(\p_X*\p_Y)(d\xi). \end{multline*} [[/math]]
    Now if moreover [math]X[/math] and [math]Y[/math] have densities [math]f[/math] and [math]g[/math] respectively, then, substituting [math]\xi=x+y[/math],
    [[math]] \E[\varphi(X+Y)]=\iint_{\R^d\times\R^d}\varphi(x+y)f(x)g(y)dxdy=\int_{\R^d}\varphi(\xi)\left(\int_{\R^d} f(x)g(\xi-x)dx\right)d\xi. [[/math]]
    Since this identity is true for all measurable maps [math]\varphi:\R^d\to\R_+[/math], the r.v. [math]Z:=X+Y[/math] has density
    [[math]] h(\xi)=(f*g)(\xi)=\int_{\R^d}f(x)g(\xi-x)dx. [[/math]]
  • By definition of the characteristic function and the independence property, we get
    [[math]] \Phi_{X+Y}(\xi)=\E\left[e^{i\xi(X+Y)}\right]=\E\left[e^{i\xi X}e^{i\xi Y}\right]=\E\left[e^{i\xi X}\right]\E\left[e^{i\xi Y}\right]=\Phi_X(\xi)\Phi_Y(\xi). [[/math]]
  • If [math]X=(X_1,...,X_d)[/math] and [math]Y=(Y_1,...,Y_d)[/math] are independent r.v.'s on [math]\R^d[/math], we get that [math]Cov(X_i,Y_j)=0[/math] for all [math]1\leq i,j\leq d[/math]. By using the bilinearity of the covariance we get that
    [[math]] Cov(X_i+Y_i,X_j+Y_j)=Cov(X_i,X_j)+Cov(Y_i,Y_j), [[/math]]
    and hence [math]K_{X+Y}=K_X+K_Y[/math]. For [math]d=1[/math] we get
    [[math]] \begin{align*} Var(X+Y)&=\E[((X+Y)-\E[X+Y])^2]=\E[((X-\E[X])+(Y-\E[Y]))^2]\\ &=\underbrace{\E[(X-\E[X])^2]}_{Var(X)}+\underbrace{\E[(Y-\E[Y])^2]}_{Var(Y)}+\underbrace{2\E[(X-\E[X])(Y-\E[Y])]}_{2Cov(X,Y)} \end{align*} [[/math]]
    Now since [math]Cov(X,Y)=0[/math], we get the result.
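A short simulation of the variance identity, with arbitrarily chosen independent distributions for [math]X[/math] and [math]Y[/math]:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
n = 10**6

# Independent X ~ Exp(scale=2) and Y ~ Unif(0,3); the distributions are arbitrary choices.
x = rng.exponential(scale=2.0, size=n)
y = rng.uniform(0.0, 3.0, size=n)

print(np.var(x + y))            # ~ Var(X) + Var(Y) = 4 + 0.75
print(np.var(x) + np.var(y))    # agrees up to Monte Carlo error
</syntaxhighlight>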
Theorem (Weak law of large numbers)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](X_n)_{n\geq 1}[/math] be a sequence of independent r.v.'s. Moreover, write [math]\mu=\E[X_n][/math] for all [math]n\geq1[/math] and assume [math]\E[(X_n-\mu)^2]\leq C[/math] for all [math]n\geq1[/math] and for some constant [math]C \lt \infty[/math]. We also write [math]S_n=\sum_{j=1}^nX_j[/math] and [math]\tilde X_n=\frac{S_n}{n}[/math] for all [math]n\geq 1[/math]. Then for all [math]\epsilon \gt 0[/math]

[[math]] \p[\vert \tilde X_n-\mu\vert \gt \epsilon]\xrightarrow{n\to\infty}0. [[/math]]
Note also that

[[math]] \E[\tilde X_n]=\frac{1}{n}\E\left[\sum_{j=1}^nX_j\right]=\frac{1}{n}\,n\mu=\mu. [[/math]]


Show Proof

We note that, by independence, the cross terms [math]\E[(X_i-\mu)(X_j-\mu)][/math] vanish for [math]i\not=j[/math], so

[[math]] \E[(S_n-n\mu)^2]=\sum_{j=1}^n\E[(X_j-\mu)^2]\leq nC. [[/math]]
Hence for [math]\epsilon \gt 0[/math] we get by Markov's inequality

[[math]] \p[\vert \tilde X_n-\mu\vert \gt \epsilon]=\p[(S_n-n\mu)^2 \gt (n\epsilon)^2]\leq \frac{\E[(S_n-n\mu)^2]}{n^2\epsilon^2}\leq \frac{C}{n\epsilon^2}\xrightarrow{n\to\infty}0. [[/math]]
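The following sketch estimates [math]\p[\vert \tilde X_n-\mu\vert \gt \epsilon][/math] by repeated sampling for a few values of [math]n[/math] (uniform variables, [math]\epsilon=0.1[/math] and the number of repetitions are arbitrary choices) and shows the probability shrinking as the theorem predicts.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
eps = 0.1
mu = 0.5   # mean of Unif(0,1)

for n in (10**2, 10**3, 10**4):
    # Estimate P[|S_n/n - mu| > eps] from 1000 independent experiments.
    samples = rng.uniform(size=(1000, n))
    xbar = samples.mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # decreases towards 0 as n grows
</syntaxhighlight>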

Corollary

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math](A_n)_{n\geq 1}\subset \A[/math] be a sequence of independent events with the same probability, i.e. [math]\p[A_n]=\p[A_m][/math] for all [math]n,m\geq 1[/math]. Then, in probability,

[[math]] \frac{1}{n}\sum_{i=1}^n \one_{A_i}\xrightarrow{n\to\infty}\p[A_1]. [[/math]]


Show Proof

Apply the weak law of large numbers to [math]X_j=\one_{A_j}[/math]. These r.v.'s are independent, they satisfy [math]\E[X_j]=\p[A_j]=\p[A_1][/math] for all [math]j\geq 1[/math] (since [math]\E[\one_A]=\p[A][/math]), and [math]\E[(X_j-\p[A_1])^2]\leq 1[/math]. Hence for every [math]\epsilon \gt 0[/math]

[[math]] \p\left[\left\vert\frac{1}{n}\sum_{j=1}^n\one_{A_j}-\p[A_1]\right\vert \gt \epsilon\right]\xrightarrow{n\to\infty}0. [[/math]]
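A minimal sketch related to the corollary: empirical frequencies of an event with probability [math]0.3[/math] (an arbitrary choice) settle near [math]0.3[/math] as [math]n[/math] grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(8)
n = 10**5

# A_i = {U_i < 0.3} for independent Unif(0,1) variables, so P[A_i] = 0.3.
indicators = (rng.uniform(size=n) < 0.3).astype(float)
running_freq = np.cumsum(indicators) / np.arange(1, n + 1)

print(running_freq[[99, 9999, n - 1]])   # running frequencies approaching 0.3
</syntaxhighlight>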

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].