guide:35e8e36b92

From Stochiki
<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>


===Conditional probability===
Let <math>(\Omega,\F,\p)</math> be a probability space and let <math>A,B\in\F</math> such that <math>\p[B] > 0</math>. Then the conditional probability{{efn|One can look it up for more details in the stochastics I part.}} of <math>A</math> given <math>B</math> is defined as
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}.
</math>
The important fact here is that the map <math>\F\to [0,1]</math>, <math>A\mapsto \p[A\mid B]</math> defines a new probability measure on <math>\F</math>, called the conditional probability given <math>B</math>. There are several facts which we need to recall:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>A_1,...,A_n\in\F</math> and if <math>\p\left[\bigcap_{k=1}^nA_k\right] > 0</math>, then
<math display="block">
\p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right].
</math>
</li>
<li>Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math>, i.e. for all <math>n\geq 1</math> we have that <math>E_n\in\F</math> and for <math>n\not=m</math> we get <math>E_n\cap E_m=\varnothing</math> and <math>\bigcup_{n\geq 1}E_n=\Omega</math>. Now for <math>A\in \F</math> we get
<math display="block">
\p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n].
</math>
</li>
<li>(Bayes' formula){{efn|Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.}} Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math> and <math>A\in\F</math> with <math>\p[A] > 0</math>. Then
<math display="block">
\p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}.
</math>
</li>
</ul>
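The three facts above can be checked on a small discrete example. The following sketch uses a hypothetical two-urn setup (the urn contents and probabilities are made up for illustration) and verifies the law of total probability and Bayes' formula with exact rational arithmetic:

```python
from fractions import Fraction as F

# Hypothetical two-urn example: urn 1 holds 2 red / 1 blue ball,
# urn 2 holds 1 red / 3 blue.  E_n = "urn n chosen" is a measurable
# partition of Omega, and A = "red ball drawn".
p_E = [F(1, 2), F(1, 2)]              # P[E_1], P[E_2]
p_A_given_E = [F(2, 3), F(1, 4)]      # P[A | E_1], P[A | E_2]

# Law of total probability: P[A] = sum_n P[A | E_n] P[E_n]
p_A = sum(pa * pe for pa, pe in zip(p_A_given_E, p_E))

# Bayes' formula: P[E_1 | A] = P[A | E_1] P[E_1] / P[A]
p_E1_given_A = p_A_given_E[0] * p_E[0] / p_A

print(p_A)            # 11/24
print(p_E1_given_A)   # 8/11
```

Working with `Fraction` instead of floats keeps the identities exact, so the two formulas can be checked by equality rather than by tolerance.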
{{alert-info |
We can reformulate the definition of the conditional probability to obtain
<math display="block">
\begin{align*}
\p[A\mid B]\p[B]&=\p[A\cap B]\\
\p[B\mid A]\p[A]&=\p[A\cap B]
\end{align*}
</math>
Therefore one can prove statements (i) to (iii) by using these two equations{{efn|One also has to notice that if <math>A</math> and <math>B</math> are two independent events, then <math>\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A]</math>.}}.
}}
===Discrete construction of the conditional expectation===
Let <math>X</math> and <math>Y</math> be two r.v.'s on a probability space <math>(\Omega,\F,\p)</math>. Let <math>Y</math> take values in <math>\R</math> and <math>X</math> take values in a countable discrete set <math>\{x_1,x_2,...,x_n,...\}</math>. The goal is to describe the expectation of the r.v. <math>Y</math> given knowledge of the observed r.v. <math>X</math>. For instance, let <math>X=x_j\in\{x_1,x_2,...,x_n,...\}</math>. We then look at the set <math>\{\omega\in\Omega\mid X(\omega)=x_j\}</math> rather than at the whole of <math>\Omega</math>. For <math>\Lambda\in\F</math>, we thus define
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid \{X=x_j\}],
</math>
which defines a new probability measure <math>\Q</math>, provided that <math>\p[X=x_j] > 0</math>. It is then more natural to compute
<math display="block">
\E_\Q[Y]=\int_\Omega Y(\omega)d\Q(\omega)=\int_{\{\omega\in\Omega\mid X(\omega)=x_j\}}Y(\omega)d\p(\omega)
</math>
rather than
<math display="block">
\E_\p[Y]=\int_\Omega Y(\omega)d\p(\omega)=\int_\R yd\p_Y(y).
</math>
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, single value case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:\Omega\to\{x_1,x_2,...,x_n,...\}</math> be a r.v. taking values in a discrete set and let <math>Y</math> be a real valued r.v. on that space. If <math>\p[X=x_j] > 0</math>, we can define the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> to be
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y],
</math>
where <math>\Q</math> is the probability measure on <math>\F</math> defined by
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid X=x_j],
</math>
for <math>\Lambda\in\F</math>, provided that <math>\E_\Q[\vert Y\vert] < \infty</math>.}}
{{proofcard|Theorem (Conditional expectation (<math>X</math> discrete, <math>Y</math> discrete, single value case))|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a r.v. on that space with values in <math>\{x_1,x_2,...,x_n,...\}</math> and let <math>Y</math> also be a r.v. with values in <math>\{y_1,y_2,...,y_n,...\}</math>. If <math>\p[X=x_j] > 0</math>, we can write the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> as
<math display="block">
\E[Y\mid X=x_j]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j],
</math>
provided that the series is absolutely convergent.
|Apply the definitions above to obtain
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y]=\sum_{k=1}^\infty y_k\Q[Y=y_k]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j].
</math>}}
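The single-value formula can be evaluated directly from a joint law. The joint table and the helper `cond_expectation` below are a made-up illustration, not part of the notes:

```python
from fractions import Fraction as F

# Hypothetical joint law {(x, y): P[X=x, Y=y]} of two discrete r.v.'s.
joint = {
    (0, 1): F(1, 8), (0, 2): F(3, 8),
    (1, 1): F(1, 4), (1, 2): F(1, 4),
}

def cond_expectation(joint, x):
    """E[Y | X=x] = sum_k y_k P[Y=y_k | X=x], assuming P[X=x] > 0."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p / p_x for (xx, y), p in joint.items() if xx == x)

print(cond_expectation(joint, 0))  # 7/4: given X=0, Y is 1 w.p. 1/4, 2 w.p. 3/4
print(cond_expectation(joint, 1))  # 3/2
```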
Now let again <math>X</math> be a r.v. with values in <math>\{x_1,x_2,...,x_n,...\}</math> and <math>Y</math> a real valued r.v. The next step is to define <math>\E[Y\mid X]</math> as a function <math>f(X)</math>. Therefore we introduce the function
<math display="block">
\begin{equation}
f:\{x_1,x_2,...,x_n,...\}\to \R,\qquad f(x)=\begin{cases}\E[Y\mid X=x],&\p[X=x] > 0\\ \text{any value in $\R$},&\p[X=x]=0\end{cases}
\end{equation}
</math>
{{alert-info |It doesn't matter which value we assign to <math>f</math> when <math>\p[X=x]=0</math>, since a null set does not affect the expectation. By convention we assign it the value 0.}}
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, complete case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a countably valued r.v. and let <math>Y</math> be a real valued r.v. The conditional expectation of <math>Y</math> given <math>X</math> is defined by
<math display="block">
\E[Y\mid X]=f(X),
</math>
with <math>f</math> as in (2), provided that, for every <math>j</math> with <math>\p[X=x_j] > 0</math>, the measure <math>\Q_j[\Lambda]=\p[\Lambda\mid X=x_j]</math> satisfies <math>\E_{\Q_j}[\vert Y\vert] < \infty</math>.}}
{{alert-info |
The above definition does not define <math>\E[Y\mid X]</math> everywhere but rather almost everywhere, since on each set <math>\{X=x\}</math>, where <math>\p[X=x]=0</math>, its value is arbitrary.
}}
'''Example'''
Let{{efn|Recall that this means that <math>X</math> is Poisson distributed: <math>\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}</math> for <math>k\in\N</math>}} <math>X\sim\Pi(\lambda)</math>. Consider a tossing game in which, when <math>X=n</math>, we perform <math>n</math> independent tosses of a coin that shows 1 with probability <math>p\in[0,1]</math> and 0 with probability <math>1-p</math>. Define <math>S</math> to be the r.v. giving the total number of 1s obtained in the game. Hence, given <math>X=n</math>, the r.v. <math>S</math> is binomially distributed with parameters <math>(p,n)</math>. We want to compute
<ul style{{=}}"list-style-type:lower-roman"><li><math>\E[S\mid X]</math>
</li>
<li><math>\E[X\mid S]</math>
</li>
</ul>
{{alert-info |
It is more natural to ask for the expected number of 1s obtained in the whole game given how many tosses were performed; the reverse is a bit more difficult. Note also that the event <math>S > X</math> is impossible, since we cannot obtain more 1s than the number of tosses performed.
}}
<ul style{{=}}"list-style-type:lower-roman"><li>First we compute <math>\E[S\mid X=n]</math>: If <math>X=n</math>, we know that <math>S</math> is binomial distributed with parameters <math>(p,n)</math> (<math>S\sim \B(p,n)</math>) and therefore we already know{{efn|If <math>X\sim \B(p,n)</math> then <math>\E[X]=pn</math>. For further calculation, one can look it up in the stochastics I notes}}
<math display="block">
\E[S\mid X=n]=pn.
</math>
Now we need to identify the function <math>f</math> defined as in (2) by
<math display="block">
\begin{align*}
f:\N&\longrightarrow\R\\
n&\longmapsto pn.
\end{align*}
</math>
Therefore we get by definition
<math display="block">
\E[S\mid X]=pX.
</math>
</li>
<li>
Next we want to compute <math>\E[X\mid S=k]</math>: For <math>n\geq k</math> we have
<math display="block">
\p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k} p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}},
</math>
since <math>\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}</math>. By some algebra we obtain that
<math display="block">
\frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}.
</math>
Hence we get that
<math display="block">
\E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p).
</math>
Therefore <math>\E[X\mid S]=S+\lambda(1-p)</math>.
</li>
</ul>
===Continuous construction of the conditional expectation===
Now we want to define <math>\E[Y\mid X]</math>, where <math>X</math> is no longer assumed to be countably valued. Therefore we want to recall the following two facts:
{{definitioncard|<math>\sigma</math>-Algebra generated by a random variable|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space. The <math>\sigma</math>-Algebra generated by <math>X</math> is given by
<math display="block">
\sigma(X)=X^{-1}(\B(\R^n))=\{A\subset\Omega\mid A=X^{-1}(B),\ B\in\B(\R^n)\}.
</math>}}
{{proofcard|Theorem|thm-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space and let <math>Y</math> be a real valued r.v. on that space. <math>Y</math> is measurable with respect to <math>\sigma(X)</math> if and only if there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
Y=f(X).
</math>|}}
{{alert-info | We want to use the fact that <math>L^2(\Omega,\sigma(X),\p)</math> is a closed subspace of the Hilbert space <math>L^2(\Omega,\F,\p)</math>, since <math>\sigma(X)\subset\F</math>. This allows us to use orthogonal projections and to interpret the conditional expectation as such a projection.
}}
{{definitioncard|Conditional expectation (as a projection onto a closed subspace)|Let <math>(\Omega,\F,\p)</math> be a probability space, let <math>X</math> be a r.v. with values in <math>\R^n</math> and let <math>Y\in L^2(\Omega,\F,\p)</math>. Then the conditional expectation of <math>Y</math> given <math>X</math> is the unique element <math>\hat Y\in L^2(\Omega,\sigma(X),\p)</math> such that for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>
<math display="block">
\begin{equation}
\E[YZ]=\E[\hat Y Z].
\end{equation}
</math>
This follows from the orthogonal projection onto <math>L^2(\Omega,\sigma(X),\p)</math>: the element <math>\hat Y</math> is characterized by the fact that <math>Y-\hat Y</math> is orthogonal to that subspace, i.e. <math>\langle Y-\hat Y,Z\rangle=0</math> for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>. We write <math>\E[Y\mid X]</math> for <math>\hat Y</math>.}}
{{alert-info |
<math>\hat Y</math> is the orthogonal projection of <math>Y</math> onto <math>L^2(\Omega,\sigma(X),\p)</math>.
}}
{{alert-info |
Since <math>X</math> takes values in <math>\R^n</math>, there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
\E[Y\mid X]=f(X)
</math>
with <math>\E[f^2(X)] < \infty</math>. We can also rewrite (3) as: for all Borel measurable <math>g:\R^n\to\R</math>, such that <math>\E[g^2(X)] < \infty</math>, we get
<math display="block">
\E[Yg(X)]=\E[f(X)g(X)].
</math>
}}
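On a finite sample space the projection picture becomes concrete: conditioning on a discrete <math>X</math>, the <math>\sigma(X)</math>-measurable r.v.'s are the functions of <math>X</math>, and the projection of <math>Y</math> is the per-level mean. The data and the helper `cond` below are made up for the sketch:

```python
from statistics import fmean

# Toy stand-in for (Omega, F, P): 8 equally likely points.
X = [0, 1, 2, 0, 1, 2, 0, 1]                   # values of X at each point
Y = [1.0, 2.0, 0.5, 3.0, -1.0, 0.5, 2.0, 4.0]  # values of Y (arbitrary)

def cond(Y, X):
    """E[Y | X] evaluated pointwise: the mean of Y over each level of X."""
    means = {x: fmean(y for x2, y in zip(X, Y) if x2 == x) for x in set(X)}
    return [means[x] for x in X]

Y_hat = cond(Y, X)

# Orthogonality: E[(Y - Y_hat) g(X)] = 0; tested with g = 1_{X=x}.
for x in set(X):
    r = fmean((y - yh) * (1 if x2 == x else 0)
              for y, yh, x2 in zip(Y, Y_hat, X))
    assert abs(r) < 1e-12
print("orthogonality holds")
```

Because the residual <math>Y-\hat Y</math> is orthogonal to every function of <math>X</math>, the per-level mean is exactly the orthogonal projection described in the definition.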
Now let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math> and consider the space <math>L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)</math>. Since <math>L^2(\Omega,\mathcal{G},\p)</math> is itself a Hilbert space, we can project onto it.
{{definitioncard|Conditional expectation (projection case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then the conditional expectation of <math>Y</math> given <math>\mathcal{G}</math> is defined as the unique element <math>\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)</math> such that for all <math>Z\in L^2(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\label{4}
\E[YZ]=\E[\E[Y\mid \mathcal{G}]Z].
\end{equation}
</math>
}}
{{alert-info |
In (3) or (4), it is enough{{efn|Since we can always consider linear combinations of <math>\one_A</math> and then apply density theorems to it}} to restrict the test r.v. <math>Z</math> to the class of r.v.'s of the form
<math display="block">
Z=\one_A,A\in\mathcal{G}.
</math>
}}
{{alert-info |
The conditional expectation is in <math>L^2</math>, so it's only defined a.s. and not everywhere in a unique way. So in particular, any statement like <math>\E[Y\mid\mathcal{G}]\geq0</math> or <math>\E[Y\mid \mathcal{G}]=Z</math> has to be understood with an implicit a.s.
}}
{{proofcard|Theorem|thm-3|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid \mathcal{G}]\geq 0</math>
</li>
<li><math>\E[\E[Y\mid\mathcal{G}]]=\E[Y]</math>
</li>
<li>The map <math>Y\mapsto\E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|For <math>(i)</math> take <math>Z=\one_{\{\E[Y\mid\mathcal{G}] < 0\}}</math> to obtain
<math display="block">
\underbrace{\E[YZ]}_{\geq 0}=\underbrace{\E[\E[Y\mid \mathcal{G}]Z]}_{\leq  0}.
</math>
This implies that <math>\p[\E[Y\mid \mathcal{G}] < 0]=0</math>. For <math>(ii)</math> take <math>Z=\one_{\Omega}</math> and plug into (4). For <math>(iii)</math> notice that linearity comes from the orthogonal projection operator. But we can also do it directly by taking <math>Y,Y'\in L^2(\Omega,\F,\p)</math>, <math>\alpha,\beta\in \R</math> and <math>Z\in L^2(\Omega,\mathcal{G},\p)</math> to obtain
<math display="block">
\E[(\alpha Y+\beta Y')Z]=\alpha\E[YZ]+\beta\E[Y'Z]=\alpha\E[\E[Y\mid\mathcal{G}]Z]+\beta\E[\E[Y'\mid\mathcal{G}]Z]=\E[(\alpha\E[Y\mid\mathcal{G}]+\beta\E[Y'\mid\mathcal{G}])Z].
</math>
Now we can conclude by using the uniqueness property that
<math display="block">
\E[\alpha Y+\beta Y'\mid \mathcal{G}]=\alpha\E[Y\mid \mathcal{G}]+\beta\E[Y'\mid \mathcal{G}].
</math>}}
Now we want to extend the definition of the conditional expectation to r.v.'s in <math>L^1(\Omega,\F,\p)</math> or in <math>L^+(\Omega,\F,\p)</math>, the space of non-negative r.v.'s allowed to take the value <math>\infty</math>.
{{proofcard|Lemma|lem-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^+(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)</math> such that for all <math>X\in L^+(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X]
\end{equation}
</math>
and this conditional expectation agrees with the previous definition when <math>Y\in L^2(\Omega,\F,\p)</math>. Moreover, if <math>0\leq  Y\leq  Y'</math>, then
<math display="block">
\E[Y\mid \mathcal{G}]\leq \E[Y'\mid \mathcal{G}].
</math>|If <math>Y\geq 0</math> and <math>Y\in L^2(\Omega,\F,\p)</math>, then we define <math>\E[Y\mid\mathcal{G}]</math> as before. If <math>X\in L^+(\Omega,\mathcal{G},\p)</math>, then <math>X_n=X\land n</math> is in <math>L^2(\Omega,\mathcal{G},\p)</math>, is positive and satisfies <math>X_n\uparrow X</math> as <math>n\to\infty</math>. Using the monotone convergence theorem we get
<math display="block">
\E[YX]=\E[Y\lim_{n\to\infty}X_n]=\lim_{n\to\infty}\E[YX_n]=\lim_{n\to\infty}\E[\E[Y\mid\mathcal{G}]X_n]=\E[\E[Y\mid\mathcal{G}]\lim_{n\to\infty}X_n]=\E[\E[Y\mid\mathcal{G}]X].
</math>
This shows that (5) is true whenever <math>Y\in L^2(\Omega,\F,\p)</math> with <math>Y\geq 0</math> and <math>X\in L^+(\Omega,\mathcal{G},\p)</math>. Now let <math>Y\in L^+(\Omega,\F,\p)</math>. Define <math>Y_m=Y\land m</math>. Hence we get <math>Y_m\in L^2(\Omega,\F,\p)</math> and <math>Y_m\uparrow Y</math> as <math>m\to\infty</math>. Each <math>\E[Y_m\mid\mathcal{G}]</math> is well defined{{efn|because for <math>Y\in L^2</math> and <math>U\in L^2</math> we get <math>Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}]</math>}}, positive and increasing in <math>m</math>. We define
<math display="block">
\E[Y\mid\mathcal{G}]=\lim_{m\to\infty}\E[Y_m\mid \mathcal{G}].
</math>
Several applications of the monotone convergence theorem will give us for <math>X\in L^+(\Omega,\mathcal{G},\p)</math>
<math display="block">
\E[YX]=\lim_{m\to\infty}\E[Y_mX]=\lim_{m\to\infty}\E[\E[Y_m\mid\mathcal{G}]X]=\E[\E[Y\mid \mathcal{G}]X].
</math>
Furthermore if <math>0\leq  Y\leq  Y'</math>, then <math>Y\land m\leq  Y'\land m</math> and therefore
<math display="block">
\E[Y\mid\mathcal{G}]\leq \E[Y'\mid\mathcal{G}].
</math>
Now we need to show uniqueness{{efn|Note that for any <math>W\in L^+</math> with <math>\E[W] < \infty</math>, the set <math>E</math> on which <math>W=\infty</math> is a null set. For suppose not; then <math>\E[W]\geq \E[\infty \one_E]=\infty\,\p[E]=\infty</math>, since <math>\p[E] > 0</math>.}}. Let <math>U</math> and <math>V</math> be two versions of <math>\E[Y\mid \mathcal{G}]</math>. Let
<math display="block">
\Lambda_n=\{U < V\leq  n\}\in\mathcal{G}
</math>
and assume <math>\p[\Lambda_n] > 0</math>. We then have
<math display="block">
\E[U\one_{\Lambda_n}]=\E[Y\one_{\Lambda_n}]=\E[V\one_{\Lambda_n}],\qquad\text{hence}\qquad \E[(V-U)\one_{\Lambda_n}]=0.
</math>
Since <math>V-U > 0</math> on <math>\Lambda_n</math>, this contradicts the fact that <math>\p[\Lambda_n] > 0</math>. Moreover, <math>\{U < V\}=\bigcup_{n\geq 1}\Lambda_n</math> and therefore
<math display="block">
\p[U < V]=0
</math>
and similarly <math>\p[V < U]=0</math>. This implies
<math display="block">
\p[U=V]=1.
</math>}}
{{proofcard|Theorem|thm-4|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math> such that for every <math>X</math> bounded and <math>\mathcal{G}</math>-measurable
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X].
\end{equation}
</math>
This conditional expectation agrees with the definition for the <math>L^2</math> case. Moreover it satisfies:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid\mathcal{G}]\geq 0</math>
</li>
<li>The map <math>Y\mapsto \E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|We will only prove the existence, since the rest is exactly the same as before. Write <math>Y=Y^+-Y^-</math> with <math>Y^+,Y^-\in L^1(\Omega,\F,\p)</math> and <math>Y^+,Y^-\geq 0</math>. So <math>\E[Y^+\mid\mathcal{G}]</math> and <math>\E[Y^-\mid\mathcal{G}]</math> are well defined. Now we set
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y^+\mid \mathcal{G}]-\E[Y^-\mid\mathcal{G}].
</math>
This is well defined because
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]]=\E[Y^\pm] < \infty
</math>
if we let <math>X=\one_\Omega</math> in the previous lemma and therefore <math>\E[Y^+\mid\mathcal{G}]</math> and <math>\E[Y^-\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math>. For all <math>X</math> bounded and <math>\mathcal{G}</math>-measurable we can also write <math>X=X^+-X^-</math> and it follows from the previous lemma that
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]X]=\E[Y^\pm X].
</math>
This implies that <math>\E[Y\mid\mathcal{G}]</math> satisfies (6).}}
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]]=\E[X].
</math>
|Take equation (6) and choose the bounded test r.v. to be <math>\one_\Omega</math>.}}
{{proofcard|Corollary|cor-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert\leq \E[\vert X\vert\mid\mathcal{G}].
</math>
In particular
<math display="block">
\E[\vert\E[X\mid\mathcal{G}]\vert]\leq \E[\vert X\vert].
</math>
|We can always write <math>X=X^+-X^-</math> and also <math>\vert X\vert=X^++X^-</math>. Therefore we get
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq  \E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[X^++X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}].
</math>}}
{{proofcard|Proposition|prop-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> be a r.v. on that space and assume that <math>Y</math> is independent of the sub <math>\sigma</math>-Algebra <math>\mathcal{G}\subset\F</math>, i.e. <math>\sigma(Y)</math> is independent of <math>\mathcal{G}</math>. Then
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y].
</math>
|Let <math>Z</math> be a bounded and <math>\mathcal{G}</math>-measurable r.v.; then <math>Y</math> and <math>Z</math> are independent. Hence we get
<math display="block">
\E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z].
</math>
Since <math>\E[Y]</math> is constant, it belongs to <math>L^1(\Omega,\mathcal{G},\p)</math> and satisfies (6). Therefore, by uniqueness, we get that <math>\E[Y\mid\mathcal{G}]=\E[Y]</math>.}}
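The proposition can be illustrated on a toy finite model (the four-point space and the values of <math>Y</math> below are made up): if <math>Y</math> depends only on a coordinate that is independent of the partition generating <math>\mathcal{G}</math>, its conditional expectation is the constant <math>\E[Y]</math>.

```python
from statistics import fmean
from itertools import product

# Toy model: Omega = {0,1}^2 with uniform P.  G is generated by the first
# coordinate; Y depends only on the second, so sigma(Y) and G are independent.
omega = list(product((0, 1), repeat=2))        # four equally likely points
Y = {w: 3.0 if w[1] else 1.0 for w in omega}   # Y(w) depends on coord 2 only

e_Y = fmean(Y[w] for w in omega)               # E[Y] = 2.0
for g in (0, 1):                               # blocks {first coord = g} of G
    block = [w for w in omega if w[0] == g]
    assert fmean(Y[w] for w in block) == e_Y   # E[Y | G] = E[Y] on each block
print(e_Y)  # 2.0
```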
{{proofcard|Theorem|thm-5|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> and <math>Y</math> be two r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Assume further that at least one of these two holds:
<ul style{{=}}"list-style-type:lower-roman"><li><math>X,Y</math> and <math>XY</math> are in <math>L^1(\Omega,\F,\p)</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
<li><math>X\geq 0</math>, <math>Y\geq 0</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
</ul>
Then
<math display="block">
\E[XY\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]X.
</math>
In particular, if <math>X</math> is a positive r.v. or in <math>L^1(\Omega,\mathcal{G},\p)</math> and <math>\mathcal{G}</math>-measurable, then
<math display="block">
\E[X\mid\mathcal{G}]=X.
</math>|For <math>(ii)</math> assume first that <math>X,Y\leq  0</math>. Let <math>Z</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. Then we can obtain
<math display="block">
\E[(XY)Z]=\E[Y(XZ)]=\E[\E[Y\mid\mathcal{G}]XZ]=\E[(\E[Y\mid\mathcal{G}]X)Z].
</math>
Note that <math>\E[Y\mid\mathcal{G}]X</math> is a positive and <math>\mathcal{G}</math>-measurable r.v. Hence <math>\E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}]</math>. For <math>(i)</math> we can write <math>X=X^+-X^-</math> and use <math>(ii)</math>. This is an easy exercise.}}
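The pull-out property can be checked on a toy uniform space (block labels and values below are invented): a r.v. that is constant on each block of the generating partition is <math>\mathcal{G}</math>-measurable and factors out of the conditional expectation.

```python
from statistics import fmean

# 6-point uniform space; the partition {0,1},{2,3},{4,5} generates G.
blocks = [0, 0, 1, 1, 2, 2]                 # block label of each point
X = [2.0, 2.0, -1.0, -1.0, 5.0, 5.0]        # constant per block => G-measurable
Y = [1.0, 3.0, 0.0, 4.0, -2.0, 6.0]         # arbitrary r.v.

def cond(Z, blocks):
    """E[Z | G] evaluated pointwise: block means of Z."""
    means = {b: fmean(z for b2, z in zip(blocks, Z) if b2 == b)
             for b in set(blocks)}
    return [means[b] for b in blocks]

lhs = cond([x * y for x, y in zip(X, Y)], blocks)    # E[XY | G]
rhs = [x * ey for x, ey in zip(X, cond(Y, blocks))]  # X * E[Y | G]
assert lhs == rhs
print(lhs)
```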
{{alert-info |Next we want to show that the classical limit theorems from measure theory also hold for the conditional expectation{{efn|Recall the classical limit theorems for integrals. ''Monotone convergence:'' if <math>(f_n)_{n\geq 1}</math> is an increasing sequence of positive, measurable functions and <math>f=\lim_{n\to\infty}\uparrow f_n</math>, then <math>\int fd\mu=\lim_{n\to\infty}\int f_nd\mu</math>. ''Fatou:'' if <math>(f_n)_{n\geq 1}</math> is a sequence of positive, measurable functions, then <math>\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu</math>. ''Dominated convergence:'' if <math>(f_n)_{n\geq 1}</math> is a sequence of integrable functions with <math>\vert f_n\vert\leq g</math> for all <math>n</math>, where <math>g</math> is integrable, and <math>f=\lim_{n\to\infty}f_n</math>, then <math>\lim_{n\to\infty}\int f_nd\mu=\int fd\mu</math>.}}.
}}
{{proofcard|Theorem (Limit theorems for the conditional expectation)|thm-6|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>(Y_n)_{n\geq 1}</math> be a sequence of r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then we have:
<ul style{{=}}"list-style-type:lower-roman"><li>(''Monotone convergence'') Assume that <math>(Y_n)_{n\geq 1}</math> is an increasing sequence of positive r.v.'s such that <math>\lim_{n\to\infty}\uparrow Y_n=Y</math> a.s. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}].
</math>
</li>
<li>(''Fatou'') Assume that <math>(Y_n)_{n\geq 1}</math> is a sequence of positive r.v.'s. Then
<math display="block">
\E[\liminf_n Y_n\mid\mathcal{G}]\leq \liminf_n\E[Y_n\mid\mathcal{G}].
</math>
</li>
<li>(''Dominated convergence'') Assume that <math>Y_n\xrightarrow{n\to\infty}Y</math> a.s. and that there exists <math>Z\in L^1(\Omega,\F,\p)</math> such that <math>\vert Y_n\vert\leq  Z</math> for all <math>n</math>. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>
</li>
</ul>
|We will only prove <math>(i)</math>, since <math>(ii)</math> and <math>(iii)</math> are proved in a similar way (it's a good exercise to do the proof). Since <math>(Y_n)_{n\geq 1}</math> is an increasing sequence, it follows that
<math display="block">
\E[Y_{n+1}\mid\mathcal{G}]\geq \E[Y_n\mid\mathcal{G}].
</math>
Hence we can deduce that <math>\lim_{n\to\infty}\uparrow \E[Y_n\mid\mathcal{G}]</math> exists and we denote it by <math>Y'</math>. Moreover, note that <math>Y'</math> is <math>\mathcal{G}</math>-measurable, since it is a limit of <math>\mathcal{G}</math>-measurable r.v.'s. Let <math>X</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. and obtain then
<math display="block">
\E[Y'X]=\E[\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\uparrow\E[\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\E[Y_n X]=\E[YX],
</math>
where we have used monotone convergence twice and equation (5). Therefore we get
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>}}
{{proofcard|Theorem (Jensen's inequality)|thm-7|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>\varphi:\R\to\R</math> be a real, convex function. Let <math>X \in L^1(\Omega,\F,\p)</math> such that <math>\varphi(X)\in L^1(\Omega,\F,\p)</math>. Then
<math display="block">
\varphi(\E[X\mid\mathcal{G}])\leq  \E[\varphi(X)\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset\F</math>.
|Exercise.}}
'''Example'''
Let <math>(\Omega,\F,\p)</math> be a probability space, let <math>\varphi(x)=x^2</math> and let <math>X\in L^2(\Omega,\F,\p)</math>. Then
<math display="block">
(\E[X\mid \mathcal{G}])^2\leq  \E[X^2\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset \F</math>.
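This special case of conditional Jensen can be seen numerically on a toy partition (the block labels and values of <math>X</math> are made up): on each block, the squared block mean is at most the block mean of the squares, the gap being the conditional variance.

```python
from statistics import fmean

# 5-point uniform space; the partition {0,1,2},{3,4} generates G.
blocks = [0, 0, 0, 1, 1]
X = [1.0, 2.0, 4.0, -3.0, 5.0]

for b in set(blocks):
    xs = [x for b2, x in zip(blocks, X) if b2 == b]
    m, m2 = fmean(xs), fmean(x * x for x in xs)
    assert m ** 2 <= m2        # (E[X | G])^2 <= E[X^2 | G] on this block
print("conditional Jensen holds blockwise")
```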
{{proofcard|Theorem (Tower property)|thm-8|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Let <math>\mathcal{C}\subset\mathcal{G}\subset \F</math> be a tower of sub <math>\sigma</math>-Algebras of <math>\F</math>. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>|Let <math>Z</math> be a bounded and <math>\mathcal{C}</math>-measurable r.v. Then we obtain
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{C}]Z].
</math>
But <math>Z</math> is also <math>\mathcal{G}</math>-measurable and hence we get
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{G}]Z].
</math>
Therefore, for all <math>Z</math> bounded and <math>\mathcal{C}</math>-measurable r.v.'s, we get
<math display="block">
\E[\E[X\mid\mathcal{G}]Z]=\E[\E[X\mid\mathcal{C}]Z]
</math>
and thus
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>}}
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}
==Notes==
{{notelist}}

Latest revision as of 17:39, 8 May 2024

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

Conditional probability

Let [math](\Omega,\F,\p)[/math] be a probability space and let [math]A,B\in\F[/math] such that [math]\p[B] \gt 0[/math]. Then the conditional probability[a] of [math]A[/math] given [math]B[/math] is defined as

[[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}. [[/math]]

The important fact here is that the application [math]\F\to [0,1][/math], [math]A\mapsto \p[A\mid B][/math] defines a new probability measure on [math]\F[/math] called the conditional probability given [math]B[/math]. There are several facts, which we need to recall:

  • If [math]A_1,...,A_n\in\F[/math] and if [math]\p\left[\bigcap_{k=1}^nA_k\right] \gt 0[/math], then
    [[math]] \p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right]. [[/math]]
  • Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math], i.e. for all [math]n\geq 1[/math] we have that [math]E_n\in\F[/math] and for [math]n\not=m[/math] we get [math]E_n\cap E_m=\varnothing[/math] and [math]\bigcup_{n\geq 1}E_n=\Omega[/math]. Now for [math]A\in \F[/math] we get
    [[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]
  • (Baye's formula)[b] Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math] and [math]A\in\F[/math] with [math]\p[A] \gt 0[/math]. Then
    [[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}. [[/math]]

We can reformulate the definition of the conditional probability to obtain

[[math]] \begin{align*} \p[A\mid B]\p[B]&=\p[A\cap B]\\ \p[B\mid A]\p[A]&=\p[A\cap B] \end{align*} [[/math]]
Therefore one can prove the statements (1) to (3) by using these two equations[c].

Discrete construction of the conditional expectation

Let [math]X[/math] and [math]Y[/math] be two r.v.'s on a probability space [math](\Omega,\F,\p)[/math]. Let [math]Y[/math] take values in [math]\R[/math] and [math]X[/math] take values in a countable discrete set [math]\{x_1,x_2,...,x_n,...\}[/math]. The goal is to describe the expectation of the r.v. [math]Y[/math] given the observed r.v. [math]X[/math]. For instance, let [math]X=x_j\in\{x_1,x_2,...,x_n,...\}[/math] with [math]\p[X=x_j] \gt 0[/math]. We then look at the set [math]\{\omega\in\Omega\mid X(\omega)=x_j\}[/math] rather than at the whole of [math]\Omega[/math]. For [math]\Lambda\in\F[/math], we thus define a new probability measure [math]\Q[/math] by

[[math]] \Q[\Lambda]=\p[\Lambda\mid \{X=x_j\}]. [[/math]]

It then makes more sense to compute

[[math]] \E_\Q[Y]=\int_\Omega Y(\omega)d\Q(\omega)=\frac{1}{\p[X=x_j]}\int_{\{\omega\in\Omega\mid X(\omega)=x_j\}}Y(\omega)d\p(\omega) [[/math]]

rather than

[[math]] \E_\p[Y]=\int_\Omega Y(\omega)d\p(\omega)=\int_\R yd\p_Y(y). [[/math]]

Definition (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] real valued, single value case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:\Omega\to\{x_1,x_2,...,x_n,...\}[/math] be a r.v. taking values in a discrete set and let [math]Y[/math] be a real valued r.v. on that space. If [math]\p[X=x_j] \gt 0[/math], we can define the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] to be

[[math]] \E[Y\mid X=x_j]=\E_\Q[Y], [[/math]]
where [math]\Q[/math] is the probability measure on [math]\F[/math] defined by

[[math]] \Q[\Lambda]=\p[\Lambda\mid X=x_j], [[/math]]
for [math]\Lambda\in\F[/math], provided that [math]\E_\Q[\vert Y\vert] \lt \infty[/math].

Theorem (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] discrete, single value case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a r.v. on that space with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and let [math]Y[/math] also be a r.v. with values in [math]\{y_1,y_2,...,y_n,...\}[/math]. If [math]\p[X=x_j] \gt 0[/math], we can write the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] as

[[math]] \E[Y\mid X=x_j]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j]. [[/math]]
provided that the series is absolutely convergent.


Show Proof

Apply the definitions above to obtain

[[math]] \E[Y\mid X=x_j]=\E_\Q[Y]=\sum_{k=1}^\infty y_k\Q[Y=y_k]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j]. [[/math]]
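As a sketch of how this formula is used in practice (the joint pmf below is made up for illustration), one can compute [math]\E[Y\mid X=x_j][/math] directly from a joint distribution table:

```python
# Made-up joint pmf: joint[(x, y)] = P[X = x, Y = y].
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.2, (1, 2): 0.4}

def cond_exp(joint, xj):
    """E[Y | X = x_j] = sum_k y_k * P[Y = y_k | X = x_j]."""
    p_x = sum(p for (x, _), p in joint.items() if x == xj)
    assert p_x > 0, "conditioning event must have positive probability"
    return sum(y * p / p_x for (x, y), p in joint.items() if x == xj)

print(cond_exp(joint, 0))  # (1*0.1 + 2*0.3) / 0.4 = 1.75
```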

Now let [math]X[/math] again be a r.v. with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and [math]Y[/math] a real valued r.v. The next step is to define [math]\E[Y\mid X][/math] as a function [math]f(X)[/math] of [math]X[/math]. To this end we introduce the function

[[math]] \begin{equation} f:\{x_1,x_2,...,x_n,...\}\to \R,\qquad f(x)=\begin{cases}\E[Y\mid X=x],&\p[X=x] \gt 0\\ \text{any value in $\R$},&\p[X=x]=0\end{cases} \end{equation} [[/math]]

It doesn't matter which value we assign to [math]f[/math] when [math]\p[X=x]=0[/math], since this doesn't affect the expectation: such values are only taken on a null set. By convention we assign the value 0.
Definition (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] real valued, complete case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a countably valued r.v. and let [math]Y[/math] be a real valued r.v. The conditional expectation of [math]Y[/math] given [math]X[/math] is defined by

[[math]] \E[Y\mid X]=f(X), [[/math]]
with [math]f[/math] as in (2), provided that for all [math]j[/math] with [math]\p[X=x_j] \gt 0[/math], setting [math]\Q_j[\Lambda]=\p[\Lambda\mid X=x_j][/math], we have [math]\E_{\Q_j}[\vert Y\vert] \lt \infty[/math].

The above definition does not define [math]\E[Y\mid X][/math] everywhere but rather almost everywhere, since on each set [math]\{X=x\}[/math], where [math]\p[X=x]=0[/math], its value is arbitrary.

Example

Let[d] [math]X\sim\Pi(\lambda)[/math]. Consider a tossing game in which, when [math]X=n[/math], we perform [math]n[/math] independent tosses of a coin, each toss yielding 1 with probability [math]p\in[0,1][/math] and 0 with probability [math]1-p[/math]. Define [math]S[/math] to be the r.v. giving the total number of 1's obtained in the game. Therefore, given [math]X=n[/math], the r.v. [math]S[/math] is binomially distributed with parameters [math](p,n)[/math]. We want to compute

  • [math]\E[S\mid X][/math]
  • [math]\E[X\mid S][/math]

It is more natural to ask for the expected number of 1's obtained in the whole game knowing how many tosses were played; the reverse is a bit more difficult. We may also notice that necessarily [math]S\leq X[/math], because we cannot obtain more 1's than the number of tosses played.

  • First we compute [math]\E[S\mid X=n][/math]: if [math]X=n[/math], we know that [math]S[/math] is binomially distributed with parameters [math](p,n)[/math] ([math]S\sim \B(p,n)[/math]) and therefore we already know[e]
    [[math]] \E[S\mid X=n]=pn. [[/math]]
    Now we need to identify the function [math]f[/math] defined as in (2) by
    [[math]] \begin{align*} f:\N&\longrightarrow\R\\ n&\longmapsto pn. \end{align*} [[/math]]
    Therefore we get by definition
    [[math]] \E[S\mid X]=pX. [[/math]]
  • Next we want to compute [math]\E[X\mid S=k][/math]: For [math]n\geq k[/math] we have
    [[math]] \p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k} p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}, [[/math]]
    since [math]\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}[/math]. By some algebra we obtain that
    [[math]] \frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}. [[/math]]
    Hence we get that
    [[math]] \E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p). [[/math]]
    Therefore [math]\E[X\mid S]=S+\lambda(1-p)[/math].
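Both answers can be checked by a small Monte Carlo simulation (a sketch, with illustrative parameters [math]\lambda=3[/math], [math]p=0.4[/math]; only the Python standard library is used):

```python
import math
import random

random.seed(0)
lam, p, N = 3.0, 0.4, 200_000

def poisson(lam):
    # Knuth's method: count uniforms until their product drops below e^{-lam}.
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

samples = []
for _ in range(N):
    x = poisson(lam)                                  # X ~ Poisson(lam)
    s = sum(random.random() < p for _ in range(x))    # S | X = x  ~  Bin(x, p)
    samples.append((x, s))

n0, k0 = 4, 1
e_s_given_x = (sum(s for x, s in samples if x == n0)
               / sum(1 for x, _ in samples if x == n0))
e_x_given_s = (sum(x for x, s in samples if s == k0)
               / sum(1 for _, s in samples if s == k0))
print(e_s_given_x)  # close to p * n0 = 1.6
print(e_x_given_s)  # close to k0 + lam * (1 - p) = 2.8
```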

Continuous construction of the conditional expectation

Now we want to define [math]\E[Y\mid X][/math], where [math]X[/math] is no longer assumed to be countably valued. To this end we recall the following two facts:

Definition ([math]\sigma[/math]-Algebra generated by a random variable)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space. The [math]\sigma[/math]-Algebra generated by [math]X[/math] is given by

[[math]] \sigma(X)=X^{-1}(\B(\R^n))=\{A\subset\Omega\mid A=X^{-1}(B),\ B\in\B(\R^n)\}. [[/math]]

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space and let [math]Y[/math] be a real valued r.v. on that space. Then [math]Y[/math] is measurable with respect to [math]\sigma(X)[/math] if and only if there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that

[[math]] Y=f(X). [[/math]]

We want to make use of the fact that [math]L^2(\Omega,\sigma(X),\p)\subset L^2(\Omega,\F,\p)[/math] is a closed subspace of the Hilbert space [math]L^2(\Omega,\F,\p)[/math], since [math]\sigma(X)\subset\F[/math]. This allows us to use orthogonal projections and to interpret the conditional expectation as such a projection.
Definition (Conditional expectation (as a projection onto a closed subspace))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math]. Then the conditional expectation of [math]Y[/math] given [math]X[/math] is the unique element [math]\hat Y\in L^2(\Omega,\sigma(X),\p)[/math] such that for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]

[[math]] \begin{equation} \E[YZ]=\E[\hat Y Z]. \end{equation} [[/math]]
This characterization holds because [math]\hat Y[/math] is the orthogonal projection of [math]Y[/math], so that [math]Y-\hat Y\in L^2(\Omega,\F,\p)[/math] satisfies [math]\langle Y-\hat Y,Z\rangle=0[/math] for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]. We write [math]\E[Y\mid X][/math] for [math]\hat Y[/math].

[math]\hat Y[/math] is the orthogonal projection of [math]Y[/math] onto [math]L^2(\Omega,\sigma(X),\p)[/math].

Since [math]X[/math] takes values in [math]\R^n[/math], there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that

[[math]] \E[Y\mid X]=f(X) [[/math]]
with [math]\E[f^2(X)] \lt \infty[/math]. We can also rewrite (3) as: for all Borel measurable [math]g:\R^n\to\R[/math], such that [math]\E[g^2(X)] \lt \infty[/math], we get

[[math]] \E[Yg(X)]=\E[f(X)g(X)]. [[/math]]
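For discrete [math]X[/math], this projection can be computed concretely: under the empirical measure of a sample, [math]f(x)[/math] is just the average of [math]Y[/math] over the event [math]\{X=x\}[/math]. A small sketch with simulated data (all parameters illustrative):

```python
import random

random.seed(1)
# Simulated sample of (X, Y) with Y = X + standard Gaussian noise.
data = [(x, x + random.gauss(0, 1))
        for x in (random.choice([0, 1, 2]) for _ in range(10_000))]

# f(x) = average of Y over {X = x}: the orthogonal projection of Y onto
# the (empirical) L^2(sigma(X)).
f = {x0: sum(y for x, y in data if x == x0) / sum(1 for x, _ in data if x == x0)
     for x0 in (0, 1, 2)}

# Check the defining property E[Y g(X)] = E[f(X) g(X)] for g = indicator of {X = x0}.
for x0 in (0, 1, 2):
    lhs = sum(y for x, y in data if x == x0) / len(data)
    rhs = sum(f[x] for x, _ in data if x == x0) / len(data)
    assert abs(lhs - rhs) < 1e-9
print(f)  # roughly {0: 0.0, 1: 1.0, 2: 2.0}
```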

Now let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math] and consider the space [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math]. It is clear that [math]L^2(\Omega,\mathcal{G},\p)[/math] is a Hilbert space and thus we can project to it.

Definition (Conditional expectation (projection case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then the conditional expectation of [math]Y[/math] given [math]\mathcal{G}[/math] is defined as the unique element [math]\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)[/math] such that for all [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math]

[[math]] \begin{equation} \label{4} \E[YZ]=\E[\E[Y\mid \mathcal{G}]Z]. \end{equation} [[/math]]

In (3) or (4), it is enough[f] to restrict the test r.v. [math]Z[/math] to the class of r.v.'s of the form

[[math]] Z=\one_A,A\in\mathcal{G}. [[/math]]

The conditional expectation is an element of [math]L^2[/math], so it is only defined uniquely a.s., not everywhere. In particular, any statement like [math]\E[Y\mid\mathcal{G}]\geq0[/math] or [math]\E[Y\mid \mathcal{G}]=Z[/math] has to be understood with an implicit a.s.

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math].

  • If [math]Y\geq 0[/math], then [math]\E[Y\mid \mathcal{G}]\geq 0[/math]
  • [math]\E[\E[Y\mid\mathcal{G}]]=\E[Y][/math]
  • The map [math]Y\mapsto\E[Y\mid\mathcal{G}][/math] is linear.


Show Proof

For [math](i)[/math] take [math]Z=\one_{\{\E[Y\mid\mathcal{G}] \lt 0\}}[/math] to obtain

[[math]] \underbrace{\E[YZ]}_{\geq 0}=\underbrace{\E[\E[Y\mid \mathcal{G}]Z]}_{\leq 0}. [[/math]]
This implies that [math]\p[\E[Y\mid \mathcal{G}] \lt 0]=0[/math]. For [math](ii)[/math] take [math]Z=\one_{\Omega}[/math] and plug into (4). For [math](iii)[/math] notice that linearity comes from the orthogonal projection operator. But we can also do it directly by taking [math]Y,Y'\in L^2(\Omega,\F,\p)[/math], [math]\alpha,\beta\in \R[/math] and [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math] to obtain

[[math]] \E[(\alpha Y+\beta Y')Z]=\alpha\E[YZ]+\beta\E[Y'Z]=\alpha\E[\E[Y\mid\mathcal{G}]Z]+\beta\E[\E[Y'\mid\mathcal{G}]Z]=\E[(\alpha\E[Y\mid\mathcal{G}]+\beta\E[Y'\mid\mathcal{G}])Z]. [[/math]]
Now we can conclude by using the uniqueness property that

[[math]] \E[\alpha Y+\beta Y'\mid \mathcal{G}]=\alpha\E[Y\mid \mathcal{G}]+\beta\E[Y'\mid \mathcal{G}]. [[/math]]

Now we want to extend the definition of the conditional expectation to r.v.'s in [math]L^1(\Omega,\F,\p)[/math] or in [math]L^+(\Omega,\F,\p)[/math], the space of nonnegative r.v.'s allowed to take the value [math]\infty[/math].

Lemma

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^+(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)[/math] such that for all [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]

[[math]] \begin{equation} \E[YX]=\E[\E[Y\mid \mathcal{G}]X] \end{equation} [[/math]]
and this conditional expectation agrees with the previous definition when [math]Y\in L^2(\Omega,\F,\p)[/math]. Moreover, if [math]0\leq Y\leq Y'[/math], then

[[math]] \E[Y\mid \mathcal{G}]\leq \E[Y'\mid \mathcal{G}]. [[/math]]

Show Proof

If [math]Y\geq 0[/math] and [math]Y\in L^2(\Omega,\F,\p)[/math], then we define [math]\E[Y\mid\mathcal{G}][/math] as before. If [math]X\in L^+(\Omega,\mathcal{G},\p)[/math], then [math]X_n=X\land n[/math] is in [math]L^2(\Omega,\mathcal{G},\p)[/math], is positive, and [math]X_n\uparrow X[/math] for [math]n\to\infty[/math]. Using the monotone convergence theorem we get

[[math]] \E[YX]=\E[Y\lim_{n\to\infty}X_n]=\lim_{n\to\infty}\E[YX_n]=\lim_{n\to\infty}\E[\E[Y\mid\mathcal{G}]X_n]=\E[\E[Y\mid\mathcal{G}]\lim_{n\to\infty}X_n]=\E[\E[Y\mid\mathcal{G}]X]. [[/math]]
This shows that (5) is true whenever [math]Y\in L^2(\Omega,\F,\p)[/math] with [math]Y\geq 0[/math] and [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]. Now let [math]Y\in L^+(\Omega,\F,\p)[/math]. Define [math]Y_m=Y\land m[/math]. Hence we get [math]Y_m\in L^2(\Omega,\F,\p)[/math] and [math]Y_m\uparrow Y[/math] as [math]m\to\infty[/math]. Each [math]\E[Y_m\mid\mathcal{G}][/math] is well defined[g], positive and increasing in [math]m[/math]. We define

[[math]] \E[Y\mid\mathcal{G}]=\lim_{m\to\infty}\E[Y_m\mid \mathcal{G}]. [[/math]]
Several applications of the monotone convergence theorem will give us for [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]

[[math]] \E[YX]=\lim_{m\to\infty}\E[Y_mX]=\lim_{m\to\infty}\E[\E[Y_m\mid\mathcal{G}]X]=\E[\E[Y\mid \mathcal{G}]X]. [[/math]]
Furthermore if [math]0\leq Y\leq Y'[/math], then [math]Y\land m\leq Y'\land m[/math] and therefore

[[math]] \E[Y\mid\mathcal{G}]\leq \E[Y'\mid\mathcal{G}]. [[/math]]
Now we need to show uniqueness[h]. Let [math]U[/math] and [math]V[/math] be two versions of [math]\E[Y\mid \mathcal{G}][/math]. Let

[[math]] \Lambda_n=\{U \lt V\leq n\}\in\mathcal{G} [[/math]]
and assume [math]\p[\Lambda_n] \gt 0[/math]. We then have

[[math]] \E[Y\one_{\Lambda_n}]=\E[U\one_{\Lambda_n}]=\E[V\one_{\Lambda_n}],\quad\text{and hence}\quad \E[(V-U)\one_{\Lambda_n}]=0. [[/math]]
Since [math]V-U \gt 0[/math] on [math]\Lambda_n[/math], this contradicts the assumption [math]\p[\Lambda_n] \gt 0[/math], so [math]\p[\Lambda_n]=0[/math] for all [math]n[/math]. Moreover, [math]\{U \lt V\}=\bigcup_{n\geq 1}\Lambda_n[/math] and therefore

[[math]] \p[U \lt V]=0 [[/math]]
and similarly [math]\p[V \lt U]=0[/math]. This implies

[[math]] \p[U=V]=1. [[/math]]

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math] such that for every [math]X[/math] bounded and [math]\mathcal{G}[/math]-measurable

[[math]] \begin{equation} \E[YX]=\E[\E[Y\mid \mathcal{G}]X]. \end{equation} [[/math]]
This conditional expectation agrees with the definition in the [math]L^2[/math] case. Moreover it satisfies:

  • If [math]Y\geq 0[/math], then [math]\E[Y\mid\mathcal{G}]\geq 0[/math]
  • The map [math]Y\mapsto \E[Y\mid\mathcal{G}][/math] is linear.


Show Proof

We will only prove the existence, since the rest is exactly the same as before. Write [math]Y=Y^+-Y^-[/math] with [math]Y^+,Y^-\in L^1(\Omega,\F,\p)[/math] and [math]Y^+,Y^-\geq 0[/math]. So [math]\E[Y^+\mid\mathcal{G}][/math] and [math]\E[Y^-\mid\mathcal{G}][/math] are well defined. Now we set

[[math]] \E[Y\mid\mathcal{G}]=\E[Y^+\mid \mathcal{G}]-\E[Y^-\mid\mathcal{G}]. [[/math]]
This is well defined because

[[math]] \E[\E[Y^\pm\mid\mathcal{G}]]=\E[Y^\pm] \lt \infty [[/math]]
by taking [math]X=\one_\Omega[/math] in the previous lemma, and therefore [math]\E[Y^+\mid\mathcal{G}],\E[Y^-\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math]. For every bounded and [math]\mathcal{G}[/math]-measurable [math]X[/math] we can also write [math]X=X^+-X^-[/math] and it follows from the previous lemma that

[[math]] \E[\E[Y^\pm\mid\mathcal{G}]X]=\E[Y^\pm X]. [[/math]]
This implies that [math]\E[Y\mid\mathcal{G}][/math] satisfies (6).

Corollary

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then

[[math]] \E[\E[X\mid\mathcal{G}]]=\E[X]. [[/math]]


Show Proof

Take equation (6) with [math]Y=X[/math] and the test r.v. [math]\one_\Omega[/math].

Corollary

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then

[[math]] \vert\E[X\mid\mathcal{G}]\vert\leq \E[\vert X\vert\mid\mathcal{G}]. [[/math]]
In particular

[[math]] \E[\vert\E[X\mid\mathcal{G}]\vert]\leq \E[\vert X\vert]. [[/math]]


Show Proof

We can always write [math]X=X^+-X^-[/math] and also [math]\vert X\vert=X^++X^-[/math]. Therefore we get

[[math]] \vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq \E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[X^++X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}]. [[/math]]

Proposition

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and assume that [math]Y[/math] is independent of the sub [math]\sigma[/math]-Algebra [math]\mathcal{G}\subset\F[/math], i.e. [math]\sigma(Y)[/math] is independent of [math]\mathcal{G}[/math]. Then

[[math]] \E[Y\mid\mathcal{G}]=\E[Y]. [[/math]]


Show Proof

Let [math]Z[/math] be a bounded and [math]\mathcal{G}[/math]-measurable r.v.; then [math]Y[/math] and [math]Z[/math] are independent. Hence we get

[[math]] \E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z]. [[/math]]
Since [math]\E[Y][/math] is constant, it follows that [math]\E[Y]\in L^1(\Omega,\mathcal{G},\p)[/math] and satisfies (6). Therefore by uniqueness we get [math]\E[Y\mid\mathcal{G}]=\E[Y][/math].
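A quick simulated sanity check of this proposition (illustrative numbers; here [math]\mathcal{G}=\sigma(X)[/math] for a fair coin [math]X[/math] drawn independently of [math]Y[/math]):

```python
import random

random.seed(2)
# X is a fair coin, Y ~ N(5, 1), drawn independently of X.
data = [(random.choice([0, 1]), random.gauss(5, 1)) for _ in range(50_000)]

overall = sum(y for _, y in data) / len(data)  # estimate of E[Y]
for x0 in (0, 1):
    ys = [y for x, y in data if x == x0]
    group_mean = sum(ys) / len(ys)             # estimate of E[Y | X = x0]
    assert abs(group_mean - overall) < 0.05    # E[Y | sigma(X)] = E[Y]
print(overall)  # close to 5
```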

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Assume further that at least one of these two holds:

  • [math]X,Y[/math] and [math]XY[/math] are in [math]L^1(\Omega,\F,\p)[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
  • [math]X\geq 0[/math], [math]Y\geq 0[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.

Then

[[math]] \E[XY\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]X. [[/math]]
In particular, if [math]X[/math] is a positive r.v. or in [math]L^1(\Omega,\mathcal{G},\p)[/math] and [math]\mathcal{G}[/math]-measurable, then

[[math]] \E[X\mid\mathcal{G}]=X. [[/math]]

Show Proof

For [math](ii)[/math] assume first that [math]X,Y\geq 0[/math]. Let [math]Z[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. Then we obtain

[[math]] \E[(XY)Z]=\E[Y(XZ)]=\E[\E[Y\mid\mathcal{G}]XZ]=\E[(\E[Y\mid\mathcal{G}]X)Z]. [[/math]]
Note that [math]\E[Y\mid\mathcal{G}]X[/math] is a positive and [math]\mathcal{G}[/math]-measurable r.v. Hence [math]\E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}][/math]. For [math](i)[/math] we can write [math]X=X^+-X^-[/math] and use [math](ii)[/math]. This is an easy exercise.

Next we want to show that the classical limit theorems from measure theory also make sense in terms of the conditional expectation[i].
Theorem (Limit theorems for the conditional expectation)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](Y_n)_{n\geq 1}[/math] be a sequence of r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then we have:

  • (Monotone convergence) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s such that [math]\lim_{n\to\infty}\uparrow Y_n=Y[/math] a.s. Then
    [[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}]. [[/math]]
  • (Fatou) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s. Then
    [[math]] \E[\liminf_n Y_n\mid\mathcal{G}]\leq\liminf_n\E[Y_n\mid\mathcal{G}]. [[/math]]
  • (Dominated convergence) Assume that [math]Y_n\xrightarrow{n\to\infty}Y[/math] a.s. and that there exists [math]Z\in L^1(\Omega,\F,\p)[/math] such that [math]\vert Y_n\vert\leq Z[/math] for all [math]n[/math]. Then
    [[math]] \lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]


Show Proof

We will only prove [math](i)[/math], since [math](ii)[/math] and [math](iii)[/math] are proved in a similar way (it's a good exercise to do the proof). Since [math](Y_n)_{n\geq 1}[/math] is an increasing sequence, it follows that

[[math]] \E[Y_{n+1}\mid\mathcal{G}]\geq \E[Y_n\mid\mathcal{G}]. [[/math]]
Hence we can deduce that [math]\lim_{n\to\infty}\uparrow \E[Y_n\mid\mathcal{G}][/math] exists and we denote it by [math]Y'[/math]. Moreover, note that [math]Y'[/math] is [math]\mathcal{G}[/math]-measurable, since it is a limit of [math]\mathcal{G}[/math]-measurable r.v.'s. Let [math]X[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. and obtain then

[[math]] \E[Y'X]=\E[\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\uparrow\E[\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\E[Y_n X]=\E[YX], [[/math]]
where we have used monotone convergence twice and equation (5). Therefore we get

[[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]

Theorem (Jensen's inequality)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi:\R\to\R[/math] be a real, convex function. Let [math]X \in L^1(\Omega,\F,\p)[/math] such that [math]\varphi(X)\in L^1(\Omega,\F,\p)[/math]. Then

[[math]] \varphi(\E[X\mid\mathcal{G}])\leq \E[\varphi(X)\mid\mathcal{G}] [[/math]]
for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset\F[/math].


Show Proof

Exercise.

Example


Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi(x)=x^2[/math] and let [math]X\in L^2(\Omega,\F,\p)[/math]. Then

[[math]] (\E[X\mid \mathcal{G}])^2\leq \E[X^2\mid\mathcal{G}] [[/math]]

for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset \F[/math].
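With [math]\mathcal{G}=\sigma(X)[/math] for a discrete [math]X[/math], both sides are group-wise averages, so the inequality can be checked directly on toy data (all values below are made up):

```python
# Y-values grouped by the value of X; conditional moments are group averages.
data = {0: [1.0, 3.0, 5.0], 1: [2.0, 2.0, 8.0]}

for x, ys in data.items():
    m1 = sum(ys) / len(ys)                  # E[Y | X = x]
    m2 = sum(y * y for y in ys) / len(ys)   # E[Y^2 | X = x]
    assert m1 ** 2 <= m2                    # Jensen with phi(t) = t^2
    print(x, m1 ** 2, m2)
```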

Theorem (Tower property)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Let [math]\mathcal{C}\subset\mathcal{G}\subset \F[/math] be a tower of sub [math]\sigma[/math]-Algebras of [math]\F[/math]. Then

[[math]] \E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}]. [[/math]]

Show Proof

Let [math]Z[/math] be a bounded and [math]\mathcal{C}[/math]-measurable r.v. Then we obtain

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{C}]Z]. [[/math]]
But [math]Z[/math] is also [math]\mathcal{G}[/math]-measurable and hence we get

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{G}]Z]. [[/math]]
Therefore, for all [math]Z[/math] bounded and [math]\mathcal{C}[/math]-measurable r.v.'s, we get

[[math]] \E[\E[X\mid\mathcal{G}]Z]=\E[\E[X\mid\mathcal{C}]Z] [[/math]]
and thus

[[math]] \E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}]. [[/math]]
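Empirically, with [math]\mathcal{C}=\sigma(X_1)\subset\mathcal{G}=\sigma(X_1,X_2)[/math] and averages taken under the empirical measure of a toy sample (values made up), conditioning in two steps agrees with conditioning in one step:

```python
rows = [  # made-up sample of (x1, x2, y)
    (0, 0, 1.0), (0, 0, 3.0), (0, 1, 5.0),
    (1, 0, 2.0), (1, 1, 4.0), (1, 1, 6.0),
]

def avg(vals):
    return sum(vals) / len(vals)

for x1 in (0, 1):
    sub = [row for row in rows if row[0] == x1]
    direct = avg([y for _, _, y in sub])  # E[Y | C] on {X1 = x1}
    # E[E[Y | G] | C]: replace each y by the mean of its (x1, x2)-cell,
    # then average over {X1 = x1}.
    cell = {}
    for a, b, y in sub:
        cell.setdefault((a, b), []).append(y)
    nested = avg([avg(cell[(a, b)]) for a, b, _ in sub])
    assert abs(direct - nested) < 1e-12
print("tower property holds on this sample")
```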

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].

Notes

  1. One can look it up for more details in the stochastics I part.
  2. Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.
  3. One also has to notice that if [math]A[/math] and [math]B[/math] are two independent events, then [math]\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A][/math]
  4. Recall that this means that [math]X[/math] is Poisson distributed: [math]\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}[/math] for [math]k\in\N[/math]
  5. If [math]X\sim \B(p,n)[/math] then [math]\E[X]=pn[/math]. For further calculation, one can look it up in the stochastics I notes
  6. Since we can always consider linear combinations of [math]\one_A[/math] and then apply density theorems to it
  7. because for [math]Y\in L^2[/math] and [math]U\in L^2[/math] we get [math]Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}][/math]
  8. Note that for any [math]W\in L^+[/math], the set [math]E[/math] on which [math]W=\infty[/math] is a null set. For suppose not, then [math]\E[W]\geq \E[\infty \one_E]=\infty\p[E][/math]. But since [math]\p[E] \gt 0[/math] this cannot happen
  9. Recall the classical limit theorems for integrals: ''Monotone convergence'': Let [math](f_n)_{n\geq 1}[/math] be an increasing sequence of positive and measurable functions and let [math]f=\lim_{n\to\infty}\uparrow f_n[/math]. Then [math]\int fd\mu=\lim_{n\to\infty}\int f_nd\mu[/math]. ''Fatou'': Let [math](f_n)_{n\geq 1}[/math] be a sequence of measurable and positive functions. Then [math]\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu[/math]. ''Dominated convergence'': Let [math](f_n)_{n\geq 1}[/math] be a sequence of integrable functions with [math]\vert f_n\vert\leq g[/math] for all [math]n[/math] with [math]g[/math] integrable. Denote [math]f=\lim_{n\to\infty}f_n[/math]. Then [math]\lim_{n\to\infty}\int f_nd\mu=\int fd\mu[/math]