<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
===Conditional probability===
Let <math>(\Omega,\F,\p)</math> be a probability space and let <math>A,B\in\F</math> with <math>\p[B] > 0</math>. The conditional probability{{efn|One can look it up for more details in the stochastics I part.}} of <math>A</math> given <math>B</math> is defined as
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}.
</math>
The important fact here is that the map <math>\F\to [0,1]</math>, <math>A\mapsto \p[A\mid B]</math> defines a new probability measure on <math>\F</math>, called the conditional probability given <math>B</math>. There are several facts we need to recall:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>A_1,...,A_n\in\F</math> and <math>\p\left[\bigcap_{k=1}^nA_k\right] > 0</math>, then
<math display="block">
\p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right].
</math>
</li>
<li>Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math>, i.e. <math>E_n\in\F</math> for all <math>n\geq 1</math>, <math>E_n\cap E_m=\varnothing</math> for <math>n\not=m</math>, and <math>\bigcup_{n\geq 1}E_n=\Omega</math>. Then for <math>A\in \F</math> we get
<math display="block">
\p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n].
</math>
</li>
<li>(Bayes' formula){{efn|Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.}} Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math> and <math>A\in\F</math> with <math>\p[A] > 0</math>. Then
<math display="block">
\p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}.
</math>
</li>
</ul>
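Both the law of total probability and Bayes' formula are easy to sanity-check numerically. The following Python sketch uses a hypothetical two-set partition with made-up probabilities (not from the text) to verify both identities:

```python
# Hypothetical partition E_1, E_2 of Omega and conditional probabilities of A.
p_E = [0.3, 0.7]            # P[E_1], P[E_2]: a measurable partition
p_A_given_E = [0.9, 0.2]    # P[A | E_1], P[A | E_2]

# Law of total probability: P[A] = sum_n P[A | E_n] P[E_n]
p_A = sum(pa * pe for pa, pe in zip(p_A_given_E, p_E))

# Bayes' formula: P[E_n | A] = P[A | E_n] P[E_n] / sum_m P[A | E_m] P[E_m]
p_E_given_A = [pa * pe / p_A for pa, pe in zip(p_A_given_E, p_E)]

print(p_A)            # = 0.9 * 0.3 + 0.2 * 0.7 = 0.41 up to rounding
print(p_E_given_A)    # the posteriors form a probability vector (sum to 1)
```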
{{alert-info |
We can reformulate the definition of the conditional probability to obtain
<math display="block">
\begin{align*}
\p[A\mid B]\p[B]&=\p[A\cap B]\\
\p[B\mid A]\p[A]&=\p[A\cap B].
\end{align*}
</math>
Therefore one can prove statements (i) to (iii) by using these two equations{{efn|One also has to notice that if <math>A</math> and <math>B</math> are two independent events, then <math>\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A]</math>.}}.
}}
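As an illustration of fact (i), the chain rule can be checked on a small product space. The biases below are hypothetical and serve only the computation:

```python
from itertools import product

# Chain rule P[A_1 ∩ ... ∩ A_n] = prod_j P[A_j | A_1 ∩ ... ∩ A_{j-1}],
# checked on three independent biased coin flips (hypothetical biases).
bias = [0.5, 0.6, 0.7]
P = {}
for w in product([0, 1], repeat=3):
    p = 1.0
    for b, x in zip(bias, w):
        p *= b if x == 1 else 1 - b
    P[w] = p

def prob(event):
    """P[event], where event is a predicate on outcomes."""
    return sum(p for w, p in P.items() if event(w))

A = [lambda w, i=i: w[i] == 1 for i in range(3)]  # A_j: the j-th flip shows 1

lhs = prob(lambda w: all(a(w) for a in A))        # P[A_1 ∩ A_2 ∩ A_3]
rhs = 1.0
for j in range(3):
    prev = prob(lambda w: all(A[i](w) for i in range(j)))      # P[A_1 ∩ ... ∩ A_{j-1}]
    curr = prob(lambda w: all(A[i](w) for i in range(j + 1)))  # P[A_1 ∩ ... ∩ A_j]
    rhs *= curr / prev          # the conditional probability P[A_j | ...]
print(lhs, rhs)                 # both equal 0.5 * 0.6 * 0.7 = 0.21 up to rounding
```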
===Discrete construction of the conditional expectation===
Let <math>X</math> and <math>Y</math> be two r.v.'s on a probability space <math>(\Omega,\F,\p)</math>. Let <math>Y</math> take values in <math>\R</math> and <math>X</math> take values in a countable discrete set <math>\{x_1,x_2,...,x_n,...\}</math>. The goal is to describe the expectation of the r.v. <math>Y</math> given the observed r.v. <math>X</math>. For instance, let <math>X=x_j\in\{x_1,x_2,...,x_n,...\}</math>. We then look at the set <math>\{\omega\in\Omega\mid X(\omega)=x_j\}</math> rather than at the whole of <math>\Omega</math>. Provided that <math>\p[X=x_j] > 0</math>, we thus define, for <math>\Lambda\in\F</math>,
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid \{X=x_j\}],
</math>
a new probability measure <math>\Q</math>. It then makes more sense to compute
<math display="block">
\E_\Q[Y]=\int_\Omega Y(\omega)d\Q(\omega)=\frac{1}{\p[X=x_j]}\int_{\{\omega\in\Omega\mid X(\omega)=x_j\}}Y(\omega)d\p(\omega)
</math>
rather than
<math display="block">
\E_\p[Y]=\int_\Omega Y(\omega)d\p(\omega)=\int_\R yd\p_Y(y).
</math>
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, single value case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:\Omega\to\{x_1,x_2,...,x_n,...\}</math> be a r.v. taking values in a discrete set and let <math>Y</math> be a real valued r.v. on that space. If <math>\p[X=x_j] > 0</math>, we can define the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> to be
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y],
</math>
where <math>\Q</math> is the probability measure on <math>\F</math> defined by
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid X=x_j],
</math>
for <math>\Lambda\in\F</math>, provided that <math>\E_\Q[\vert Y\vert] < \infty</math>.}}
{{proofcard|Theorem (Conditional expectation (<math>X</math> discrete, <math>Y</math> discrete, single value case))|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a r.v. on that space with values in <math>\{x_1,x_2,...,x_n,...\}</math> and let <math>Y</math> also be a r.v. with values in <math>\{y_1,y_2,...,y_n,...\}</math>. If <math>\p[X=x_j] > 0</math>, we can write the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> as
<math display="block">
\E[Y\mid X=x_j]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j],
</math>
provided that the series is absolutely convergent.
|Apply the definitions above to obtain
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y]=\sum_{k=1}^\infty y_k\Q[Y=y_k]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j].
</math>}}
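For two discrete r.v.'s the theorem reduces to a finite sum over the joint pmf. The following Python sketch (the joint pmf is made up purely for illustration) computes <math>\E[Y\mid X=x_j]</math> this way:

```python
# E[Y | X = x_j] computed from a hypothetical joint pmf of two discrete r.v.'s.
joint = {  # (x, y) -> P[X = x, Y = y]; numbers are made up for illustration
    (0, 1): 0.1, (0, 2): 0.3,
    (1, 1): 0.4, (1, 2): 0.2,
}

def cond_expectation(joint, x):
    """E[Y | X = x] = sum_k y_k P[Y = y_k | X = x]; requires P[X = x] > 0."""
    px = sum(p for (xv, _), p in joint.items() if xv == x)
    assert px > 0, "conditioning event must have positive probability"
    return sum(y * p / px for (xv, y), p in joint.items() if xv == x)

print(cond_expectation(joint, 0))  # (1*0.1 + 2*0.3) / 0.4 = 1.75
print(cond_expectation(joint, 1))  # (1*0.4 + 2*0.2) / 0.6 = 4/3
```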
Now let again <math>X</math> be a r.v. with values in <math>\{x_1,x_2,...,x_n,...\}</math> and <math>Y</math> a real valued r.v. The next step is to define <math>\E[Y\mid X]</math> as a function <math>f(X)</math>. Therefore we introduce the function
<math display="block">
\begin{equation}
f:\{x_1,x_2,...,x_n,...\}\to \R,\quad f(x)=\begin{cases}\E[Y\mid X=x],&\p[X=x] > 0\\ \text{any value in $\R$},&\p[X=x]=0.\end{cases}
\end{equation}
</math>
{{alert-info |
It doesn't matter which value we assign to <math>f</math> at points where <math>\p[X=x]=0</math>, since such a choice only affects a null set and therefore doesn't change the expectation. By convention we assign the value 0.
}}
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, complete case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a countably valued r.v. and let <math>Y</math> be a real valued r.v. The conditional expectation of <math>Y</math> given <math>X</math> is defined by
<math display="block">
\E[Y\mid X]=f(X),
</math>
with <math>f</math> as in (2), provided that for all <math>j</math>: if <math>\Q_j[\Lambda]=\p[\Lambda\mid X=x_j]</math>, with <math>\p[X=x_j] > 0</math>, we get <math>\E_{\Q_j}[\vert Y\vert] < \infty</math>.}}
{{alert-info |
The above definition does not define <math>\E[Y\mid X]</math> everywhere but rather almost everywhere, since on each set <math>\{X=x\}</math>, where <math>\p[X=x]=0</math>, its value is arbitrary.
}}
'''Example'''
Let{{efn|Recall that this means that <math>X</math> is Poisson distributed: <math>\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}</math> for <math>k\in\N</math>.}} <math>X\sim\Pi(\lambda)</math>. Let us consider a tossing game, where we say that when <math>X=n</math>, we perform <math>n</math> independent tosses of a coin, where each toss yields 1 with probability <math>p\in[0,1]</math> and 0 with probability <math>1-p</math>. Define also <math>S</math> to be the r.v. giving the total number of 1's obtained in the game. Therefore, if <math>X=n</math> is given, <math>S</math> is binomially distributed with parameters <math>(p,n)</math>. We want to compute
<ul style{{=}}"list-style-type:lower-roman"><li><math>\E[S\mid X]</math>
</li>
<li><math>\E[X\mid S]</math>
</li>
</ul>
{{alert-info |
It is more natural to ask for the expected number of 1's obtained in the whole game when we know how many tosses were played; the reverse is a bit more difficult. Logically, we may also note that <math>S > X</math> is impossible, because we cannot obtain more 1's than the number of tosses played.
}}
<ul style{{=}}"list-style-type:lower-roman"><li>First we compute <math>\E[S\mid X=n]</math>: If <math>X=n</math>, we know that <math>S</math> is binomially distributed with parameters <math>(p,n)</math> (<math>S\sim \B(p,n)</math>) and therefore we already know{{efn|If <math>X\sim \B(p,n)</math> then <math>\E[X]=pn</math>. For further calculation, one can look it up in the stochastics I notes.}}
<math display="block">
\E[S\mid X=n]=pn.
</math>
Now we need to identify the function <math>f</math> defined as in (2) by
<math display="block">
\begin{align*}
f:\N&\longrightarrow\R\\
n&\longmapsto pn.
\end{align*}
</math>
Therefore we get by definition
<math display="block">
\E[S\mid X]=pX.
</math>
</li>
<li>
Next we want to compute <math>\E[X\mid S=k]</math>: For <math>n\geq k</math> we have
<math display="block">
\p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}},
</math>
since <math>\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}</math>. By some algebra we obtain that
<math display="block">
\frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}.
</math>
Hence we get that
<math display="block">
\E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p).
</math>
Therefore <math>\E[X\mid S]=S+\lambda(1-p)</math>.
</li>
</ul>
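The closed form <math>\E[X\mid S=k]=k+\lambda(1-p)</math> can be checked numerically by truncating the series above. The parameters below are hypothetical, and the truncation level <code>N</code> is chosen so that the neglected Poisson tail is negligible:

```python
from math import comb, exp, factorial

# Numerical check of E[X | S = k] = k + lam*(1 - p) for the tossing game,
# with hypothetical parameters; the infinite series is truncated at N.
lam, p, k = 2.5, 0.4, 3
N = 60   # truncation level; the neglected tail is vanishingly small here

def joint(n):
    """P[S = k, X = n] = C(n,k) p^k (1-p)^(n-k) * e^(-lam) lam^n / n!"""
    return comb(n, k) * p**k * (1 - p)**(n - k) * exp(-lam) * lam**n / factorial(n)

p_S_k = sum(joint(n) for n in range(k, N))                   # P[S = k]
e_X_given_S = sum(n * joint(n) for n in range(k, N)) / p_S_k
print(e_X_given_S)   # close to k + lam*(1 - p) = 3 + 1.5 = 4.5
```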
===Continuous construction of the conditional expectation===
Now we want to define <math>\E[Y\mid X]</math>, where <math>X</math> is no longer assumed to be countably valued. Therefore we want to recall the following two facts:
{{definitioncard|<math>\sigma</math>-Algebra generated by a random variable|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space. The <math>\sigma</math>-Algebra generated by <math>X</math> is given by
<math display="block">
\sigma(X)=X^{-1}(\B(\R^n))=\{A\subset\Omega\mid A=X^{-1}(B),B\in\B(\R^n)\}.
</math>}}
{{proofcard|Theorem|thm-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space and let <math>Y</math> be a real valued r.v. on that space. Then <math>Y</math> is measurable with respect to <math>\sigma(X)</math> if and only if there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
Y=f(X).
</math>|}}
{{alert-info |
We want to make use of the fact that <math>L^2(\Omega,\sigma(X),\p)\subset L^2(\Omega,\F,\p)</math> is a closed subspace of the Hilbert space <math>L^2(\Omega,\F,\p)</math>, since <math>\sigma(X)\subset\F</math>. This allows us to use orthogonal projections and to interpret the conditional expectation as such a projection.
}}
{{definitioncard|Conditional expectation (as a projection onto a closed subspace)|Let <math>(\Omega,\F,\p)</math> be a probability space, let <math>X</math> be a r.v. with values in <math>\R^n</math> and let <math>Y\in L^2(\Omega,\F,\p)</math>. Then the conditional expectation of <math>Y</math> given <math>X</math> is the unique element <math>\hat Y\in L^2(\Omega,\sigma(X),\p)</math> such that for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>
<math display="block">
\begin{equation}
\E[YZ]=\E[\hat Y Z].
\end{equation}
</math>
This characterization comes from the fact that <math>Y-\hat Y</math> is orthogonal to the subspace, i.e. <math>\langle Y-\hat Y,Z\rangle=0</math> for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>. We write <math>\E[Y\mid X]</math> for <math>\hat Y</math>.}}
{{alert-info |
<math>\hat Y</math> is the orthogonal projection of <math>Y</math> onto <math>L^2(\Omega,\sigma(X),\p)</math>.
}}
{{alert-info |
Since <math>X</math> takes values in <math>\R^n</math>, there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
\E[Y\mid X]=f(X)
</math>
with <math>\E[f^2(X)] < \infty</math>. We can also rewrite (3) as: for all Borel measurable <math>g:\R^n\to\R</math> such that <math>\E[g^2(X)] < \infty</math>, we get
<math display="block">
\E[Yg(X)]=\E[f(X)g(X)].
</math>
}}
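On a finite probability space this projection is concrete: <math>\sigma(X)</math> is generated by the atoms <math>\{X=x\}</math>, and the projection of <math>Y</math> is its atom-wise conditional average. The following Python sketch (a toy uniform space, purely illustrative) verifies the defining property (3):

```python
from collections import defaultdict

# Toy finite space: Omega = {0,...,5}, uniform P; sigma(X) is generated by
# the atoms {X = x}, and hat Y = E[Y | X] is the atom-wise average of Y.
omega = range(6)
P = {w: 1 / 6 for w in omega}
X = {w: w % 2 for w in omega}        # partitions Omega into two atoms
Y = {w: float(w * w) for w in omega}

atoms = defaultdict(list)
for w in omega:
    atoms[X[w]].append(w)

hatY = {}
for x, ws in atoms.items():
    mass = sum(P[w] for w in ws)
    avg = sum(Y[w] * P[w] for w in ws) / mass
    for w in ws:
        hatY[w] = avg                # hat Y is constant on each atom

# Defining property: E[Y g(X)] = E[hat Y g(X)] for sigma(X)-measurable tests
g = lambda x: 3.0 * x - 1.0          # an arbitrary Borel test function of X
lhs = sum(Y[w] * g(X[w]) * P[w] for w in omega)
rhs = sum(hatY[w] * g(X[w]) * P[w] for w in omega)
print(lhs, rhs)                      # equal up to rounding
```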
Now let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math> and consider the space <math>L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)</math>. It is clear that <math>L^2(\Omega,\mathcal{G},\p)</math> is a closed subspace of this Hilbert space and thus we can project onto it.
{{definitioncard|Conditional expectation (projection case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then the conditional expectation of <math>Y</math> given <math>\mathcal{G}</math> is defined as the unique element <math>\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)</math> such that for all <math>Z\in L^2(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\label{4}
\E[YZ]=\E[\E[Y\mid \mathcal{G}]Z].
\end{equation}
</math>
}}
{{alert-info |
In (3) or (4), it is enough{{efn|Since we can always consider linear combinations of <math>\one_A</math> and then apply density theorems to them.}} to restrict the test r.v. <math>Z</math> to the class of r.v.'s of the form
<math display="block">
Z=\one_A,\quad A\in\mathcal{G}.
</math>
}}
{{alert-info |
The conditional expectation is in <math>L^2</math>, so it's only defined a.s. and not everywhere in a unique way. So in particular, any statement like <math>\E[Y\mid\mathcal{G}]\geq0</math> or <math>\E[Y\mid \mathcal{G}]=Z</math> has to be understood with an implicit a.s.
}}
{{proofcard|Theorem|thm-3|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid \mathcal{G}]\geq 0</math>.
</li>
<li><math>\E[\E[Y\mid\mathcal{G}]]=\E[Y]</math>.
</li>
<li>The map <math>Y\mapsto\E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|For <math>(i)</math> take <math>Z=\one_{\{\E[Y\mid\mathcal{G}] < 0\}}</math> to obtain
<math display="block">
\underbrace{\E[YZ]}_{\geq 0}=\underbrace{\E[\E[Y\mid \mathcal{G}]Z]}_{\leq 0}.
</math>
This implies that <math>\p[\E[Y\mid \mathcal{G}] < 0]=0</math>. For <math>(ii)</math> take <math>Z=\one_{\Omega}</math> and plug it into (4). For <math>(iii)</math> notice that linearity comes from the orthogonal projection operator. But we can also do it directly by taking <math>Y,Y'\in L^2(\Omega,\F,\p)</math>, <math>\alpha,\beta\in \R</math> and <math>Z\in L^2(\Omega,\mathcal{G},\p)</math> to obtain
<math display="block">
\E[(\alpha Y+\beta Y')Z]=\alpha\E[YZ]+\beta\E[Y'Z]=\alpha\E[\E[Y\mid\mathcal{G}]Z]+\beta\E[\E[Y'\mid\mathcal{G}]Z]=\E[(\alpha\E[Y\mid\mathcal{G}]+\beta\E[Y'\mid\mathcal{G}])Z].
</math>
Now we can conclude by using the uniqueness property that
<math display="block">
\E[\alpha Y+\beta Y'\mid \mathcal{G}]=\alpha\E[Y\mid \mathcal{G}]+\beta\E[Y'\mid \mathcal{G}].
</math>}}
Now we want to extend the definition of the conditional expectation to r.v.'s in <math>L^1(\Omega,\F,\p)</math> or in <math>L^+(\Omega,\F,\p)</math>, the space of non-negative r.v.'s allowed to take the value <math>\infty</math>.
{{proofcard|Lemma|lem-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^+(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)</math> such that for all <math>X\in L^+(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X]
\end{equation}
</math>
and this conditional expectation agrees with the previous definition when <math>Y\in L^2(\Omega,\F,\p)</math>. Moreover, if <math>0\leq Y\leq Y'</math>, then
<math display="block">
\E[Y\mid \mathcal{G}]\leq \E[Y'\mid \mathcal{G}].
</math>
|If <math>Y\geq 0</math> and <math>Y\in L^2(\Omega,\F,\p)</math>, then we define <math>\E[Y\mid\mathcal{G}]</math> as before. If <math>X\in L^+(\Omega,\mathcal{G},\p)</math>, then <math>X_n=X\land n</math> is in <math>L^2(\Omega,\mathcal{G},\p)</math>, is positive and satisfies <math>X_n\uparrow X</math> for <math>n\to\infty</math>. Using the monotone convergence theorem we get
<math display="block">
\E[YX]=\E[Y\lim_{n\to\infty}X_n]=\lim_{n\to\infty}\E[YX_n]=\lim_{n\to\infty}\E[\E[Y\mid\mathcal{G}]X_n]=\E[\E[Y\mid\mathcal{G}]\lim_{n\to\infty}X_n]=\E[\E[Y\mid\mathcal{G}]X].
</math>
This shows that (5) is true whenever <math>Y\in L^2(\Omega,\F,\p)</math> with <math>Y\geq 0</math> and <math>X\in L^+(\Omega,\mathcal{G},\p)</math>. Now let <math>Y\in L^+(\Omega,\F,\p)</math>. Define <math>Y_m=Y\land m</math>. Hence we get <math>Y_m\in L^2(\Omega,\F,\p)</math> and <math>Y_m\uparrow Y</math> as <math>m\to\infty</math>. Each <math>\E[Y_m\mid\mathcal{G}]</math> is well defined{{efn|because for <math>Y\in L^2</math> and <math>U\in L^2</math> we get <math>Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}]</math>}}, positive and increasing in <math>m</math>. We define
<math display="block">
\E[Y\mid\mathcal{G}]=\lim_{m\to\infty}\E[Y_m\mid \mathcal{G}].
</math>
Several applications of the monotone convergence theorem then give us, for <math>X\in L^+(\Omega,\mathcal{G},\p)</math>,
<math display="block">
\E[YX]=\lim_{m\to\infty}\E[Y_mX]=\lim_{m\to\infty}\E[\E[Y_m\mid\mathcal{G}]X]=\E[\E[Y\mid \mathcal{G}]X].
</math>
Furthermore if <math>0\leq Y\leq Y'</math>, then <math>Y\land m\leq Y'\land m</math> and therefore
<math display="block">
\E[Y\mid\mathcal{G}]\leq \E[Y'\mid\mathcal{G}].
</math>
Now we need to show uniqueness{{efn|Note that for any <math>W\in L^+</math> with <math>\E[W] < \infty</math>, the set <math>E</math> on which <math>W=\infty</math> is a null set. For suppose not; then <math>\E[W]\geq \E[\infty \one_E]=\infty\cdot\p[E]=\infty</math>, a contradiction.}}. Let <math>U</math> and <math>V</math> be two versions of <math>\E[Y\mid \mathcal{G}]</math>. Let
<math display="block">
\Lambda_n=\{U < V\leq n\}\in\mathcal{G}
</math>
and assume <math>\p[\Lambda_n] > 0</math>. Applying (5) with <math>X=\one_{\Lambda_n}</math> to both versions yields
<math display="block">
\E[U\one_{\Lambda_n}]=\E[Y\one_{\Lambda_n}]=\E[V\one_{\Lambda_n}],\quad\text{hence}\quad\E[(V-U)\one_{\Lambda_n}]=0.
</math>
Since <math>V-U > 0</math> on <math>\Lambda_n</math>, this contradicts the fact that <math>\p[\Lambda_n] > 0</math>. Moreover, <math>\{U < V\}=\bigcup_{n\geq 1}\Lambda_n</math> and therefore
<math display="block">
\p[U < V]=0
</math>
and similarly <math>\p[V < U]=0</math>. This implies
<math display="block">
\p[U=V]=1.
</math>}}
{{proofcard|Theorem|thm-4|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math> such that for every <math>X</math> bounded and <math>\mathcal{G}</math>-measurable
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X].
\end{equation}
</math>
This conditional expectation agrees with the definition for the <math>L^2</math> case. Moreover it satisfies:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid\mathcal{G}]\geq 0</math>.
</li>
<li>The map <math>Y\mapsto \E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|We will only prove existence, since the rest is exactly the same as before. Write <math>Y=Y^+-Y^-</math> with <math>Y^+,Y^-\in L^1(\Omega,\F,\p)</math> and <math>Y^+,Y^-\geq 0</math>. So <math>\E[Y^+\mid\mathcal{G}]</math> and <math>\E[Y^-\mid\mathcal{G}]</math> are well defined. Now we set
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y^+\mid \mathcal{G}]-\E[Y^-\mid\mathcal{G}].
</math>
This is well defined because
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]]=\E[Y^\pm] < \infty
</math>
if we let <math>X=\one_\Omega</math> in the previous lemma, and therefore <math>\E[Y^+\mid\mathcal{G}],\E[Y^-\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math>. For <math>X</math> bounded and <math>\mathcal{G}</math>-measurable we can also write <math>X=X^+-X^-</math>, and it follows from the previous lemma that
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]X]=\E[Y^\pm X].
</math>
This implies that <math>\E[Y\mid\mathcal{G}]</math> satisfies (6).}}
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]]=\E[X].
</math>
|Take equation (6) with the bounded test r.v. <math>\one_\Omega</math>.}}
{{proofcard|Corollary|cor-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert\leq \E[\vert X\vert\mid\mathcal{G}].
</math>
In particular
<math display="block">
\E[\vert\E[X\mid\mathcal{G}]\vert]\leq \E[\vert X\vert].
</math>
|We can always write <math>X=X^+-X^-</math> and also <math>\vert X\vert=X^++X^-</math>. Therefore we get
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq \E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[X^++X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}].
</math>}}
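This inequality can be checked on a finite space where <math>\E[\,\cdot\mid\mathcal{G}]</math> is realized as the atom-wise average over a generating partition; the values below are made up for illustration:

```python
from collections import defaultdict

# Check of |E[X | G]| <= E[|X| | G] atom by atom on a toy finite space,
# where G is generated by a partition and E[. | G] is the atom-wise average.
omega = range(4)
P = {w: 0.25 for w in omega}
atom = {0: 0, 1: 0, 2: 1, 3: 1}           # the partition generating G
X = {0: -3.0, 1: 1.0, 2: 2.0, 3: -5.0}

def cond(f):
    """E[f | G], returned per atom (atom-wise average of f)."""
    mass, tot = defaultdict(float), defaultdict(float)
    for w in omega:
        mass[atom[w]] += P[w]
        tot[atom[w]] += f[w] * P[w]
    return {a: tot[a] / mass[a] for a in mass}

cX, cAbs = cond(X), cond({w: abs(X[w]) for w in omega})
for a in cX:
    print(a, abs(cX[a]), "<=", cAbs[a])   # the inequality holds on every atom
```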
{{proofcard|Proposition|prop-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> be a r.v. on that space and assume that <math>Y</math> is independent of the sub <math>\sigma</math>-Algebra <math>\mathcal{G}\subset\F</math>, i.e. <math>\sigma(Y)</math> is independent of <math>\mathcal{G}</math>. Then
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y].
</math>
|Let <math>Z</math> be a bounded and <math>\mathcal{G}</math>-measurable r.v., so that <math>Y</math> and <math>Z</math> are independent. Hence we get
<math display="block">
\E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z].
</math>
Since <math>\E[Y]</math> is constant, it belongs to <math>L^1(\Omega,\mathcal{G},\p)</math> and satisfies (6). Therefore, by uniqueness, we get that <math>\E[Y\mid\mathcal{G}]=\E[Y]</math>.}}
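The proposition can be illustrated on a product space, where independence is automatic: below, a hypothetical <math>X</math> depends only on the first coordinate and <math>Y</math> only on the second, so every conditional average of <math>Y</math> over an atom of <math>\sigma(X)</math> equals <math>\E[Y]</math>:

```python
# Independence check on a product space: X is a function of the first
# coordinate, Y of the second, so Y is independent of sigma(X).
# All probabilities and values below are made up for illustration.
P1 = [0.2, 0.8]              # law of the first coordinate (determines X)
P2 = [0.5, 0.3, 0.2]         # law of the second coordinate
Yvals = [1.0, 4.0, 10.0]     # Y as a function of the second coordinate

EY = sum(p * y for p, y in zip(P2, Yvals))

cond = []
for pi in P1:
    atom_mass = sum(pi * pj for pj in P2)                        # = pi
    cond.append(sum(pi * pj * y for pj, y in zip(P2, Yvals)) / atom_mass)
print(EY, cond)              # each conditional average equals E[Y] = 3.7
```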
{{proofcard|Theorem|thm-5|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> and <math>Y</math> be two r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Assume further that at least one of these two holds:
<ul style{{=}}"list-style-type:lower-roman"><li><math>X,Y</math> and <math>XY</math> are in <math>L^1(\Omega,\F,\p)</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
<li><math>X\geq 0</math>, <math>Y\geq 0</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
</ul>
Then
<math display="block">
\E[XY\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]X.
</math>
In particular, if <math>X</math> is a positive r.v. or in <math>L^1(\Omega,\mathcal{G},\p)</math> and <math>\mathcal{G}</math>-measurable, then
<math display="block">
\E[X\mid\mathcal{G}]=X.
</math>
|For <math>(ii)</math> assume first that <math>X,Y\geq 0</math>. Let <math>Z</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. Then we can obtain
<math display="block">
\E[(XY)Z]=\E[Y(XZ)]=\E[\E[Y\mid\mathcal{G}]XZ]=\E[(\E[Y\mid\mathcal{G}]X)Z].
</math>
Note that <math>\E[Y\mid\mathcal{G}]X</math> is a positive and <math>\mathcal{G}</math>-measurable r.v. Hence <math>\E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}]</math>. For <math>(i)</math> we can write <math>X=X^+-X^-</math> and use <math>(ii)</math>. This is an easy exercise.}}
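The pull-out property is easy to verify on a finite space where <math>\mathcal{G}</math> is generated by a partition and <math>X</math> is constant on its atoms (hence <math>\mathcal{G}</math>-measurable); all numbers below are made up:

```python
from collections import defaultdict

# Pull-out property E[XY | G] = X E[Y | G] on a toy finite space:
# G is generated by a partition into atoms, X is constant on each atom.
omega = range(4)
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
atom = {0: "a", 1: "a", 2: "b", 3: "b"}
Xv = {"a": 2.0, "b": -1.0}               # X as a function of the atom
Y = {0: 1.0, 1: 3.0, 2: 0.0, 3: 5.0}

def cond(f):
    """E[f | G] as a function of the atom (atom-wise average)."""
    mass, tot = defaultdict(float), defaultdict(float)
    for w in omega:
        mass[atom[w]] += P[w]
        tot[atom[w]] += f[w] * P[w]
    return {a: tot[a] / mass[a] for a in mass}

XY = {w: Xv[atom[w]] * Y[w] for w in omega}
lhs = cond(XY)                                      # E[XY | G]
rhs = {a: Xv[a] * v for a, v in cond(Y).items()}    # X * E[Y | G]
print(lhs, rhs)                                     # agree on every atom
```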
{{alert-info |
Next we want to show that the classical limit theorems from measure theory also make sense in terms of the conditional expectation{{efn|Recall the classical limit theorems for integrals: ''Monotone convergence:'' Let <math>(f_n)_{n\geq 1}</math> be an increasing sequence of positive and measurable functions and let <math>f=\lim_{n\to\infty}\uparrow f_n</math>. Then <math>\int fd\mu=\lim_{n\to\infty}\int f_nd\mu</math>. ''Fatou:'' Let <math>(f_n)_{n\geq 1}</math> be a sequence of measurable and positive functions. Then <math>\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu</math>. ''Dominated convergence:'' Let <math>(f_n)_{n\geq 1}</math> be a sequence of integrable functions with <math>\vert f_n\vert\leq g</math> for all <math>n</math>, with <math>g</math> integrable. Denote <math>f=\lim_{n\to\infty}f_n</math>. Then <math>\lim_{n\to\infty}\int f_nd\mu=\int fd\mu</math>.}}.
}}
{{proofcard|Theorem (Limit theorems for the conditional expectation)|thm-6|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>(Y_n)_{n\geq 1}</math> be a sequence of r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then we have:
<ul style{{=}}"list-style-type:lower-roman"><li>(''Monotone convergence'') Assume that <math>(Y_n)_{n\geq 1}</math> is a sequence of positive r.v.'s such that <math>\lim_{n\to\infty}\uparrow Y_n=Y</math> a.s. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}].
</math>
</li>
<li>(''Fatou'') Assume that <math>(Y_n)_{n\geq 1}</math> is a sequence of positive r.v.'s. Then
<math display="block">
\E[\liminf_n Y_n\mid\mathcal{G}]\leq\liminf_n\E[Y_n\mid\mathcal{G}].
</math>
</li>
<li>(''Dominated convergence'') Assume that <math>Y_n\xrightarrow{n\to\infty}Y</math> a.s. and that there exists <math>Z\in L^1(\Omega,\F,\p)</math> such that <math>\vert Y_n\vert\leq Z</math> for all <math>n</math>. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>
</li>
</ul>
|We will only prove <math>(i)</math>, since <math>(ii)</math> and <math>(iii)</math> are proved in a similar way (it's a good exercise to do the proofs). Since <math>(Y_n)_{n\geq 1}</math> is an increasing sequence, it follows that
<math display="block">
\E[Y_{n+1}\mid\mathcal{G}]\geq \E[Y_n\mid\mathcal{G}].
</math>
Hence we can deduce that <math>\lim_{n\to\infty}\uparrow \E[Y_n\mid\mathcal{G}]</math> exists and we denote it by <math>Y'</math>. Moreover, note that <math>Y'</math> is <math>\mathcal{G}</math>-measurable, since it is a limit of <math>\mathcal{G}</math>-measurable r.v.'s. Let <math>X</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. We then obtain
<math display="block">
\E[Y'X]=\E[\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\uparrow\E[\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\E[Y_n X]=\E[YX],
</math>
where we have used monotone convergence twice and equation (5). Therefore we get
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>}}
{{proofcard|Theorem (Jensen's inequality)|thm-7|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>\varphi:\R\to\R</math> be a convex function. Let <math>X\in L^1(\Omega,\F,\p)</math> be such that <math>\varphi(X)\in L^1(\Omega,\F,\p)</math>. Then
<math display="block">
\varphi(\E[X\mid\mathcal{G}])\leq \E[\varphi(X)\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset\F</math>.
|Exercise.}}
'''Example'''
Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>\varphi(x)=x^2</math> and let <math>X\in L^2(\Omega,\F,\p)</math>. Then
<math display="block">
(\E[X\mid \mathcal{G}])^2\leq \E[X^2\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset \F</math>.
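This example can be checked atom by atom on a finite space, again with <math>\E[\,\cdot\mid\mathcal{G}]</math> realized as the atom-wise average over a generating partition (toy values, purely illustrative):

```python
from collections import defaultdict

# Conditional Jensen with phi(x) = x^2, checked atom by atom on a toy
# finite space (G generated by a partition; E[. | G] = atom-wise average).
omega = range(5)
P = {w: 0.2 for w in omega}
atom = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
X = {0: -1.0, 1: 2.0, 2: 0.0, 3: 3.0, 4: 7.0}

def cond(f):
    mass, tot = defaultdict(float), defaultdict(float)
    for w in omega:
        mass[atom[w]] += P[w]
        tot[atom[w]] += f[w] * P[w]
    return {a: tot[a] / mass[a] for a in mass}

cX = cond(X)
cX2 = cond({w: X[w] ** 2 for w in omega})
for a in cX:
    print(a, cX[a] ** 2, "<=", cX2[a])   # (E[X | G])^2 <= E[X^2 | G] per atom
```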
{{proofcard|Theorem (Tower property)|thm-8|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Let <math>\mathcal{C}\subset\mathcal{G}\subset \F</math> be a tower of sub <math>\sigma</math>-Algebras of <math>\F</math>. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>
|Let <math>Z</math> be a bounded and <math>\mathcal{C}</math>-measurable r.v. Then we obtain
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{C}]Z].
</math>
But <math>Z</math> is also <math>\mathcal{G}</math>-measurable and hence we get
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{G}]Z].
</math>
Therefore, for all bounded and <math>\mathcal{C}</math>-measurable r.v.'s <math>Z</math>, we get
<math display="block">
\E[\E[X\mid\mathcal{G}]Z]=\E[\E[X\mid\mathcal{C}]Z]
</math>
and thus
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>}}
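On a finite space the tower property becomes a statement about nested partitions: averaging first over the finer partition and then over the coarser one gives the same result as averaging over the coarser one directly. A Python sketch with hypothetical values:

```python
from collections import defaultdict

# Tower property on a toy finite space: C is coarser than G (every C-atom
# is a union of G-atoms), conditional expectations are atom-wise averages.
omega = range(6)
P = {w: 1 / 6 for w in omega}
G_atom = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}   # finer partition -> G
C_atom = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}   # coarser partition -> C
X = {0: 1.0, 1: 5.0, 2: 2.0, 3: 8.0, 4: 3.0, 5: 9.0}

def cond(f, part):
    """E[f | sigma(part)] returned as a r.v. on Omega (constant on atoms)."""
    mass, tot = defaultdict(float), defaultdict(float)
    for w in omega:
        mass[part[w]] += P[w]
        tot[part[w]] += f[w] * P[w]
    return {w: tot[part[w]] / mass[part[w]] for w in omega}

inner = cond(X, G_atom)        # E[X | G]
lhs = cond(inner, C_atom)      # E[E[X | G] | C]
rhs = cond(X, C_atom)          # E[X | C]
print(lhs, rhs)                # agree pointwise up to rounding
```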
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}
==Notes==
{{notelist}}
Revision as of 01:53, 8 May 2024
Conditional probability
Let [math](\Omega,\F,\p)[/math] be a probability space and let [math]A,B\in\F[/math] such that [math]\p[B] \gt 0[/math]. Then the conditional probability[a] of [math]A[/math] given [math]B[/math] is defined as
The important fact here is that the application [math]\F\to [0,1][/math], [math]A\mapsto \p[A\mid B][/math] defines a new probability measure on [math]\F[/math] called the conditional probability given [math]B[/math]. There are several facts, which we need to recall:
- If [math]A_1,...,A_n\in\F[/math] and if [math]\p\left[\bigcap_{k=1}^nA_k\right] \gt 0[/math], then
[[math]] \p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right]. [[/math]]
- Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math], i.e. for all [math]n\geq 1[/math] we have that [math]E_n\in\F[/math] and for [math]n\not=m[/math] we get [math]E_n\cap E_m=\varnothing[/math] and [math]\bigcup_{n\geq 1}E_n=\Omega[/math]. Now for [math]A\in \F[/math] we get
[[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]
- (Bayes' formula)[b] Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math] and [math]A\in\F[/math] with [math]\p[A] \gt 0[/math]. Then
[[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}. [[/math]]
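The law of total probability and Bayes' formula above can be checked exactly on a finite partition. A minimal sketch in Python; the three-set partition, the prior weights and the likelihoods are invented purely for illustration:

```python
from fractions import Fraction as F

# Hypothetical partition E_1, E_2, E_3 with priors P[E_n] and
# likelihoods P[A | E_n] (all numbers invented).
prior = [F(1, 2), F(1, 3), F(1, 6)]          # sums to 1
likelihood = [F(3, 4), F(1, 2), F(1, 4)]     # P[A | E_n]

# Law of total probability: P[A] = sum_n P[A | E_n] P[E_n]
p_a = sum(l * q for l, q in zip(likelihood, prior))

# Bayes' formula: P[E_n | A] = P[A | E_n] P[E_n] / P[A]
posterior = [l * q / p_a for l, q in zip(likelihood, prior)]

print(p_a)              # P[A]
print(posterior)        # the posterior over the partition
print(sum(posterior))   # equals 1: A -> P[. | A] is again a probability measure
```

The last line illustrates the remark above: conditioning produces a genuine probability measure on the partition.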
We can reformulate the definition of the conditional probability to obtain
[[math]] \p[A\cap B]=\p[A\mid B]\,\p[B]. [[/math]]
Discrete construction of the conditional expectation
Let [math]X[/math] and [math]Y[/math] be two r.v.'s on a probability space [math](\Omega,\F,\p)[/math]. Let [math]Y[/math] take values in [math]\R[/math] and [math]X[/math] take values in a countable discrete set [math]\{x_1,x_2,...,x_n,...\}[/math]. The goal is to describe the expectation of the r.v. [math]Y[/math] given knowledge of the observed r.v. [math]X[/math]. For instance, let [math]X=x_j\in\{x_1,x_2,...,x_n,...\}[/math] with [math]\p[X=x_j] \gt 0[/math]. We then look at the set [math]\{\omega\in\Omega\mid X(\omega)=x_j\}[/math] rather than at the whole of [math]\Omega[/math]. For [math]\Lambda\in\F[/math], we thus define
[[math]] \Q[\Lambda]=\p[\Lambda\mid X=x_j], [[/math]]
a new probability measure [math]\Q[/math] on [math]\F[/math]. Therefore it makes more sense to compute
[[math]] \E_\Q[Y]=\int_\Omega Y(\omega)\Q[d\omega] [[/math]]
rather than
[[math]] \E[Y]=\int_\Omega Y(\omega)\p[d\omega]. [[/math]]
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:\Omega\to\{x_1,x_2,...,x_n,...\}[/math] be a r.v. taking values in a discrete set and let [math]Y[/math] be a real valued r.v. on that space. If [math]\p[X=x_j] \gt 0[/math], we can define the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] to be
[[math]] \E[Y\mid X=x_j]=\E_\Q[Y]=\frac{\E[Y\one_{\{X=x_j\}}]}{\p[X=x_j]}. [[/math]]
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a r.v. on that space with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and let [math]Y[/math] also be a r.v. with values in [math]\{y_1,y_2,...,y_n,...\}[/math]. If [math]\p[X=x_j] \gt 0[/math], we can write the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] as
[[math]] \E[Y\mid X=x_j]=\sum_{k\geq 1}y_k\p[Y=y_k\mid X=x_j]. [[/math]]
Apply the definitions above to obtain
[[math]] \E[Y\mid X=x_j]=\frac{\E[Y\one_{\{X=x_j\}}]}{\p[X=x_j]}=\sum_{k\geq 1}y_k\frac{\p[Y=y_k,X=x_j]}{\p[X=x_j]}=\sum_{k\geq 1}y_k\p[Y=y_k\mid X=x_j]. [[/math]]
Now let again [math]X[/math] be a r.v. with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and [math]Y[/math] a real valued r.v. The next step is to define [math]\E[Y\mid X][/math] as a function [math]f(X)[/math]. Therefore we introduce the function
[[math]] f(x)=\begin{cases}\E[Y\mid X=x]&\p[X=x] \gt 0\\ 0&\p[X=x]=0.\end{cases} [[/math]]
It doesn't matter which value we assign to [math]f[/math] when [math]\p[X=x]=0[/math], since it doesn't affect the expectation: the set [math]\{X=x\}[/math] is then a null set. By convention we assign the value 0.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a countably valued r.v. and let [math]Y[/math] be a real valued r.v. The conditional expectation of [math]Y[/math] given [math]X[/math] is defined by
The above definition does not define [math]\E[Y\mid X][/math] everywhere but rather almost everywhere, since on each set [math]\{X=x\}[/math], where [math]\p[X=x]=0[/math], its value is arbitrary.
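The discrete construction above is short enough to compute by hand on a finite space. A minimal Python sketch, where the five outcomes, the uniform weights and the values of [math]X[/math] and [math]Y[/math] are all invented for illustration:

```python
from fractions import Fraction as F

# Toy finite sample space with uniform weights (all numbers invented).
# omega -> (X(omega), Y(omega))
outcomes = {1: (0, 10), 2: (0, 20), 3: (1, 30), 4: (1, 50), 5: (1, 40)}
p = {w: F(1, 5) for w in outcomes}

def f(x):
    """f(x) = E[Y 1_{X=x}] / P[X=x], with the value-0 convention on null sets."""
    mass = sum(p[w] for w, (xv, _) in outcomes.items() if xv == x)
    if mass == 0:
        return F(0)                           # arbitrary convention on {X = x} null
    num = sum(p[w] * yv for w, (xv, yv) in outcomes.items() if xv == x)
    return num / mass

# E[Y | X] is the random variable omega -> f(X(omega))
cond_exp = {w: f(xv) for w, (xv, _) in outcomes.items()}

# Sanity check: E[E[Y | X]] = E[Y]
lhs = sum(p[w] * cond_exp[w] for w in outcomes)
rhs = sum(p[w] * yv for w, (_, yv) in outcomes.items())
print(lhs, rhs)
```

Note that `f` is only pinned down on values [math]x[/math] with positive mass, exactly as in the remark above.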
Example
Let[d] [math]X\sim\Pi(\lambda)[/math]. Consider a tossing game where, when [math]X=n[/math], we perform [math]n[/math] independent tosses of a coin, each toss giving 1 with probability [math]p\in[0,1][/math] and 0 with probability [math]1-p[/math]. Define also [math]S[/math] to be the r.v. giving the total number of 1's obtained in the game. Therefore, if [math]X=n[/math] is given, [math]S[/math] is binomially distributed with parameters [math](p,n)[/math]. We want to compute
- [math]\E[S\mid X][/math]
- [math]\E[X\mid S][/math]
It is natural to ask for the expected number of 1's obtained in the whole game knowing how many tosses were played; the reverse is a bit more difficult. Logically, we may also notice that [math]S \gt X[/math] is impossible, because we cannot obtain more 1's than the number of tosses played.
- First we compute [math]\E[S\mid X=n][/math]: If [math]X=n[/math], we know that [math]S[/math] is binomially distributed with parameters [math](p,n)[/math] ([math]S\sim \B(p,n)[/math]) and therefore we already know[e]
[[math]] \E[S\mid X=n]=pn. [[/math]]Now we need to identify the function [math]f[/math] defined as in (2) by[[math]] \begin{align*} f:\N&\longrightarrow\R\\ n&\longmapsto pn. \end{align*} [[/math]]Therefore we get by definition[[math]] \E[S\mid X]=pX. [[/math]]
- Next we want to compute [math]\E[X\mid S=k][/math]: For [math]n\geq k[/math] we have
[[math]] \p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}, [[/math]]since [math]\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}[/math]. By some algebra we obtain that[[math]] \frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}. [[/math]]Hence we get that[[math]] \E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p). [[/math]]Therefore [math]\E[X\mid S]=S+\lambda(1-p)[/math].
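The algebraic simplification above can be verified numerically by truncating the infinite sums. A sketch in Python; the values [math]\lambda=2[/math], [math]p=0.3[/math] and [math]k=3[/math] are invented for illustration:

```python
from math import exp, comb, factorial, isclose

# Toy parameter values, invented for illustration.
lam, p, k = 2.0, 0.3, 3       # λ of the Poisson, coin bias, observed S = k
M = 80                        # truncation point for the infinite sums over n

def joint(n):
    """P[S = k | X = n] P[X = n] = binom(n,k) p^k (1-p)^(n-k) e^{-λ} λ^n / n!"""
    return comb(n, k) * p**k * (1 - p)**(n - k) * exp(-lam) * lam**n / factorial(n)

z = sum(joint(m) for m in range(k, M))            # ≈ P[S = k]
post = {n: joint(n) / z for n in range(k, M)}     # P[X = n | S = k]

# The simplification in the text: X - k given S = k is Poisson(λ(1-p))
mu = lam * (1 - p)
for n in range(k, k + 10):
    assert isclose(post[n], mu**(n - k) * exp(-mu) / factorial(n - k), rel_tol=1e-9)

e_x_given_s = sum(n * q for n, q in post.items())
print(e_x_given_s)            # ≈ k + λ(1-p)
```

The truncated posterior mean reproduces [math]\E[X\mid S=k]=k+\lambda(1-p)[/math] to floating-point accuracy.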
Continuous construction of the conditional expectation
Now we want to define [math]\E[Y\mid X][/math], where [math]X[/math] is no longer assumed to be countably valued. Therefore we want to recall the following two facts:
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space. The [math]\sigma[/math]-Algebra generated by [math]X[/math] is given by
[[math]] \sigma(X)=\left\{X^{-1}(A)\mid A\in\B(\R^n)\right\}. [[/math]]
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space and let [math]Y[/math] be a real valued r.v. on that space. [math]Y[/math] is measurable with respect to [math]\sigma(X)[/math] if and only if there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that
[[math]] Y=f(X). [[/math]]
We want to make use of the fact that for the Hilbert space [math]L^2(\Omega,\F,\p)[/math] we get that [math]L^2(\Omega,\sigma(X),\p)\subset L^2(\Omega,\F,\p)[/math] is a complete subspace, since [math]\sigma(X)\subset\F[/math]. This allows us to use the orthogonal projections and to interpret the conditional expectation as such a projection.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math]. Then the conditional expectation of [math]Y[/math] given [math]X[/math] is the unique element [math]\hat Y\in L^2(\Omega,\sigma(X),\p)[/math] such that for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]
[[math]] \E[YZ]=\E[\hat YZ]. [[/math]]
[math]\hat Y[/math] is the orthogonal projection of [math]Y[/math] onto [math]L^2(\Omega,\sigma(X),\p)[/math].
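On a finite sample space the projection picture can be made completely concrete: the blockwise mean of [math]Y[/math] on the level sets of [math]X[/math] is orthogonal to every [math]\sigma(X)[/math]-measurable test r.v. and minimizes the [math]L^2[/math] distance. A minimal Python sketch with invented numbers:

```python
from fractions import Fraction as F

# A toy finite space (all numbers invented): X takes the values 0 and 1,
# Y is real valued, and every outcome has probability 1/4.
outcomes = {1: (0, F(1)), 2: (0, F(3)), 3: (1, F(2)), 4: (1, F(6))}  # w -> (X, Y)
p = {w: F(1, 4) for w in outcomes}
xvals = {xv for xv, _ in outcomes.values()}

def proj(w):
    """Blockwise conditional mean of Y on the level set {X = X(w)}."""
    xv = outcomes[w][0]
    mass = sum(p[u] for u, (xu, _) in outcomes.items() if xu == xv)
    return sum(p[u] * yu for u, (xu, yu) in outcomes.items() if xu == xv) / mass

yhat = {w: proj(w) for w in outcomes}        # candidate orthogonal projection

# Orthogonality E[(Y - Yhat) Z] = 0 for Z = 1_{X = x}; by linearity this
# extends to every sigma(X)-measurable Z on this finite space
for x in xvals:
    assert sum(p[w] * (outcomes[w][1] - yhat[w])
               for w in outcomes if outcomes[w][0] == x) == 0

def l2err(cand):
    """E[(Y - candidate)^2] for a candidate r.v. given as a dict."""
    return sum(p[w] * (outcomes[w][1] - cand[w])**2 for w in outcomes)

# Any other sigma(X)-measurable candidate (constant a on {X=0}, b on {X=1})
# does at least as badly in L^2
for a in range(-5, 6):
    for b in range(-5, 6):
        other = {w: F(a if outcomes[w][0] == 0 else b) for w in outcomes}
        assert l2err(other) >= l2err(yhat)
print(yhat, l2err(yhat))
```

This is exactly the discrete construction from the previous section, rediscovered as a least-squares problem.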
Since [math]X[/math] takes values in [math]\R^n[/math], there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that
[[math]] \hat Y=f(X)\quad\text{a.s.} [[/math]]
Now let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math] and consider the space [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math]. It is clear that [math]L^2(\Omega,\mathcal{G},\p)[/math] is a Hilbert space and thus we can project onto it.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then the conditional expectation of [math]Y[/math] given [math]\mathcal{G}[/math] is defined as the unique element [math]\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)[/math] such that for all [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math]
[[math]] \E[YZ]=\E[\E[Y\mid\mathcal{G}]Z]. [[/math]]
In (3) or (1), it is enough[f] to restrict the test r.v. [math]Z[/math] to the class of r.v.'s of the form
[[math]] Z=\one_A,\quad A\in\mathcal{G}. [[/math]]
The conditional expectation is in [math]L^2[/math], so it's only defined a.s. and not everywhere in a unique way. So in particular, any statement like [math]\E[Y\mid\mathcal{G}]\geq0[/math] or [math]\E[Y\mid \mathcal{G}]=Z[/math] has to be understood with an implicit a.s.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math].
- If [math]Y\geq 0[/math], then [math]\E[Y\mid \mathcal{G}]\geq 0[/math]
- [math]\E[\E[Y\mid\mathcal{G}]]=\E[Y][/math]
- The map [math]Y\mapsto\E[Y\mid\mathcal{G}][/math] is linear.
For [math](i)[/math] take [math]Z=\one_{\{\E[Y\mid\mathcal{G}] \lt 0\}}[/math] to obtain
[[math]] 0\leq\E[YZ]=\E[\E[Y\mid\mathcal{G}]\one_{\{\E[Y\mid\mathcal{G}] \lt 0\}}]\leq 0, [[/math]]
which forces [math]\p[\E[Y\mid\mathcal{G}] \lt 0]=0[/math].
Now we want to extend the definition of the conditional expectation to r.v.'s in [math]L^1(\Omega,\F,\p)[/math] or to [math]L^+(\Omega,\F,\p)[/math], which is the space of non-negative r.v.'s allowing the value [math]\infty[/math].
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^+(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)[/math] such that for all [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]
[[math]] \E[XY]=\E[X\E[Y\mid\mathcal{G}]]. [[/math]]
If [math]Y\geq 0[/math] and [math]Y\in L^2(\Omega,\F,\p)[/math], then we define [math]\E[Y\mid\mathcal{G}][/math] as before. If [math]X\in L^+(\Omega,\mathcal{G},\p)[/math], we get that [math]X_n=X\land n[/math] is in [math]L^2(\Omega,\mathcal{G},\p)[/math] and is positive with [math]X_n\uparrow X[/math] for [math]n\to\infty[/math]. Using the monotone convergence theorem we get
[[math]] \E[XY]=\lim_{n\to\infty}\E[X_nY]=\lim_{n\to\infty}\E[X_n\E[Y\mid\mathcal{G}]]=\E[X\E[Y\mid\mathcal{G}]]. [[/math]]
The general case [math]Y\in L^+(\Omega,\F,\p)[/math] follows by approximating [math]Y[/math] in the same way.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math] such that for every [math]X[/math] bounded and [math]\mathcal{G}[/math]-measurable
[[math]] \E[XY]=\E[X\E[Y\mid\mathcal{G}]]. [[/math]]
- If [math]Y\geq 0[/math], then [math]\E[Y\mid\mathcal{G}]\geq 0[/math]
- The map [math]Y\mapsto \E[Y\mid\mathcal{G}][/math] is linear.
We will only prove the existence, since the rest is exactly the same as before. Write [math]Y=Y^+-Y^-[/math] with [math]Y^+,Y^-\in L^1(\Omega,\F,\p)[/math] and [math]Y^+,Y^-\geq 0[/math]. So [math]\E[Y^+\mid\mathcal{G}][/math] and [math]\E[Y^-\mid\mathcal{G}][/math] are well defined. Now we set
[[math]] \E[Y\mid\mathcal{G}]:=\E[Y^+\mid\mathcal{G}]-\E[Y^-\mid\mathcal{G}]. [[/math]]
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then
[[math]] \E[\E[X\mid\mathcal{G}]]=\E[X]. [[/math]]
Take equation (4) and set [math]Z=\one_\Omega[/math].
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then
[[math]] \vert\E[X\mid\mathcal{G}]\vert\leq\E[\vert X\vert\mid\mathcal{G}]. [[/math]]
We can always write [math]X=X^+-X^-[/math] and also [math]\vert X\vert=X^++X^-[/math]. Therefore we get
[[math]] \vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq\E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}]. [[/math]]
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and assume that [math]Y[/math] is independent of the sub [math]\sigma[/math]-Algebra [math]\mathcal{G}\subset\F[/math], i.e. [math]\sigma(Y)[/math] is independent of [math]\mathcal{G}[/math]. Then
[[math]] \E[Y\mid\mathcal{G}]=\E[Y]. [[/math]]
Let [math]Z[/math] be a bounded and [math]\mathcal{G}[/math]-measurable r.v., so that [math]Y[/math] and [math]Z[/math] are independent. Hence we get
[[math]] \E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z], [[/math]]
and since the constant [math]\E[Y][/math] is [math]\mathcal{G}[/math]-measurable, the claim follows by uniqueness.
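The independence property is easy to see on a finite product space: if [math]\mathcal{G}[/math] is generated by the first coordinate and [math]Y[/math] depends only on the second, the conditional mean on every atom of [math]\mathcal{G}[/math] is just [math]\E[Y][/math]. A Python sketch with invented weights:

```python
from fractions import Fraction as F

# A toy product space (weights invented): omega = (u, v) with independent
# coordinates, G = sigma(first coordinate), Y a function of the second only.
pu = {0: F(1, 3), 1: F(2, 3)}
pv = {0: F(1, 4), 1: F(3, 4)}
p = {(u, v): pu[u] * pv[v] for u in pu for v in pv}

def Y(w):
    """Depends only on the second coordinate, hence independent of G."""
    return 5 * w[1] + 1

e_y = sum(p[w] * Y(w) for w in p)

# On each atom {u = u0} of G, the conditional mean of Y equals E[Y]
for u0 in pu:
    mass = sum(p[w] for w in p if w[0] == u0)
    cond = sum(p[w] * Y(w) for w in p if w[0] == u0) / mass
    assert cond == e_y
print(e_y)
```

Knowing the first coordinate gives no information about [math]Y[/math], so conditioning does not move the expectation.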
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Assume further that at least one of these two holds:
- [math]X,Y[/math] and [math]XY[/math] are in [math]L^1(\Omega,\F,\p)[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
- [math]X\geq 0[/math], [math]Y\geq 0[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
Then
[[math]] \E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}]. [[/math]]
For [math](ii)[/math] assume first that [math]X,Y\geq 0[/math]. Let [math]Z[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. Then we can obtain
[[math]] \E[Z(XY)]=\E[(ZX)Y]=\E[ZX\E[Y\mid\mathcal{G}]]=\E[Z(X\E[Y\mid\mathcal{G}])], [[/math]]
since [math]ZX[/math] is again positive and [math]\mathcal{G}[/math]-measurable. The claim follows by uniqueness.
Next we want to show that the classical limit theorems from measure theory also make sense in terms of the conditional expectation[i].
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](Y_n)_{n\geq 1}[/math] be a sequence of r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then we have:
- (Monotone convergence) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s such that [math]\lim_{n\to\infty}\uparrow Y_n=Y[/math] a.s. Then
[[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}]. [[/math]]
- (Fatou) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s. Then
[[math]] \E[\liminf_n Y_n\mid\mathcal{G}]\leq\liminf_n\E[Y_n\mid\mathcal{G}]. [[/math]]
- (Dominated convergence) Assume that [math]Y_n\xrightarrow{n\to\infty}Y[/math] a.s. and that there exists [math]Z\in L^1(\Omega,\F,\p)[/math] such that [math]\vert Y_n\vert\leq Z[/math] for all [math]n[/math]. Then
[[math]] \lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]
We will only prove [math](i)[/math], since [math](ii)[/math] and [math](iii)[/math] are proved in a similar way (it's a good exercise to do the proof). Since [math](Y_n)_{n\geq 1}[/math] is an increasing sequence, it follows that [math](\E[Y_n\mid\mathcal{G}])_{n\geq 1}[/math] is increasing as well, so the limit [math]W=\lim_{n\to\infty}\uparrow\E[Y_n\mid\mathcal{G}][/math] exists and is [math]\mathcal{G}[/math]-measurable. For every positive and [math]\mathcal{G}[/math]-measurable r.v. [math]Z[/math], the monotone convergence theorem applied twice gives
[[math]] \E[YZ]=\lim_{n\to\infty}\E[Y_nZ]=\lim_{n\to\infty}\E[\E[Y_n\mid\mathcal{G}]Z]=\E[WZ], [[/math]]
and hence [math]W=\E[Y\mid\mathcal{G}][/math] by uniqueness.
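The conditional monotone convergence theorem can be watched in action on a toy finite space. A Python sketch (numbers invented) where [math]\mathcal{G}[/math] is given by a two-block partition and [math]Y_n=Y\land n[/math] increases to [math]Y[/math]:

```python
from fractions import Fraction as F

# Toy check of (i), all numbers invented: Y_n = Y ∧ n increases to Y,
# and the blockwise conditional means increase to those of Y.
Y = {0: F(1), 1: F(9), 2: F(4), 3: F(6)}
p = {w: F(1, 4) for w in Y}
blocks = [[0, 1], [2, 3]]                     # atoms of G

def cond(values):
    """E[values | G]: constant on each atom, equal to the blockwise mean."""
    out = {}
    for block in blocks:
        mean = sum(p[w] * values[w] for w in block) / sum(p[w] for w in block)
        out.update({w: mean for w in block})
    return out

prev = None
for n in range(12):
    cur = cond({w: min(v, F(n)) for w, v in Y.items()})
    if prev is not None:
        assert all(cur[w] >= prev[w] for w in Y)   # E[Y_n | G] increases in n
    prev = cur

assert prev == cond(Y)    # for n >= max Y the truncation is exhausted
print(prev)
```

On a finite space the limit is attained at a finite [math]n[/math]; the theorem asserts the same convergence in full generality.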
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi:\R\to\R[/math] be a real, convex function. Let [math]X\in L^1(\Omega,\F,\p)[/math] such that [math]\varphi(X)\in L^1(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then
[[math]] \varphi(\E[X\mid\mathcal{G}])\leq\E[\varphi(X)\mid\mathcal{G}]. [[/math]]
Exercise.
Example
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi(X)=X^2[/math] and let [math]X\in L^2(\Omega,\F,\p)[/math]. Then
[[math]] (\E[X\mid\mathcal{G}])^2\leq\E[X^2\mid\mathcal{G}] [[/math]]
for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset \F[/math].
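This instance of conditional Jensen is just the statement that on each atom the squared mean is at most the mean of the squares. A minimal Python check with invented atoms and values:

```python
from fractions import Fraction as F

# Atoms of G with the values X takes on each atom; outcomes within an atom
# are taken equally likely (all numbers invented for illustration).
atoms = {"A": [F(1), F(3)], "B": [F(-2), F(0), F(5)]}

results = {}
for name, vals in atoms.items():
    m = sum(vals) / len(vals)                  # E[X | atom]
    m2 = sum(v * v for v in vals) / len(vals)  # E[X^2 | atom]
    assert m * m <= m2                         # conditional Jensen for phi(x) = x^2
    results[name] = (m * m, m2)
print(results)
```

The gap [math]\E[X^2\mid\mathcal{G}]-(\E[X\mid\mathcal{G}])^2[/math] is exactly the conditional variance, which is always non-negative.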
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a positive r.v. on that space. Let [math]\mathcal{C}\subset\mathcal{G}\subset \F[/math] be a tower of sub [math]\sigma[/math]-Algebras of [math]\F[/math]. Then
[[math]] \E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}]. [[/math]]
Let [math]Z[/math] be a bounded and [math]\mathcal{C}[/math]-measurable r.v. Then we obtain
[[math]] \E[XZ]=\E[\E[X\mid\mathcal{C}]Z]. [[/math]]
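For nested partitions the tower property says that averaging in two stages, first over the fine blocks and then over the coarse ones, gives the same result as averaging over the coarse blocks directly. A Python sketch with an invented six-point space:

```python
from fractions import Fraction as F

# Six equally likely outcomes (invented); each C-block is a union of
# G-blocks, so C ⊂ G ⊂ F is a tower of sub sigma-Algebras.
X = {w: F(v) for w, v in enumerate([1, 5, 2, 8, 3, 7])}
p = {w: F(1, 6) for w in X}
G_blocks = [[0, 1], [2, 3], [4, 5]]
C_blocks = [[0, 1, 2, 3], [4, 5]]

def cond(values, blocks):
    """Conditional expectation given the partition: blockwise mean."""
    out = {}
    for block in blocks:
        mean = sum(p[w] * values[w] for w in block) / sum(p[w] for w in block)
        out.update({w: mean for w in block})
    return out

lhs = cond(cond(X, G_blocks), C_blocks)   # E[E[X | G] | C]
rhs = cond(X, C_blocks)                   # E[X | C]
assert lhs == rhs                         # the tower property
print(rhs)
```

The identity fails in general if the [math]\mathcal{C}[/math]-blocks are not unions of [math]\mathcal{G}[/math]-blocks, i.e. if [math]\mathcal{C}\not\subset\mathcal{G}[/math].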
General references
Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].
Notes
- One can look it up for more details in the stochastics I part.
- Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.
- One also has to notice that if [math]A[/math] and [math]B[/math] are two independent events, then [math]\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A][/math].
- Recall that this means that [math]X[/math] is Poisson distributed: [math]\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}[/math] for [math]k\in\N[/math]
- If [math]X\sim \B(p,n)[/math] then [math]\E[X]=pn[/math]. For further calculation, one can look it up in the stochastics I notes
- Since we can always consider linear combinations of [math]\one_A[/math] and then apply density theorems to it
- because for [math]Y\in L^2[/math] and [math]U\in L^2[/math] we get [math]Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}][/math]
- Note that for any [math]W\in L^+[/math] with [math]\E[W] \lt \infty[/math], the set [math]E[/math] on which [math]W=\infty[/math] is a null set. For suppose not; then [math]\E[W]\geq \E[\infty \one_E]=\infty\cdot\p[E]=\infty[/math], since [math]\p[E] \gt 0[/math], a contradiction.
- Recall the classical limit theorems for integrals. Monotone convergence: Let [math](f_n)_{n\geq 1}[/math] be an increasing sequence of positive and measurable functions and let [math]f=\lim_{n\to\infty}\uparrow f_n[/math]. Then [math]\int fd\mu=\lim_{n\to\infty}\int f_nd\mu[/math]. Fatou: Let [math](f_n)_{n\geq 1}[/math] be a sequence of measurable and positive functions. Then [math]\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu[/math]. Dominated convergence: Let [math](f_n)_{n\geq 1}[/math] be a sequence of integrable functions with [math]\vert f_n\vert\leq g[/math] for all [math]n[/math] with [math]g[/math] integrable. Denote [math]f=\lim_{n\to\infty}f_n[/math]. Then [math]\lim_{n\to\infty}\int f_nd\mu=\int fd\mu[/math].