guide:35e8e36b92

From Stochiki
<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>


===Conditional probability===
Let <math>(\Omega,\F,\p)</math> be a probability space and let <math>A,B\in\F</math> such that <math>\p[B] > 0</math>. Then the conditional probability{{efn|One can look it up for more details in the stochastics I part.}} of <math>A</math> given <math>B</math> is defined as
<math display="block">
\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}.
</math>
The important fact here is that the map <math>\F\to [0,1]</math>, <math>A\mapsto \p[A\mid B]</math> defines a new probability measure on <math>\F</math>, called the conditional probability given <math>B</math>. There are several facts which we need to recall:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>A_1,...,A_n\in\F</math> and if <math>\p\left[\bigcap_{k=1}^nA_k\right] > 0</math>, then
<math display="block">
\p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right].
</math>
</li>
<li>Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math>, i.e. for all <math>n\geq 1</math> we have that <math>E_n\in\F</math> and for <math>n\not=m</math> we get <math>E_n\cap E_m=\varnothing</math> and <math>\bigcup_{n\geq 1}E_n=\Omega</math>. Now for <math>A\in \F</math> we get
<math display="block">
\p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n].
</math>
</li>
<li>(Bayes' formula){{efn|Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.}} Let <math>(E_n)_{n\geq 1}</math> be a measurable partition of <math>\Omega</math> and <math>A\in\F</math> with <math>\p[A] > 0</math>. Then
<math display="block">
\p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}.
</math>
</li>
</ul>
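The three facts above can be checked on a small discrete example. The following sketch uses a hypothetical two-urn setup (the urn contents and probabilities are made up for illustration) and verifies the law of total probability and Bayes' formula with exact rational arithmetic:

```python
from fractions import Fraction as F

# Hypothetical two-urn example: urn 1 holds 2 red / 1 blue ball,
# urn 2 holds 1 red / 3 blue.  E_n = "urn n chosen" is a measurable
# partition of Omega, and A = "red ball drawn".
p_E = [F(1, 2), F(1, 2)]              # P[E_1], P[E_2]
p_A_given_E = [F(2, 3), F(1, 4)]      # P[A | E_1], P[A | E_2]

# Law of total probability: P[A] = sum_n P[A | E_n] P[E_n]
p_A = sum(pa * pe for pa, pe in zip(p_A_given_E, p_E))

# Bayes' formula: P[E_1 | A] = P[A | E_1] P[E_1] / P[A]
p_E1_given_A = p_A_given_E[0] * p_E[0] / p_A

print(p_A)            # 11/24
print(p_E1_given_A)   # 8/11
```

Working with `Fraction` instead of floats keeps the identities exact, so the two formulas can be checked by equality rather than by tolerance.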
{{alert-info |
We can reformulate the definition of the conditional probability to obtain
<math display="block">
\begin{align*}
\p[A\mid B]\p[B]&=\p[A\cap B]\\
\p[B\mid A]\p[A]&=\p[A\cap B]
\end{align*}
</math>
Therefore one can prove statements (i) to (iii) by using these two equations{{efn|One also has to notice that if <math>A</math> and <math>B</math> are two independent events, then <math>\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A]</math>.}}.
}}
===Discrete construction of the conditional expectation===
Let <math>X</math> and <math>Y</math> be two r.v.'s on a probability space <math>(\Omega,\F,\p)</math>. Let <math>Y</math> take values in <math>\R</math> and <math>X</math> take values in a countable discrete set <math>\{x_1,x_2,...,x_n,...\}</math>. The goal is to describe the expectation of the r.v. <math>Y</math> given knowledge of the observed r.v. <math>X</math>. For instance, let <math>X=x_j\in\{x_1,x_2,...,x_n,...\}</math>. We then look at the set <math>\{\omega\in\Omega\mid X(\omega)=x_j\}</math> rather than at the whole of <math>\Omega</math>. For <math>\Lambda\in\F</math>, we thus define
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid \{X=x_j\}],
</math>
which defines a new probability measure <math>\Q</math>, provided that <math>\p[X=x_j] > 0</math>. It is then more natural to compute
<math display="block">
\E_\Q[Y]=\int_\Omega Y(\omega)d\Q(\omega)=\int_{\{\omega\in\Omega\mid X(\omega)=x_j\}}Y(\omega)d\p(\omega)
</math>
rather than
<math display="block">
\E_\p[Y]=\int_\Omega Y(\omega)d\p(\omega)=\int_\R yd\p_Y(y).
</math>
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, single value case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:\Omega\to\{x_1,x_2,...,x_n,...\}</math> be a r.v. taking values in a discrete set and let <math>Y</math> be a real valued r.v. on that space. If <math>\p[X=x_j] > 0</math>, we can define the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> to be
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y],
</math>
where <math>\Q</math> is the probability measure on <math>\F</math> defined by
<math display="block">
\Q[\Lambda]=\p[\Lambda\mid X=x_j],
</math>
for <math>\Lambda\in\F</math>, provided that <math>\E_\Q[\vert Y\vert] < \infty</math>.}}
{{proofcard|Theorem (Conditional expectation (<math>X</math> discrete, <math>Y</math> discrete, single value case))|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a r.v. on that space with values in <math>\{x_1,x_2,...,x_n,...\}</math> and let <math>Y</math> also be a r.v. with values in <math>\{y_1,y_2,...,y_n,...\}</math>. If <math>\p[X=x_j] > 0</math>, we can write the conditional expectation of <math>Y</math> given <math>\{X=x_j\}</math> as
<math display="block">
\E[Y\mid X=x_j]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j],
</math>
provided that the series is absolutely convergent.
|Apply the definitions above to obtain
<math display="block">
\E[Y\mid X=x_j]=\E_\Q[Y]=\sum_{k=1}^\infty y_k\Q[Y=y_k]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j].
</math>}}
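The single-value formula can be evaluated directly from a joint law. The joint table and the helper `cond_expectation` below are a made-up illustration, not part of the notes:

```python
from fractions import Fraction as F

# Hypothetical joint law {(x, y): P[X=x, Y=y]} of two discrete r.v.'s.
joint = {
    (0, 1): F(1, 8), (0, 2): F(3, 8),
    (1, 1): F(1, 4), (1, 2): F(1, 4),
}

def cond_expectation(joint, x):
    """E[Y | X=x] = sum_k y_k P[Y=y_k | X=x], assuming P[X=x] > 0."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p / p_x for (xx, y), p in joint.items() if xx == x)

print(cond_expectation(joint, 0))  # 7/4: given X=0, Y is 1 w.p. 1/4, 2 w.p. 3/4
print(cond_expectation(joint, 1))  # 3/2
```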
Now let again <math>X</math> be a r.v. with values in <math>\{x_1,x_2,...,x_n,...\}</math> and <math>Y</math> a real valued r.v. The next step is to define <math>\E[Y\mid X]</math> as a function <math>f(X)</math>. Therefore we introduce the function
<math display="block">
\begin{equation}
f:\{x_1,x_2,...,x_n,...\}\to \R,\qquad f(x)=\begin{cases}\E[Y\mid X=x],&\p[X=x] > 0\\ \text{any value in $\R$},&\p[X=x]=0\end{cases}
\end{equation}
</math>
{{alert-info |It doesn't matter which value we assign to <math>f</math> when <math>\p[X=x]=0</math>, since a null set does not affect the expectation. By convention we assign it the value 0.}}
{{definitioncard|Conditional expectation (<math>X</math> discrete, <math>Y</math> real valued, complete case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> be a countably valued r.v. and let <math>Y</math> be a real valued r.v. The conditional expectation of <math>Y</math> given <math>X</math> is defined by
<math display="block">
\E[Y\mid X]=f(X),
</math>
with <math>f</math> as in (2), provided that, for every <math>j</math> with <math>\p[X=x_j] > 0</math>, the measure <math>\Q_j[\Lambda]=\p[\Lambda\mid X=x_j]</math> satisfies <math>\E_{\Q_j}[\vert Y\vert] < \infty</math>.}}
{{alert-info |
The above definition does not define <math>\E[Y\mid X]</math> everywhere but rather almost everywhere, since on each set <math>\{X=x\}</math>, where <math>\p[X=x]=0</math>, its value is arbitrary.
}}
'''Example'''
Let{{efn|Recall that this means that <math>X</math> is Poisson distributed: <math>\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}</math> for <math>k\in\N</math>}} <math>X\sim\Pi(\lambda)</math>. Consider a tossing game in which, when <math>X=n</math>, we perform <math>n</math> independent tosses of a coin that shows 1 with probability <math>p\in[0,1]</math> and 0 with probability <math>1-p</math>. Define <math>S</math> to be the r.v. giving the total number of 1s obtained in the game. Hence, given <math>X=n</math>, the r.v. <math>S</math> is binomially distributed with parameters <math>(p,n)</math>. We want to compute
<ul style{{=}}"list-style-type:lower-roman"><li><math>\E[S\mid X]</math>
</li>
<li><math>\E[X\mid S]</math>
</li>
</ul>
{{alert-info |
It is more natural to ask for the expected number of 1s obtained in the whole game given how many tosses were performed; the reverse is a bit more difficult. Note also that the event <math>S > X</math> is impossible, since we cannot obtain more 1s than the number of tosses performed.
}}
<ul style{{=}}"list-style-type:lower-roman"><li>First we compute <math>\E[S\mid X=n]</math>: If <math>X=n</math>, we know that <math>S</math> is binomial distributed with parameters <math>(p,n)</math> (<math>S\sim \B(p,n)</math>) and therefore we already know{{efn|If <math>X\sim \B(p,n)</math> then <math>\E[X]=pn</math>. For further calculation, one can look it up in the stochastics I notes}}
<math display="block">
\E[S\mid X=n]=pn.
</math>
Now we need to identify the function <math>f</math> defined as in (2) by
<math display="block">
\begin{align*}
f:\N&\longrightarrow\R\\
n&\longmapsto pn.
\end{align*}
</math>
Therefore we get by definition
<math display="block">
\E[S\mid X]=pX.
</math>
</li>
<li>
Next we want to compute <math>\E[X\mid S=k]</math>: For <math>n\geq k</math> we have
<math display="block">
\p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k} p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}},
</math>
since <math>\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}</math>. By some algebra we obtain that
<math display="block">
\frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}.
</math>
Hence we get that
<math display="block">
\E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p).
</math>
Therefore <math>\E[X\mid S]=S+\lambda(1-p)</math>.
</li>
</ul>
===Continuous construction of the conditional expectation===
Now we want to define <math>\E[Y\mid X]</math>, where <math>X</math> is no longer assumed to be countably valued. Therefore we want to recall the following two facts:
{{definitioncard|<math>\sigma</math>-Algebra generated by a random variable|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space. The <math>\sigma</math>-Algebra generated by <math>X</math> is given by
<math display="block">
\sigma(X)=X^{-1}(\B(\R^n))=\{A\subset\Omega\mid A=X^{-1}(B),\ B\in\B(\R^n)\}.
</math>}}
{{proofcard|Theorem|thm-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)</math> be a r.v. on that space and let <math>Y</math> be a real valued r.v. on that space. <math>Y</math> is measurable with respect to <math>\sigma(X)</math> if and only if there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
Y=f(X).
</math>|}}
{{alert-info | We want to use the fact that <math>L^2(\Omega,\sigma(X),\p)</math> is a closed subspace of the Hilbert space <math>L^2(\Omega,\F,\p)</math>, since <math>\sigma(X)\subset\F</math>. This allows us to use orthogonal projections and to interpret the conditional expectation as such a projection.
}}
{{definitioncard|Conditional expectation (as a projection onto a closed subspace)|Let <math>(\Omega,\F,\p)</math> be a probability space, let <math>X</math> be a r.v. with values in <math>\R^n</math> and let <math>Y\in L^2(\Omega,\F,\p)</math>. Then the conditional expectation of <math>Y</math> given <math>X</math> is the unique element <math>\hat Y\in L^2(\Omega,\sigma(X),\p)</math> such that for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>
<math display="block">
\begin{equation}
\E[YZ]=\E[\hat Y Z].
\end{equation}
</math>
This follows from the orthogonal projection onto <math>L^2(\Omega,\sigma(X),\p)</math>: the element <math>\hat Y</math> is characterized by the fact that <math>Y-\hat Y</math> is orthogonal to that subspace, i.e. <math>\langle Y-\hat Y,Z\rangle=0</math> for all <math>Z\in L^2(\Omega,\sigma(X),\p)</math>. We write <math>\E[Y\mid X]</math> for <math>\hat Y</math>.}}
{{alert-info |
<math>\hat Y</math> is the orthogonal projection of <math>Y</math> onto <math>L^2(\Omega,\sigma(X),\p)</math>.
}}
{{alert-info |
Since <math>X</math> takes values in <math>\R^n</math>, there exists a Borel measurable function <math>f:\R^n\to\R</math> such that
<math display="block">
\E[Y\mid X]=f(X)
</math>
with <math>\E[f^2(X)] < \infty</math>. We can also rewrite (3) as: for all Borel measurable <math>g:\R^n\to\R</math>, such that <math>\E[g^2(X)] < \infty</math>, we get
<math display="block">
\E[Yg(X)]=\E[f(X)g(X)].
</math>
}}
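On a finite sample space the projection picture becomes concrete: conditioning on a discrete <math>X</math>, the <math>\sigma(X)</math>-measurable r.v.'s are the functions of <math>X</math>, and the projection of <math>Y</math> is the per-level mean. The data and the helper `cond` below are made up for the sketch:

```python
from statistics import fmean

# Toy stand-in for (Omega, F, P): 8 equally likely points.
X = [0, 1, 2, 0, 1, 2, 0, 1]                   # values of X at each point
Y = [1.0, 2.0, 0.5, 3.0, -1.0, 0.5, 2.0, 4.0]  # values of Y (arbitrary)

def cond(Y, X):
    """E[Y | X] evaluated pointwise: the mean of Y over each level of X."""
    means = {x: fmean(y for x2, y in zip(X, Y) if x2 == x) for x in set(X)}
    return [means[x] for x in X]

Y_hat = cond(Y, X)

# Orthogonality: E[(Y - Y_hat) g(X)] = 0; tested with g = 1_{X=x}.
for x in set(X):
    r = fmean((y - yh) * (1 if x2 == x else 0)
              for y, yh, x2 in zip(Y, Y_hat, X))
    assert abs(r) < 1e-12
print("orthogonality holds")
```

Because the residual <math>Y-\hat Y</math> is orthogonal to every function of <math>X</math>, the per-level mean is exactly the orthogonal projection described in the definition.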
Now let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math> and consider the space <math>L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)</math>. Since <math>L^2(\Omega,\mathcal{G},\p)</math> is itself a Hilbert space, we can project onto it.
{{definitioncard|Conditional expectation (projection case)|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then the conditional expectation of <math>Y</math> given <math>\mathcal{G}</math> is defined as the unique element <math>\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)</math> such that for all <math>Z\in L^2(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\label{4}
\E[YZ]=\E[\E[Y\mid \mathcal{G}]Z].
\end{equation}
</math>
}}
{{alert-info |
In (3) or (4), it is enough{{efn|Since we can always consider linear combinations of <math>\one_A</math> and then apply density theorems to it}} to restrict the test r.v. <math>Z</math> to the class of r.v.'s of the form
<math display="block">
Z=\one_A,A\in\mathcal{G}.
</math>
}}
{{alert-info |
The conditional expectation is in <math>L^2</math>, so it's only defined a.s. and not everywhere in a unique way. So in particular, any statement like <math>\E[Y\mid\mathcal{G}]\geq0</math> or <math>\E[Y\mid \mathcal{G}]=Z</math> has to be understood with an implicit a.s.
}}
{{proofcard|Theorem|thm-3|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^2(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid \mathcal{G}]\geq 0</math>
</li>
<li><math>\E[\E[Y\mid\mathcal{G}]]=\E[Y]</math>
</li>
<li>The map <math>Y\mapsto\E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|For <math>(i)</math> take <math>Z=\one_{\{\E[Y\mid\mathcal{G}] < 0\}}</math> to obtain
<math display="block">
\underbrace{\E[YZ]}_{\geq 0}=\underbrace{\E[\E[Y\mid \mathcal{G}]Z]}_{\leq  0}.
</math>
This implies that <math>\p[\E[Y\mid \mathcal{G}] < 0]=0</math>. For <math>(ii)</math> take <math>Z=\one_{\Omega}</math> and plug into (4). For <math>(iii)</math> notice that linearity comes from the orthogonal projection operator. But we can also do it directly by taking <math>Y,Y'\in L^2(\Omega,\F,\p)</math>, <math>\alpha,\beta\in \R</math> and <math>Z\in L^2(\Omega,\mathcal{G},\p)</math> to obtain
<math display="block">
\E[(\alpha Y+\beta Y')Z]=\alpha\E[YZ]+\beta\E[Y'Z]=\alpha\E[\E[Y\mid\mathcal{G}]Z]+\beta\E[\E[Y'\mid\mathcal{G}]Z]=\E[(\alpha\E[Y\mid\mathcal{G}]+\beta\E[Y'\mid\mathcal{G}])Z].
</math>
Now we can conclude by using the uniqueness property that
<math display="block">
\E[\alpha Y+\beta Y'\mid \mathcal{G}]=\alpha\E[Y\mid \mathcal{G}]+\beta\E[Y'\mid \mathcal{G}].
</math>}}
Now we want to extend the definition of the conditional expectation to r.v.'s in <math>L^1(\Omega,\F,\p)</math> or in <math>L^+(\Omega,\F,\p)</math>, the space of non-negative r.v.'s allowed to take the value <math>\infty</math>.
{{proofcard|Lemma|lem-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^+(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset \F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)</math> such that for all <math>X\in L^+(\Omega,\mathcal{G},\p)</math>
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X]
\end{equation}
</math>
and this conditional expectation agrees with the previous definition when <math>Y\in L^2(\Omega,\F,\p)</math>. Moreover, if <math>0\leq  Y\leq  Y'</math>, then
<math display="block">
\E[Y\mid \mathcal{G}]\leq \E[Y'\mid \mathcal{G}].
</math>|If <math>Y\geq 0</math> and <math>Y\in L^2(\Omega,\F,\p)</math>, then we define <math>\E[Y\mid\mathcal{G}]</math> as before. If <math>X\in L^+(\Omega,\mathcal{G},\p)</math>, then <math>X_n=X\land n</math> is in <math>L^2(\Omega,\mathcal{G},\p)</math>, is positive and satisfies <math>X_n\uparrow X</math> as <math>n\to\infty</math>. Using the monotone convergence theorem we get
<math display="block">
\E[YX]=\E[Y\lim_{n\to\infty}X_n]=\lim_{n\to\infty}\E[YX_n]=\lim_{n\to\infty}\E[\E[Y\mid\mathcal{G}]X_n]=\E[\E[Y\mid\mathcal{G}]\lim_{n\to\infty}X_n]=\E[\E[Y\mid\mathcal{G}]X].
</math>
This shows that (5) is true whenever <math>Y\in L^2(\Omega,\F,\p)</math> with <math>Y\geq 0</math> and <math>X\in L^+(\Omega,\mathcal{G},\p)</math>. Now let <math>Y\in L^+(\Omega,\F,\p)</math>. Define <math>Y_m=Y\land m</math>. Hence we get <math>Y_m\in L^2(\Omega,\F,\p)</math> and <math>Y_m\uparrow Y</math> as <math>m\to\infty</math>. Each <math>\E[Y_m\mid\mathcal{G}]</math> is well defined{{efn|because for <math>Y\in L^2</math> and <math>U\in L^2</math> we get <math>Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}]</math>}}, positive and increasing in <math>m</math>. We define
<math display="block">
\E[Y\mid\mathcal{G}]=\lim_{m\to\infty}\E[Y_m\mid \mathcal{G}].
</math>
Several applications of the monotone convergence theorem will give us for <math>X\in L^+(\Omega,\mathcal{G},\p)</math>
<math display="block">
\E[YX]=\lim_{m\to\infty}\E[Y_mX]=\lim_{m\to\infty}\E[\E[Y_m\mid\mathcal{G}]X]=\E[\E[Y\mid \mathcal{G}]X].
</math>
Furthermore if <math>0\leq  Y\leq  Y'</math>, then <math>Y\land m\leq  Y'\land m</math> and therefore
<math display="block">
\E[Y\mid\mathcal{G}]\leq \E[Y'\mid\mathcal{G}].
</math>
Now we need to show uniqueness{{efn|Note that for any <math>W\in L^+</math> with <math>\E[W] < \infty</math>, the set <math>E</math> on which <math>W=\infty</math> is a null set. For suppose not; then <math>\E[W]\geq \E[\infty \one_E]=\infty\,\p[E]=\infty</math>, since <math>\p[E] > 0</math>.}}. Let <math>U</math> and <math>V</math> be two versions of <math>\E[Y\mid \mathcal{G}]</math>. Let
<math display="block">
\Lambda_n=\{U < V\leq  n\}\in\mathcal{G}
</math>
and assume <math>\p[\Lambda_n] > 0</math>. We then have
<math display="block">
\E[U\one_{\Lambda_n}]=\E[Y\one_{\Lambda_n}]=\E[V\one_{\Lambda_n}],\qquad\text{hence}\qquad \E[(V-U)\one_{\Lambda_n}]=0.
</math>
Since <math>V-U > 0</math> on <math>\Lambda_n</math>, this contradicts the fact that <math>\p[\Lambda_n] > 0</math>. Moreover, <math>\{U < V\}=\bigcup_{n\geq 1}\Lambda_n</math> and therefore
<math display="block">
\p[U < V]=0
</math>
and similarly <math>\p[V < U]=0</math>. This implies
<math display="block">
\p[U=V]=1.
</math>}}
{{proofcard|Theorem|thm-4|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then there exists a unique element <math>\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math> such that for every <math>X</math> bounded and <math>\mathcal{G}</math>-measurable
<math display="block">
\begin{equation}
\E[YX]=\E[\E[Y\mid \mathcal{G}]X].
\end{equation}
</math>
This conditional expectation agrees with the definition for the <math>L^2</math> case. Moreover it satisfies:
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>Y\geq 0</math>, then <math>\E[Y\mid\mathcal{G}]\geq 0</math>
</li>
<li>The map <math>Y\mapsto \E[Y\mid\mathcal{G}]</math> is linear.
</li>
</ul>
|We will only prove the existence, since the rest is exactly the same as before. Write <math>Y=Y^+-Y^-</math> with <math>Y^+,Y^-\in L^1(\Omega,\F,\p)</math> and <math>Y^+,Y^-\geq 0</math>. So <math>\E[Y^+\mid\mathcal{G}]</math> and <math>\E[Y^-\mid\mathcal{G}]</math> are well defined. Now we set
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y^+\mid \mathcal{G}]-\E[Y^-\mid\mathcal{G}].
</math>
This is well defined because
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]]=\E[Y^\pm] < \infty
</math>
if we let <math>X=\one_\Omega</math> in the previous lemma and therefore <math>\E[Y^+\mid\mathcal{G}]</math> and <math>\E[Y^-\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)</math>. For all <math>X</math> bounded and <math>\mathcal{G}</math>-measurable we can also write <math>X=X^+-X^-</math> and it follows from the previous lemma that
<math display="block">
\E[\E[Y^\pm\mid\mathcal{G}]X]=\E[Y^\pm X].
</math>
This implies that <math>\E[Y\mid\mathcal{G}]</math> satisfies (6).}}
{{proofcard|Corollary|cor-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]]=\E[X].
</math>
|Take equation (6) and choose the bounded test r.v. to be <math>\one_\Omega</math>.}}
{{proofcard|Corollary|cor-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Then
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert\leq \E[\vert X\vert\mid\mathcal{G}].
</math>
In particular
<math display="block">
\E[\vert\E[X\mid\mathcal{G}]\vert]\leq \E[\vert X\vert].
</math>
|We can always write <math>X=X^+-X^-</math> and also <math>\vert X\vert=X^++X^-</math>. Therefore we get
<math display="block">
\vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq  \E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[X^++X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}].
</math>}}
{{proofcard|Proposition|prop-1|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>Y\in L^1(\Omega,\F,\p)</math> be a r.v. on that space and assume that <math>Y</math> is independent of the sub <math>\sigma</math>-Algebra <math>\mathcal{G}\subset\F</math>, i.e. <math>\sigma(Y)</math> is independent of <math>\mathcal{G}</math>. Then
<math display="block">
\E[Y\mid\mathcal{G}]=\E[Y].
</math>
|Let <math>Z</math> be a bounded and <math>\mathcal{G}</math>-measurable r.v.; then <math>Y</math> and <math>Z</math> are independent. Hence we get
<math display="block">
\E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z].
</math>
Since <math>\E[Y]</math> is constant, it belongs to <math>L^1(\Omega,\mathcal{G},\p)</math> and satisfies (6). Therefore, by uniqueness, we get that <math>\E[Y\mid\mathcal{G}]=\E[Y]</math>.}}
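The proposition can be illustrated on a toy finite model (the four-point space and the values of <math>Y</math> below are made up): if <math>Y</math> depends only on a coordinate that is independent of the partition generating <math>\mathcal{G}</math>, its conditional expectation is the constant <math>\E[Y]</math>.

```python
from statistics import fmean
from itertools import product

# Toy model: Omega = {0,1}^2 with uniform P.  G is generated by the first
# coordinate; Y depends only on the second, so sigma(Y) and G are independent.
omega = list(product((0, 1), repeat=2))        # four equally likely points
Y = {w: 3.0 if w[1] else 1.0 for w in omega}   # Y(w) depends on coord 2 only

e_Y = fmean(Y[w] for w in omega)               # E[Y] = 2.0
for g in (0, 1):                               # blocks {first coord = g} of G
    block = [w for w in omega if w[0] == g]
    assert fmean(Y[w] for w in block) == e_Y   # E[Y | G] = E[Y] on each block
print(e_Y)  # 2.0
```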
{{proofcard|Theorem|thm-5|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> and <math>Y</math> be two r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Assume further that at least one of these two holds:
<ul style{{=}}"list-style-type:lower-roman"><li><math>X,Y</math> and <math>XY</math> are in <math>L^1(\Omega,\F,\p)</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
<li><math>X\geq 0</math>, <math>Y\geq 0</math> with <math>X</math> being <math>\mathcal{G}</math>-measurable.
</li>
</ul>
Then
<math display="block">
\E[XY\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]X.
</math>
In particular, if <math>X</math> is a positive r.v. or in <math>L^1(\Omega,\mathcal{G},\p)</math> and <math>\mathcal{G}</math>-measurable, then
<math display="block">
\E[X\mid\mathcal{G}]=X.
</math>|For <math>(ii)</math> assume first that <math>X,Y\leq  0</math>. Let <math>Z</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. Then we can obtain
<math display="block">
\E[(XY)Z]=\E[Y(XZ)]=\E[\E[Y\mid\mathcal{G}]XZ]=\E[(\E[Y\mid\mathcal{G}]X)Z].
</math>
Note that <math>\E[Y\mid\mathcal{G}]X</math> is a positive and <math>\mathcal{G}</math>-measurable r.v. Hence <math>\E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}]</math>. For <math>(i)</math> we can write <math>X=X^+-X^-</math> and use <math>(ii)</math>. This is an easy exercise.}}
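The pull-out property can be checked on a toy uniform space (block labels and values below are invented): a r.v. that is constant on each block of the generating partition is <math>\mathcal{G}</math>-measurable and factors out of the conditional expectation.

```python
from statistics import fmean

# 6-point uniform space; the partition {0,1},{2,3},{4,5} generates G.
blocks = [0, 0, 1, 1, 2, 2]                 # block label of each point
X = [2.0, 2.0, -1.0, -1.0, 5.0, 5.0]        # constant per block => G-measurable
Y = [1.0, 3.0, 0.0, 4.0, -2.0, 6.0]         # arbitrary r.v.

def cond(Z, blocks):
    """E[Z | G] evaluated pointwise: block means of Z."""
    means = {b: fmean(z for b2, z in zip(blocks, Z) if b2 == b)
             for b in set(blocks)}
    return [means[b] for b in blocks]

lhs = cond([x * y for x, y in zip(X, Y)], blocks)    # E[XY | G]
rhs = [x * ey for x, ey in zip(X, cond(Y, blocks))]  # X * E[Y | G]
assert lhs == rhs
print(lhs)
```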
{{alert-info |Next we want to show that the classical limit theorems from measure theory also hold for the conditional expectation{{efn|Recall the classical limit theorems for integrals. ''Monotone convergence:'' if <math>(f_n)_{n\geq 1}</math> is an increasing sequence of positive, measurable functions and <math>f=\lim_{n\to\infty}\uparrow f_n</math>, then <math>\int fd\mu=\lim_{n\to\infty}\int f_nd\mu</math>. ''Fatou:'' if <math>(f_n)_{n\geq 1}</math> is a sequence of positive, measurable functions, then <math>\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu</math>. ''Dominated convergence:'' if <math>(f_n)_{n\geq 1}</math> is a sequence of integrable functions with <math>\vert f_n\vert\leq g</math> for all <math>n</math>, where <math>g</math> is integrable, and <math>f=\lim_{n\to\infty}f_n</math>, then <math>\lim_{n\to\infty}\int f_nd\mu=\int fd\mu</math>.}}.
}}
{{proofcard|Theorem (Limit theorems for the conditional expectation)|thm-6|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>(Y_n)_{n\geq 1}</math> be a sequence of r.v.'s on that space and let <math>\mathcal{G}\subset\F</math> be a sub <math>\sigma</math>-Algebra of <math>\F</math>. Then we have:
<ul style{{=}}"list-style-type:lower-roman"><li>(''Monotone convergence'') Assume that <math>(Y_n)_{n\geq 1}</math> is an increasing sequence of positive r.v.'s such that <math>\lim_{n\to\infty}\uparrow Y_n=Y</math> a.s. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}].
</math>
</li>
<li>(''Fatou'') Assume that <math>(Y_n)_{n\geq 1}</math> is a sequence of positive r.v.'s. Then
<math display="block">
\E[\liminf_n Y_n\mid\mathcal{G}]\leq \liminf_n\E[Y_n\mid\mathcal{G}].
</math>
</li>
<li>(''Dominated convergence'') Assume that <math>Y_n\xrightarrow{n\to\infty}Y</math> a.s. and that there exists <math>Z\in L^1(\Omega,\F,\p)</math> such that <math>\vert Y_n\vert\leq  Z</math> for all <math>n</math>. Then
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>
</li>
</ul>
|We will only prove <math>(i)</math>, since <math>(ii)</math> and <math>(iii)</math> are proved in a similar way (it's a good exercise to do the proof). Since <math>(Y_n)_{n\geq 1}</math> is an increasing sequence, it follows that
<math display="block">
\E[Y_{n+1}\mid\mathcal{G}]\geq \E[Y_n\mid\mathcal{G}].
</math>
Hence we can deduce that <math>\lim_{n\to\infty}\uparrow \E[Y_n\mid\mathcal{G}]</math> exists and we denote it by <math>Y'</math>. Moreover, note that <math>Y'</math> is <math>\mathcal{G}</math>-measurable, since it is a limit of <math>\mathcal{G}</math>-measurable r.v.'s. Let <math>X</math> be a positive and <math>\mathcal{G}</math>-measurable r.v. and obtain then
<math display="block">
\E[Y'X]=\E[\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\uparrow\E[\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\E[Y_n X]=\E[YX],
</math>
where we have used monotone convergence twice and equation (5). Therefore we get
<math display="block">
\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid\mathcal{G}].
</math>}}
{{proofcard|Theorem (Jensen's inequality)|thm-7|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>\varphi:\R\to\R</math> be a real, convex function. Let <math>X \in L^1(\Omega,\F,\p)</math> such that <math>\varphi(X)\in L^1(\Omega,\F,\p)</math>. Then
<math display="block">
\varphi(\E[X\mid\mathcal{G}])\leq  \E[\varphi(X)\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset\F</math>.
|Exercise.}}
'''Example'''
Let <math>(\Omega,\F,\p)</math> be a probability space, let <math>\varphi(x)=x^2</math> and let <math>X\in L^2(\Omega,\F,\p)</math>. Then
<math display="block">
(\E[X\mid \mathcal{G}])^2\leq  \E[X^2\mid\mathcal{G}]
</math>
for all sub <math>\sigma</math>-Algebras <math>\mathcal{G}\subset \F</math>.
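This special case of conditional Jensen can be seen numerically on a toy partition (the block labels and values of <math>X</math> are made up): on each block, the squared block mean is at most the block mean of the squares, the gap being the conditional variance.

```python
from statistics import fmean

# 5-point uniform space; the partition {0,1,2},{3,4} generates G.
blocks = [0, 0, 0, 1, 1]
X = [1.0, 2.0, 4.0, -3.0, 5.0]

for b in set(blocks):
    xs = [x for b2, x in zip(blocks, X) if b2 == b]
    m, m2 = fmean(xs), fmean(x * x for x in xs)
    assert m ** 2 <= m2        # (E[X | G])^2 <= E[X^2 | G] on this block
print("conditional Jensen holds blockwise")
```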
{{proofcard|Theorem (Tower property)|thm-8|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X\in L^1(\Omega,\F,\p)</math> be a r.v. on that space. Let <math>\mathcal{C}\subset\mathcal{G}\subset \F</math> be a tower of sub <math>\sigma</math>-Algebras of <math>\F</math>. Then
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>|Let <math>Z</math> be a bounded and <math>\mathcal{C}</math>-measurable r.v. Then we obtain
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{C}]Z].
</math>
But <math>Z</math> is also <math>\mathcal{G}</math>-measurable and hence we get
<math display="block">
\E[XZ]=\E[\E[X\mid\mathcal{G}]Z].
</math>
Therefore, for all <math>Z</math> bounded and <math>\mathcal{C}</math>-measurable r.v.'s, we get
<math display="block">
\E[\E[X\mid\mathcal{G}]Z]=\E[\E[X\mid\mathcal{C}]Z]
</math>
and thus
<math display="block">
\E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}].
</math>}}
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}
==Notes==
{{notelist}}

Latest revision as of 17:39, 8 May 2024

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

Conditional probability

Let [math](\Omega,\F,\p)[/math] be a probability space and let [math]A,B\in\F[/math] such that [math]\p[B] \gt 0[/math]. Then the conditional probability[a] of [math]A[/math] given [math]B[/math] is defined as

[[math]] \p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}. [[/math]]

The important fact here is that the application [math]\F\to [0,1][/math], [math]A\mapsto \p[A\mid B][/math] defines a new probability measure on [math]\F[/math] called the conditional probability given [math]B[/math]. There are several facts, which we need to recall:

  • If [math]A_1,...,A_n\in\F[/math] and if [math]\p\left[\bigcap_{k=1}^nA_k\right] \gt 0[/math], then
    [[math]] \p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right]. [[/math]]
  • Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math], i.e. for all [math]n\geq 1[/math] we have that [math]E_n\in\F[/math] and for [math]n\not=m[/math] we get [math]E_n\cap E_m=\varnothing[/math] and [math]\bigcup_{n\geq 1}E_n=\Omega[/math]. Now for [math]A\in \F[/math] we get
    [[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]
  • (Baye's formula)[b] Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math] and [math]A\in\F[/math] with [math]\p[A] \gt 0[/math]. Then
    [[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}. [[/math]]

We can reformulate the definition of the conditional probability to obtain

[[math]] \begin{align*} \p[A\mid B]\p[B]&=\p[A\cap B]\\ \p[B\mid A]\p[A]&=\p[A\cap B] \end{align*} [[/math]]
Therefore one can prove the statements (1) to (3) by using these two equations[c].

Discrete construction of the conditional expectation

Let [math]X[/math] and [math]Y[/math] be two r.v.'s on a probability space [math](\Omega,\F,\p)[/math]. Let [math]Y[/math] take values in [math]\R[/math] and [math]X[/math] take values in a countable discrete set [math]\{x_1,x_2,...,x_n,...\}[/math]. The goal is to describe the expectation of the r.v. [math]Y[/math] given the observed r.v. [math]X[/math]. For instance, let [math]X=x_j\in\{x_1,x_2,...,x_n,...\}[/math] with [math]\p[X=x_j] \gt 0[/math]. We then look at the set [math]\{\omega\in\Omega\mid X(\omega)=x_j\}[/math] rather than at the whole of [math]\Omega[/math]. For [math]\Lambda\in\F[/math], we thus define a new probability measure [math]\Q[/math] by

[[math]] \Q[\Lambda]=\p[\Lambda\mid \{X=x_j\}]. [[/math]]

It then makes more sense to compute

[[math]] \E_\Q[Y]=\int_\Omega Y(\omega)d\Q(\omega)=\frac{1}{\p[X=x_j]}\int_{\{\omega\in\Omega\mid X(\omega)=x_j\}}Y(\omega)d\p(\omega) [[/math]]

rather than

[[math]] \E_\p[Y]=\int_\Omega Y(\omega)d\p(\omega)=\int_\R yd\p_Y(y). [[/math]]

Definition (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] real valued, single value case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:\Omega\to\{x_1,x_2,...,x_n,...\}[/math] be a r.v. taking values in a discrete set and let [math]Y[/math] be a real valued r.v. on that space. If [math]\p[X=x_j] \gt 0[/math], we can define the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] to be

[[math]] \E[Y\mid X=x_j]=\E_\Q[Y], [[/math]]
where [math]\Q[/math] is the probability measure on [math]\F[/math] defined by

[[math]] \Q[\Lambda]=\p[\Lambda\mid X=x_j], [[/math]]
for [math]\Lambda\in\F[/math], provided that [math]\E_\Q[\vert Y\vert] \lt \infty[/math].

Theorem (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] discrete, single value case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a r.v. on that space with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and let [math]Y[/math] also be a r.v. with values in [math]\{y_1,y_2,...,y_n,...\}[/math]. If [math]\p[X=x_j] \gt 0[/math], we can write the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] as

[[math]] \E[Y\mid X=x_j]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j]. [[/math]]
provided that the series is absolutely convergent.


Show Proof

Apply the definitions above to obtain

[[math]] \E[Y\mid X=x_j]=\E_\Q[Y]=\sum_{k=1}^\infty y_k\Q[Y=y_k]=\sum_{k=1}^\infty y_k\p[Y=y_k\mid X=x_j]. [[/math]]
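As a sketch of how this formula is used in practice (the joint pmf below is made up for illustration), one can compute [math]\E[Y\mid X=x_j][/math] directly from a joint distribution table:

```python
# Made-up joint pmf: joint[(x, y)] = P[X = x, Y = y].
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.2, (1, 2): 0.4}

def cond_exp(joint, xj):
    """E[Y | X = x_j] = sum_k y_k * P[Y = y_k | X = x_j]."""
    p_x = sum(p for (x, _), p in joint.items() if x == xj)
    assert p_x > 0, "conditioning event must have positive probability"
    return sum(y * p / p_x for (x, y), p in joint.items() if x == xj)

print(cond_exp(joint, 0))  # (1*0.1 + 2*0.3) / 0.4 = 1.75
```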

Now let [math]X[/math] again be a r.v. with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and [math]Y[/math] a real valued r.v. The next step is to define [math]\E[Y\mid X][/math] as a function [math]f(X)[/math] of [math]X[/math]. To this end we introduce the function

[[math]] \begin{equation} f:\{x_1,x_2,...,x_n,...\}\to \R,\qquad f(x)=\begin{cases}\E[Y\mid X=x],&\p[X=x] \gt 0\\ \text{any value in $\R$},&\p[X=x]=0\end{cases} \end{equation} [[/math]]

It doesn't matter which value we assign to [math]f[/math] when [math]\p[X=x]=0[/math], since this doesn't affect the expectation: such values are only taken on a null set. By convention we assign the value 0.
Definition (Conditional expectation ([math]X[/math] discrete, [math]Y[/math] real valued, complete case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a countably valued r.v. and let [math]Y[/math] be a real valued r.v. The conditional expectation of [math]Y[/math] given [math]X[/math] is defined by

[[math]] \E[Y\mid X]=f(X), [[/math]]
with [math]f[/math] as in (2), provided that for all [math]j[/math] with [math]\p[X=x_j] \gt 0[/math], setting [math]\Q_j[\Lambda]=\p[\Lambda\mid X=x_j][/math], we have [math]\E_{\Q_j}[\vert Y\vert] \lt \infty[/math].

The above definition does not define [math]\E[Y\mid X][/math] everywhere but rather almost everywhere, since on each set [math]\{X=x\}[/math], where [math]\p[X=x]=0[/math], its value is arbitrary.

Example

Let[d] [math]X\sim\Pi(\lambda)[/math]. Consider a tossing game in which, when [math]X=n[/math], we perform [math]n[/math] independent tosses of a coin, each toss yielding 1 with probability [math]p\in[0,1][/math] and 0 with probability [math]1-p[/math]. Define [math]S[/math] to be the r.v. giving the total number of 1's obtained in the game. Therefore, given [math]X=n[/math], the r.v. [math]S[/math] is binomially distributed with parameters [math](p,n)[/math]. We want to compute

  • [math]\E[S\mid X][/math]
  • [math]\E[X\mid S][/math]

It is more natural to ask for the expected number of 1's obtained in the whole game knowing how many tosses were played; the reverse is a bit more difficult. We may also notice that necessarily [math]S\leq X[/math], because we cannot obtain more 1's than the number of tosses played.

  • First we compute [math]\E[S\mid X=n][/math]: if [math]X=n[/math], we know that [math]S[/math] is binomially distributed with parameters [math](p,n)[/math] ([math]S\sim \B(p,n)[/math]) and therefore we already know[e]
    [[math]] \E[S\mid X=n]=pn. [[/math]]
    Now we need to identify the function [math]f[/math] defined as in (2) by
    [[math]] \begin{align*} f:\N&\longrightarrow\R\\ n&\longmapsto pn. \end{align*} [[/math]]
    Therefore we get by definition
    [[math]] \E[S\mid X]=pX. [[/math]]
  • Next we want to compute [math]\E[X\mid S=k][/math]: For [math]n\geq k[/math] we have
    [[math]] \p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k} p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}, [[/math]]
    since [math]\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}[/math]. By some algebra we obtain that
    [[math]] \frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}. [[/math]]
    Hence we get that
    [[math]] \E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p). [[/math]]
    Therefore [math]\E[X\mid S]=S+\lambda(1-p)[/math].
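Both answers can be checked by a small Monte Carlo simulation (a sketch, with illustrative parameters [math]\lambda=3[/math], [math]p=0.4[/math]; only the Python standard library is used):

```python
import math
import random

random.seed(0)
lam, p, N = 3.0, 0.4, 200_000

def poisson(lam):
    # Knuth's method: count uniforms until their product drops below e^{-lam}.
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

samples = []
for _ in range(N):
    x = poisson(lam)                                  # X ~ Poisson(lam)
    s = sum(random.random() < p for _ in range(x))    # S | X = x  ~  Bin(x, p)
    samples.append((x, s))

n0, k0 = 4, 1
e_s_given_x = (sum(s for x, s in samples if x == n0)
               / sum(1 for x, _ in samples if x == n0))
e_x_given_s = (sum(x for x, s in samples if s == k0)
               / sum(1 for _, s in samples if s == k0))
print(e_s_given_x)  # close to p * n0 = 1.6
print(e_x_given_s)  # close to k0 + lam * (1 - p) = 2.8
```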

Continuous construction of the conditional expectation

Now we want to define [math]\E[Y\mid X][/math], where [math]X[/math] is no longer assumed to be countably valued. To this end we recall the following two facts:

Definition ([math]\sigma[/math]-Algebra generated by a random variable)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space. The [math]\sigma[/math]-Algebra generated by [math]X[/math] is given by

[[math]] \sigma(X)=X^{-1}(\B(\R^n))=\{A\subset\Omega\mid A=X^{-1}(B),\ B\in\B(\R^n)\}. [[/math]]

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space and let [math]Y[/math] be a real valued r.v. on that space. Then [math]Y[/math] is measurable with respect to [math]\sigma(X)[/math] if and only if there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that

[[math]] Y=f(X). [[/math]]

We want to make use of the fact that [math]L^2(\Omega,\sigma(X),\p)\subset L^2(\Omega,\F,\p)[/math] is a closed subspace of the Hilbert space [math]L^2(\Omega,\F,\p)[/math], since [math]\sigma(X)\subset\F[/math]. This allows us to use orthogonal projections and to interpret the conditional expectation as such a projection.
Definition (Conditional expectation (as a projection onto a closed subspace))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math]. Then the conditional expectation of [math]Y[/math] given [math]X[/math] is the unique element [math]\hat Y\in L^2(\Omega,\sigma(X),\p)[/math] such that for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]

[[math]] \begin{equation} \E[YZ]=\E[\hat Y Z]. \end{equation} [[/math]]
This characterization holds because [math]\hat Y[/math] is the orthogonal projection of [math]Y[/math], so that [math]Y-\hat Y\in L^2(\Omega,\F,\p)[/math] satisfies [math]\langle Y-\hat Y,Z\rangle=0[/math] for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]. We write [math]\E[Y\mid X][/math] for [math]\hat Y[/math].

[math]\hat Y[/math] is the orthogonal projection of [math]Y[/math] onto [math]L^2(\Omega,\sigma(X),\p)[/math].

Since [math]X[/math] takes values in [math]\R^n[/math], there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that

[[math]] \E[Y\mid X]=f(X) [[/math]]
with [math]\E[f^2(X)] \lt \infty[/math]. We can also rewrite (3) as: for all Borel measurable [math]g:\R^n\to\R[/math], such that [math]\E[g^2(X)] \lt \infty[/math], we get

[[math]] \E[Yg(X)]=\E[f(X)g(X)]. [[/math]]
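For discrete [math]X[/math], this projection can be computed concretely: under the empirical measure of a sample, [math]f(x)[/math] is just the average of [math]Y[/math] over the event [math]\{X=x\}[/math]. A small sketch with simulated data (all parameters illustrative):

```python
import random

random.seed(1)
# Simulated sample of (X, Y) with Y = X + standard Gaussian noise.
data = [(x, x + random.gauss(0, 1))
        for x in (random.choice([0, 1, 2]) for _ in range(10_000))]

# f(x) = average of Y over {X = x}: the orthogonal projection of Y onto
# the (empirical) L^2(sigma(X)).
f = {x0: sum(y for x, y in data if x == x0) / sum(1 for x, _ in data if x == x0)
     for x0 in (0, 1, 2)}

# Check the defining property E[Y g(X)] = E[f(X) g(X)] for g = indicator of {X = x0}.
for x0 in (0, 1, 2):
    lhs = sum(y for x, y in data if x == x0) / len(data)
    rhs = sum(f[x] for x, _ in data if x == x0) / len(data)
    assert abs(lhs - rhs) < 1e-9
print(f)  # roughly {0: 0.0, 1: 1.0, 2: 2.0}
```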

Now let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math] and consider the space [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math]. It is clear that [math]L^2(\Omega,\mathcal{G},\p)[/math] is a Hilbert space and thus we can project to it.

Definition (Conditional expectation (projection case))

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then the conditional expectation of [math]Y[/math] given [math]\mathcal{G}[/math] is defined as the unique element [math]\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)[/math] such that for all [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math]

[[math]] \begin{equation} \label{4} \E[YZ]=\E[\E[Y\mid \mathcal{G}]Z]. \end{equation} [[/math]]

In (3) or (4), it is enough[f] to restrict the test r.v. [math]Z[/math] to the class of r.v.'s of the form

[[math]] Z=\one_A,A\in\mathcal{G}. [[/math]]

The conditional expectation is an element of [math]L^2[/math], so it is only defined uniquely a.s., not everywhere. In particular, any statement like [math]\E[Y\mid\mathcal{G}]\geq0[/math] or [math]\E[Y\mid \mathcal{G}]=Z[/math] has to be understood with an implicit a.s.

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math].

  • If [math]Y\geq 0[/math], then [math]\E[Y\mid \mathcal{G}]\geq 0[/math]
  • [math]\E[\E[Y\mid\mathcal{G}]]=\E[Y][/math]
  • The map [math]Y\mapsto\E[Y\mid\mathcal{G}][/math] is linear.


Show Proof

For [math](i)[/math] take [math]Z=\one_{\{\E[Y\mid\mathcal{G}] \lt 0\}}[/math] to obtain

[[math]] \underbrace{\E[YZ]}_{\geq 0}=\underbrace{\E[\E[Y\mid \mathcal{G}]Z]}_{\leq 0}. [[/math]]
This implies that [math]\p[\E[Y\mid \mathcal{G}] \lt 0]=0[/math]. For [math](ii)[/math] take [math]Z=\one_{\Omega}[/math] and plug into (4). For [math](iii)[/math] notice that linearity comes from the orthogonal projection operator. But we can also do it directly by taking [math]Y,Y'\in L^2(\Omega,\F,\p)[/math], [math]\alpha,\beta\in \R[/math] and [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math] to obtain

[[math]] \E[(\alpha Y+\beta Y')Z]=\alpha\E[YZ]+\beta\E[Y'Z]=\alpha\E[\E[Y\mid\mathcal{G}]Z]+\beta\E[\E[Y'\mid\mathcal{G}]Z]=\E[(\alpha\E[Y\mid\mathcal{G}]+\beta\E[Y'\mid\mathcal{G}])Z]. [[/math]]
Now we can conclude by using the uniqueness property that

[[math]] \E[\alpha Y+\beta Y'\mid \mathcal{G}]=\alpha\E[Y\mid \mathcal{G}]+\beta\E[Y'\mid \mathcal{G}]. [[/math]]

Now we want to extend the definition of the conditional expectation to r.v.'s in [math]L^1(\Omega,\F,\p)[/math] or in [math]L^+(\Omega,\F,\p)[/math], the space of nonnegative r.v.'s allowed to take the value [math]\infty[/math].

Lemma

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^+(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)[/math] such that for all [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]

[[math]] \begin{equation} \E[YX]=\E[\E[Y\mid \mathcal{G}]X] \end{equation} [[/math]]
and this conditional expectation agrees with the previous definition when [math]Y\in L^2(\Omega,\F,\p)[/math]. Moreover, if [math]0\leq Y\leq Y'[/math], then

[[math]] \E[Y\mid \mathcal{G}]\leq \E[Y'\mid \mathcal{G}]. [[/math]]

Show Proof

If [math]Y\geq 0[/math] and [math]Y\in L^2(\Omega,\F,\p)[/math], then we define [math]\E[Y\mid\mathcal{G}][/math] as before. If [math]X\in L^+(\Omega,\mathcal{G},\p)[/math], then [math]X_n=X\land n[/math] is in [math]L^2(\Omega,\mathcal{G},\p)[/math], is positive, and [math]X_n\uparrow X[/math] for [math]n\to\infty[/math]. Using the monotone convergence theorem we get

[[math]] \E[YX]=\E[Y\lim_{n\to\infty}X_n]=\lim_{n\to\infty}\E[YX_n]=\lim_{n\to\infty}\E[\E[Y\mid\mathcal{G}]X_n]=\E[\E[Y\mid\mathcal{G}]\lim_{n\to\infty}X_n]=\E[\E[Y\mid\mathcal{G}]X]. [[/math]]
This shows that (5) is true whenever [math]Y\in L^2(\Omega,\F,\p)[/math] with [math]Y\geq 0[/math] and [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]. Now let [math]Y\in L^+(\Omega,\F,\p)[/math]. Define [math]Y_m=Y\land m[/math]. Hence we get [math]Y_m\in L^2(\Omega,\F,\p)[/math] and [math]Y_m\uparrow Y[/math] as [math]m\to\infty[/math]. Each [math]\E[Y_m\mid\mathcal{G}][/math] is well defined[g], positive and increasing in [math]m[/math]. We define

[[math]] \E[Y\mid\mathcal{G}]=\lim_{m\to\infty}\E[Y_m\mid \mathcal{G}]. [[/math]]
Several applications of the monotone convergence theorem will give us for [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]

[[math]] \E[YX]=\lim_{m\to\infty}\E[Y_mX]=\lim_{m\to\infty}\E[\E[Y_m\mid\mathcal{G}]X]=\E[\E[Y\mid \mathcal{G}]X]. [[/math]]
Furthermore if [math]0\leq Y\leq Y'[/math], then [math]Y\land m\leq Y'\land m[/math] and therefore

[[math]] \E[Y\mid\mathcal{G}]\leq \E[Y'\mid\mathcal{G}]. [[/math]]
Now we need to show uniqueness[h]. Let [math]U[/math] and [math]V[/math] be two versions of [math]\E[Y\mid \mathcal{G}][/math]. Let

[[math]] \Lambda_n=\{U \lt V\leq n\}\in\mathcal{G} [[/math]]
and assume [math]\p[\Lambda_n] \gt 0[/math]. We then have

[[math]] \E[Y\one_{\Lambda_n}]=\E[U\one_{\Lambda_n}]=\E[V\one_{\Lambda_n}],\quad\text{and hence}\quad \E[(V-U)\one_{\Lambda_n}]=0. [[/math]]
Since [math]V-U \gt 0[/math] on [math]\Lambda_n[/math], this contradicts the assumption [math]\p[\Lambda_n] \gt 0[/math], so [math]\p[\Lambda_n]=0[/math] for all [math]n[/math]. Moreover, [math]\{U \lt V\}=\bigcup_{n\geq 1}\Lambda_n[/math] and therefore

[[math]] \p[U \lt V]=0 [[/math]]
and similarly [math]\p[V \lt U]=0[/math]. This implies

[[math]] \p[U=V]=1. [[/math]]

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math] such that for every [math]X[/math] bounded and [math]\mathcal{G}[/math]-measurable

[[math]] \begin{equation} \E[YX]=\E[\E[Y\mid \mathcal{G}]X]. \end{equation} [[/math]]
This conditional expectation agrees with the definition in the [math]L^2[/math] case. Moreover it satisfies:

  • If [math]Y\geq 0[/math], then [math]\E[Y\mid\mathcal{G}]\geq 0[/math]
  • The map [math]Y\mapsto \E[Y\mid\mathcal{G}][/math] is linear.


Show Proof

We will only prove the existence, since the rest is exactly the same as before. Write [math]Y=Y^+-Y^-[/math] with [math]Y^+,Y^-\in L^1(\Omega,\F,\p)[/math] and [math]Y^+,Y^-\geq 0[/math]. So [math]\E[Y^+\mid\mathcal{G}][/math] and [math]\E[Y^-\mid\mathcal{G}][/math] are well defined. Now we set

[[math]] \E[Y\mid\mathcal{G}]=\E[Y^+\mid \mathcal{G}]-\E[Y^-\mid\mathcal{G}]. [[/math]]
This is well defined because

[[math]] \E[\E[Y^\pm\mid\mathcal{G}]]=\E[Y^\pm] \lt \infty [[/math]]
by taking [math]X=\one_\Omega[/math] in the previous lemma, and therefore [math]\E[Y^+\mid\mathcal{G}],\E[Y^-\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math]. For every bounded and [math]\mathcal{G}[/math]-measurable [math]X[/math] we can also write [math]X=X^+-X^-[/math] and it follows from the previous lemma that

[[math]] \E[\E[Y^\pm\mid\mathcal{G}]X]=\E[Y^\pm X]. [[/math]]
This implies that [math]\E[Y\mid\mathcal{G}][/math] satisfies (6).

Corollary

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then

[[math]] \E[\E[X\mid\mathcal{G}]]=\E[X]. [[/math]]


Show Proof

Take equation (6) with [math]Y=X[/math] and the test r.v. [math]\one_\Omega[/math].

Corollary

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then

[[math]] \vert\E[X\mid\mathcal{G}]\vert\leq \E[\vert X\vert\mid\mathcal{G}]. [[/math]]
In particular

[[math]] \E[\vert\E[X\mid\mathcal{G}]\vert]\leq \E[\vert X\vert]. [[/math]]


Show Proof

We can always write [math]X=X^+-X^-[/math] and also [math]\vert X\vert=X^++X^-[/math]. Therefore we get

[[math]] \vert\E[X\mid\mathcal{G}]\vert=\vert\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}]\vert\leq \E[X^+\mid\mathcal{G}]+\E[X^-\mid\mathcal{G}]=\E[X^++X^-\mid\mathcal{G}]=\E[\vert X\vert\mid\mathcal{G}]. [[/math]]

Proposition

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and assume that [math]Y[/math] is independent of the sub [math]\sigma[/math]-Algebra [math]\mathcal{G}\subset\F[/math], i.e. [math]\sigma(Y)[/math] is independent of [math]\mathcal{G}[/math]. Then

[[math]] \E[Y\mid\mathcal{G}]=\E[Y]. [[/math]]


Show Proof

Let [math]Z[/math] be a bounded and [math]\mathcal{G}[/math]-measurable r.v.; then [math]Y[/math] and [math]Z[/math] are independent. Hence we get

[[math]] \E[YZ]=\E[Y]\E[Z]=\E[\E[Y]Z]. [[/math]]
Since [math]\E[Y][/math] is constant, it follows that [math]\E[Y]\in L^1(\Omega,\mathcal{G},\p)[/math] and satisfies (6). Therefore by uniqueness we get [math]\E[Y\mid\mathcal{G}]=\E[Y][/math].
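A quick simulated sanity check of this proposition (illustrative numbers; here [math]\mathcal{G}=\sigma(X)[/math] for a fair coin [math]X[/math] drawn independently of [math]Y[/math]):

```python
import random

random.seed(2)
# X is a fair coin, Y ~ N(5, 1), drawn independently of X.
data = [(random.choice([0, 1]), random.gauss(5, 1)) for _ in range(50_000)]

overall = sum(y for _, y in data) / len(data)  # estimate of E[Y]
for x0 in (0, 1):
    ys = [y for x, y in data if x == x0]
    group_mean = sum(ys) / len(ys)             # estimate of E[Y | X = x0]
    assert abs(group_mean - overall) < 0.05    # E[Y | sigma(X)] = E[Y]
print(overall)  # close to 5
```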

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Assume further that at least one of these two holds:

  • [math]X,Y[/math] and [math]XY[/math] are in [math]L^1(\Omega,\F,\p)[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
  • [math]X\geq 0[/math], [math]Y\geq 0[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.

Then

[[math]] \E[XY\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]X. [[/math]]
In particular, if [math]X[/math] is a positive r.v. or in [math]L^1(\Omega,\mathcal{G},\p)[/math] and [math]\mathcal{G}[/math]-measurable, then

[[math]] \E[X\mid\mathcal{G}]=X. [[/math]]

Show Proof

For [math](ii)[/math] assume first that [math]X,Y\geq 0[/math]. Let [math]Z[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. Then we obtain

[[math]] \E[(XY)Z]=\E[Y(XZ)]=\E[\E[Y\mid\mathcal{G}]XZ]=\E[(\E[Y\mid\mathcal{G}]X)Z]. [[/math]]
Note that [math]\E[Y\mid\mathcal{G}]X[/math] is a positive and [math]\mathcal{G}[/math]-measurable r.v. Hence [math]\E[XY\mid\mathcal{G}]=X\E[Y\mid\mathcal{G}][/math]. For [math](i)[/math] we can write [math]X=X^+-X^-[/math] and use [math](ii)[/math]. This is an easy exercise.

Next we want to show that the classical limit theorems from measure theory also make sense in terms of the conditional expectation[i].
Theorem (Limit theorems for the conditional expectation)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](Y_n)_{n\geq 1}[/math] be a sequence of r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then we have:

  • (Monotone convergence) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s such that [math]\lim_{n\to\infty}\uparrow Y_n=Y[/math] a.s. Then
    [[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}]. [[/math]]
  • (Fatou) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s. Then
    [[math]] \E[\liminf_n Y_n\mid\mathcal{G}]\leq\liminf_n\E[Y_n\mid\mathcal{G}]. [[/math]]
  • (Dominated convergence) Assume that [math]Y_n\xrightarrow{n\to\infty}Y[/math] a.s. and that there exists [math]Z\in L^1(\Omega,\F,\p)[/math] such that [math]\vert Y_n\vert\leq Z[/math] for all [math]n[/math]. Then
    [[math]] \lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]


Show Proof

We will only prove [math](i)[/math], since [math](ii)[/math] and [math](iii)[/math] are proved in a similar way (it's a good exercise to do the proof). Since [math](Y_n)_{n\geq 1}[/math] is an increasing sequence, it follows that

[[math]] \E[Y_{n+1}\mid\mathcal{G}]\geq \E[Y_n\mid\mathcal{G}]. [[/math]]
Hence we can deduce that [math]\lim_{n\to\infty}\uparrow \E[Y_n\mid\mathcal{G}][/math] exists and we denote it by [math]Y'[/math]. Moreover, note that [math]Y'[/math] is [math]\mathcal{G}[/math]-measurable, since it is a limit of [math]\mathcal{G}[/math]-measurable r.v.'s. Let [math]X[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. and obtain then

[[math]] \E[Y'X]=\E[\lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\uparrow\E[\E[Y_n\mid\mathcal{G}]X]=\lim_{n\to\infty}\E[Y_n X]=\E[YX], [[/math]]
where we have used monotone convergence twice and equation (5). Therefore we get

[[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]

Theorem (Jensen's inequality)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi:\R\to\R[/math] be a real, convex function. Let [math]X \in L^1(\Omega,\F,\p)[/math] such that [math]\varphi(X)\in L^1(\Omega,\F,\p)[/math]. Then

[[math]] \varphi(\E[X\mid\mathcal{G}])\leq \E[\varphi(X)\mid\mathcal{G}] [[/math]]
for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset\F[/math].


Show Proof

Exercise.

Example


Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi(x)=x^2[/math] and let [math]X\in L^2(\Omega,\F,\p)[/math]. Then

[[math]] (\E[X\mid \mathcal{G}])^2\leq \E[X^2\mid\mathcal{G}] [[/math]]

for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset \F[/math].
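With [math]\mathcal{G}=\sigma(X)[/math] for a discrete [math]X[/math], both sides are group-wise averages, so the inequality can be checked directly on toy data (all values below are made up):

```python
# Y-values grouped by the value of X; conditional moments are group averages.
data = {0: [1.0, 3.0, 5.0], 1: [2.0, 2.0, 8.0]}

for x, ys in data.items():
    m1 = sum(ys) / len(ys)                  # E[Y | X = x]
    m2 = sum(y * y for y in ys) / len(ys)   # E[Y^2 | X = x]
    assert m1 ** 2 <= m2                    # Jensen with phi(t) = t^2
    print(x, m1 ** 2, m2)
```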

Theorem (Tower property)

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Let [math]\mathcal{C}\subset\mathcal{G}\subset \F[/math] be a tower of sub [math]\sigma[/math]-Algebras of [math]\F[/math]. Then

[[math]] \E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}]. [[/math]]

Show Proof

Let [math]Z[/math] be a bounded and [math]\mathcal{C}[/math]-measurable r.v. Then we obtain

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{C}]Z]. [[/math]]
But [math]Z[/math] is also [math]\mathcal{G}[/math]-measurable and hence we get

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{G}]Z]. [[/math]]
Therefore, for all [math]Z[/math] bounded and [math]\mathcal{C}[/math]-measurable r.v.'s, we get

[[math]] \E[\E[X\mid\mathcal{G}]Z]=\E[\E[X\mid\mathcal{C}]Z] [[/math]]
and thus

[[math]] \E[\E[X\mid\mathcal{G}]\mid\mathcal{C}]=\E[X\mid\mathcal{C}]. [[/math]]
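Empirically, with [math]\mathcal{C}=\sigma(X_1)\subset\mathcal{G}=\sigma(X_1,X_2)[/math] and averages taken under the empirical measure of a toy sample (values made up), conditioning in two steps agrees with conditioning in one step:

```python
rows = [  # made-up sample of (x1, x2, y)
    (0, 0, 1.0), (0, 0, 3.0), (0, 1, 5.0),
    (1, 0, 2.0), (1, 1, 4.0), (1, 1, 6.0),
]

def avg(vals):
    return sum(vals) / len(vals)

for x1 in (0, 1):
    sub = [row for row in rows if row[0] == x1]
    direct = avg([y for _, _, y in sub])  # E[Y | C] on {X1 = x1}
    # E[E[Y | G] | C]: replace each y by the mean of its (x1, x2)-cell,
    # then average over {X1 = x1}.
    cell = {}
    for a, b, y in sub:
        cell.setdefault((a, b), []).append(y)
    nested = avg([avg(cell[(a, b)]) for a, b, _ in sub])
    assert abs(direct - nested) < 1e-12
print("tower property holds on this sample")
```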

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].

Notes

  1. One can look it up for more details in the stochastics I part.
  2. Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.
  3. One also has to notice that if [math]A[/math] and [math]B[/math] are two independent events, then [math]\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A][/math]
  4. Recall that this means that [math]X[/math] is Poisson distributed: [math]\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}[/math] for [math]k\in\N[/math]
  5. If [math]X\sim \B(p,n)[/math] then [math]\E[X]=pn[/math]. For further calculation, one can look it up in the stochastics I notes
  6. Since we can always consider linear combinations of [math]\one_A[/math] and then apply density theorems to it
  7. because for [math]Y\in L^2[/math] and [math]U\in L^2[/math] we get [math]Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}][/math]
  8. Note that for any [math]W\in L^+[/math], the set [math]E[/math] on which [math]W=\infty[/math] is a null set. For suppose not, then [math]\E[W]\geq \E[\infty \one_E]=\infty\p[E][/math]. But since [math]\p[E] \gt 0[/math] this cannot happen
  9. Recall the classical limit theorems for integrals: ''Monotone convergence'': Let [math](f_n)_{n\geq 1}[/math] be an increasing sequence of positive and measurable functions and let [math]f=\lim_{n\to\infty}\uparrow f_n[/math]. Then [math]\int fd\mu=\lim_{n\to\infty}\int f_nd\mu[/math]. ''Fatou'': Let [math](f_n)_{n\geq 1}[/math] be a sequence of measurable and positive functions. Then [math]\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu[/math]. ''Dominated convergence'': Let [math](f_n)_{n\geq 1}[/math] be a sequence of integrable functions with [math]\vert f_n\vert\leq g[/math] for all [math]n[/math] with [math]g[/math] integrable. Denote [math]f=\lim_{n\to\infty}f_n[/math]. Then [math]\lim_{n\to\infty}\int f_nd\mu=\int fd\mu[/math]