The Conditional Expectation
Conditional probability
Let [math](\Omega,\F,\p)[/math] be a probability space and let [math]A,B\in\F[/math] such that [math]\p[B] \gt 0[/math]. Then the conditional probability[a] of [math]A[/math] given [math]B[/math] is defined as
The important fact here is that the map [math]\F\to [0,1][/math], [math]A\mapsto \p[A\mid B][/math] defines a new probability measure on [math]\F[/math], called the conditional probability given [math]B[/math]. There are several facts which we need to recall:
- If [math]A_1,...,A_n\in\F[/math] and if [math]\p\left[\bigcap_{k=1}^nA_k\right] \gt 0[/math], then
[[math]] \p\left[\bigcap_{k=1}^nA_k\right]=\prod_{j=1}^n\p\left[A_j\Big|\bigcap_{k=1}^{j-1}A_k\right]. [[/math]]
- Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math], i.e. for all [math]n\geq 1[/math] we have that [math]E_n\in\F[/math] and for [math]n\not=m[/math] we get [math]E_n\cap E_m=\varnothing[/math] and [math]\bigcup_{n\geq 1}E_n=\Omega[/math]. Now for [math]A\in \F[/math] we get
[[math]] \p[A]=\sum_{n\geq 1}\p[A\mid E_n]\p[E_n]. [[/math]]
- (Bayes' formula)[b] Let [math](E_n)_{n\geq 1}[/math] be a measurable partition of [math]\Omega[/math] and [math]A\in\F[/math] with [math]\p[A] \gt 0[/math]. Then
[[math]] \p[E_n\mid A]=\frac{\p[A\mid E_n]\p[E_n]}{\sum_{m\geq 1}\p[A\mid E_m]\p[E_m]}. [[/math]]
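The two formulas above are easy to verify on a finite partition. The following minimal Python sketch uses made-up priors [math]\p[E_n][/math] and likelihoods [math]\p[A\mid E_n][/math] and checks the law of total probability and Bayes' formula with exact rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical finite partition E_1, E_2, E_3 of Omega with prior
# probabilities P[E_n] and likelihoods P[A | E_n]; all numbers made up.
prior = [F(1, 2), F(1, 3), F(1, 6)]          # P[E_n], sums to 1
likelihood = [F(1, 10), F(1, 2), F(9, 10)]   # P[A | E_n]

# Law of total probability: P[A] = sum_n P[A | E_n] P[E_n]
p_A = sum(l * q for l, q in zip(likelihood, prior))

# Bayes' formula: P[E_n | A] = P[A | E_n] P[E_n] / P[A]
posterior = [l * q / p_A for l, q in zip(likelihood, prior)]

print(p_A, posterior)
```

As expected, the posterior weights form a probability vector, i.e. they sum to 1.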
We can reformulate the definition of the conditional probability to obtain
Discrete construction of the conditional expectation
Let [math]X[/math] and [math]Y[/math] be two r.v.'s on a probability space [math](\Omega,\F,\p)[/math]. Let [math]Y[/math] take values in [math]\R[/math] and [math]X[/math] take values in a countable discrete set [math]\{x_1,x_2,...,x_n,...\}[/math]. The goal is to describe the expectation of the r.v. [math]Y[/math] given the observed r.v. [math]X[/math]. For instance, let [math]X=x_j\in\{x_1,x_2,...,x_n,...\}[/math]. We then look at the set [math]\{\omega\in\Omega\mid X(\omega)=x_j\}[/math] rather than at the whole of [math]\Omega[/math]. For [math]\Lambda\in\F[/math], we thus define
a new probability measure [math]\Q[/math], provided that [math]\p[X=x_j] \gt 0[/math]. It then makes more sense to compute
rather than
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:\Omega\to\{x_1,x_2,...,x_n,...\}[/math] be a r.v. taking values in a discrete set and let [math]Y[/math] be a real valued r.v. on that space. If [math]\p[X=x_j] \gt 0[/math], we can define the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] to be
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a r.v. on that space with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and let [math]Y[/math] also be a r.v. with values in [math]\{y_1,y_2,...,y_n,...\}[/math]. If [math]\p[X=x_j] \gt 0[/math], we can write the conditional expectation of [math]Y[/math] given [math]\{X=x_j\}[/math] as
Apply the definitions above to obtain
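The discrete definitions above can be computed directly from a joint law. The table and the helper name `cond_exp_Y_given` below are hypothetical, chosen only to illustrate the formula [math]\E[Y\mid X=x_j]=\sum_k y_k\,\p[Y=y_k\mid X=x_j][/math].

```python
from fractions import Fraction as F

# Hypothetical joint law of (X, Y) on {0,1} x {1,2,3}, given as a table
# joint[(x, y)] = P[X = x, Y = y]; the numbers are made up.
joint = {(0, 1): F(1, 8), (0, 2): F(1, 8), (0, 3): F(1, 4),
         (1, 1): F(1, 4), (1, 2): F(1, 8), (1, 3): F(1, 8)}

def cond_exp_Y_given(x):
    """E[Y | X = x] = sum_k y_k P[Y = y_k | X = x]."""
    p_x = sum(q for (xx, _), q in joint.items() if xx == x)   # P[X = x]
    return sum(y * q / p_x for (xx, y), q in joint.items() if xx == x)

# sanity check: E[ E[Y|X] ] = E[Y]  (here P[X=0] = P[X=1] = 1/2)
EY = sum(y * q for (_, y), q in joint.items())
assert EY == sum(cond_exp_Y_given(x) * F(1, 2) for x in (0, 1))

print(cond_exp_Y_given(0), cond_exp_Y_given(1))
```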
Now let again [math]X[/math] be a r.v. with values in [math]\{x_1,x_2,...,x_n,...\}[/math] and [math]Y[/math] a real valued r.v. The next step is to define [math]\E[Y\mid X][/math] as a function [math]f(X)[/math]. Therefore we introduce the function
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] be a countably valued r.v. and let [math]Y[/math] be a real valued r.v. The conditional expectation of [math]Y[/math] given [math]X[/math] is defined by
The above definition does not define [math]\E[Y\mid X][/math] everywhere but rather almost everywhere, since on each set [math]\{X=x\}[/math], where [math]\p[X=x]=0[/math], its value is arbitrary.
Example
Let[d] [math]X\sim\Pi(\lambda)[/math]. Consider a tossing game in which, when [math]X=n[/math], we perform [math]n[/math] independent tosses of a coin, where each toss yields 1 with probability [math]p\in[0,1][/math] and 0 with probability [math]1-p[/math]. Define also [math]S[/math] to be the r.v. giving the total number of 1's obtained in the game. Therefore, if [math]X=n[/math] is given, [math]S[/math] is binomially distributed with parameters [math](p,n)[/math]. We want to compute
- [math]\E[S\mid X][/math]
- [math]\E[X\mid S][/math]
It is more natural to ask for the expected number of 1's obtained in the whole game knowing how many tosses were played; the reverse is a bit more difficult. Logically, we may also note that we always have [math]S\leq X[/math], because we cannot obtain more 1's than the number of tosses played.
- First we compute [math]\E[S\mid X=n][/math]: if [math]X=n[/math], we know that [math]S[/math] is binomially distributed with parameters [math](p,n)[/math] ([math]S\sim \B(p,n)[/math]) and therefore we already know[e]
[[math]] \E[S\mid X=n]=pn. [[/math]]Now we need to identify the function [math]f[/math] defined as in (2) by[[math]] \begin{align*} f:\N&\longrightarrow\R\\ n&\longmapsto pn. \end{align*} [[/math]]Therefore we get by definition[[math]] \E[S\mid X]=pX. [[/math]]
-
Next we want to compute [math]\E[X\mid S=k][/math]: For [math]n\geq k[/math] we have
[[math]] \p[X=n\mid S=k]=\frac{\p[S=k\mid X=n]\p[X=n]}{\p[S=k]}=\frac{\binom{n}{k} p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}, [[/math]]since [math]\{S=k\}=\bigsqcup_{m\geq k}\{S=k,X=m\}[/math]. By some algebra we obtain[[math]] \frac{\binom{n}{k}p^k(1-p)^{n-k}e^{-\lambda}\frac{\lambda^n}{n!}}{\sum_{m=k}^\infty\binom{m}{k}p^k(1-p)^{m-k}e^{-\lambda}\frac{\lambda^m}{m!}}=\frac{(\lambda(1-p))^{n-k}e^{-\lambda(1-p)}}{(n-k)!}, [[/math]]i.e. given [math]S=k[/math], the r.v. [math]X-k[/math] is Poisson distributed with parameter [math]\lambda(1-p)[/math]. Hence we get[[math]] \E[X\mid S=k]=\sum_{n\geq k}n\p[X=n\mid S=k]=k+\lambda(1-p). [[/math]]Therefore [math]\E[X\mid S]=S+\lambda(1-p)[/math].
Continuous construction of the conditional expectation
Now we want to define [math]\E[Y\mid X][/math], where [math]X[/math] is no longer assumed to be countably valued. To this end, we recall the following two facts:
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to (\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space. The [math]\sigma[/math]-Algebra generated by [math]X[/math] is given by
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X:(\Omega,\F,\p)\to(\R^n,\B(\R^n),\lambda)[/math] be a r.v. on that space and let [math]Y[/math] be a real valued r.v. on that space. [math]Y[/math] is measurable with respect to [math]\sigma(X)[/math] if and only if there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math]. Then the conditional expectation of [math]Y[/math] given [math]X[/math] is the unique element [math]\hat Y\in L^2(\Omega,\sigma(X),\p)[/math] such that for all [math]Z\in L^2(\Omega,\sigma(X),\p)[/math]
[math]\hat Y[/math] is the orthogonal projection of [math]Y[/math] onto [math]L^2(\Omega,\sigma(X),\p)[/math].
Since [math]X[/math] takes values in [math]\R^n[/math], there exists a Borel measurable function [math]f:\R^n\to\R[/math] such that
Now let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math] and consider the space [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math]. It is clear that [math]L^2(\Omega,\mathcal{G},\p)[/math] is a Hilbert space and thus we can project to it.
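On a finite probability space this projection is concrete: if [math]\mathcal{G}[/math] is generated by a partition, the [math]\mathcal{G}[/math]-measurable functions are those constant on each block, and the orthogonal projection of [math]Y[/math] averages [math]Y[/math] over each block. A minimal sketch with made-up values, checking the defining property [math]\E[YZ]=\E[\hat YZ][/math] for the generating indicators:

```python
from fractions import Fraction as F

# Finite sketch: Omega = {0,...,5} with uniform P, and G generated by the
# partition {0,1,2} | {3,4,5}; the values of Y below are made up.
P = {w: F(1, 6) for w in range(6)}
blocks = [(0, 1, 2), (3, 4, 5)]
Y = {0: 1, 1: 4, 2: 1, 3: 2, 4: 8, 5: 2}

# Orthogonal projection onto L^2(Omega, G, P): G-measurable functions are
# constant on blocks, so the projection averages Y over each block.
Yhat = {}
for B in blocks:
    pB = sum(P[w] for w in B)
    avg = sum(Y[w] * P[w] for w in B) / pB
    for w in B:
        Yhat[w] = avg

# Defining property: E[Y Z] = E[Yhat Z] for G-measurable test r.v.'s Z,
# here the indicators of the generating blocks.
for B in blocks:
    assert sum(Y[w] * P[w] for w in B) == sum(Yhat[w] * P[w] for w in B)
print(Yhat)
```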
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then the conditional expectation of [math]Y[/math] given [math]\mathcal{G}[/math] is defined as the unique element [math]\E[Y\mid \mathcal{G}]\in L^2(\Omega,\mathcal{G},\p)[/math] such that for all [math]Z\in L^2(\Omega,\mathcal{G},\p)[/math]
In (3) or (1), it is enough[f] to restrict the test r.v. [math]Z[/math] to the class of r.v.'s of the form
The conditional expectation is an element of [math]L^2[/math], so it is only defined a.s. and not uniquely everywhere. In particular, any statement like [math]\E[Y\mid\mathcal{G}]\geq0[/math] or [math]\E[Y\mid \mathcal{G}]=Z[/math] has to be understood with an implicit a.s.
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^2(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math].
- If [math]Y\geq 0[/math], then [math]\E[Y\mid \mathcal{G}]\geq 0[/math]
- [math]\E[\E[Y\mid\mathcal{G}]]=\E[Y][/math]
- The map [math]Y\mapsto\E[Y\mid\mathcal{G}][/math] is linear.
For [math](i)[/math] take [math]Z=\one_{\{\E[Y\mid\mathcal{G}] \lt 0\}}[/math] to obtain
Now we want to extend the definition of the conditional expectation to r.v.'s in [math]L^1(\Omega,\F,\p)[/math] or in [math]L^+(\Omega,\F,\p)[/math], the space of non-negative r.v.'s allowing the value [math]\infty[/math].
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^+(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset \F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^+(\Omega,\mathcal{G},\p)[/math] such that for all [math]X\in L^+(\Omega,\mathcal{G},\p)[/math]
If [math]Y\geq 0[/math] and [math]Y\in L^2(\Omega,\F,\p)[/math], then we define [math]\E[Y\mid\mathcal{G}][/math] as before. If [math]X\in L^+(\Omega,\mathcal{G},\p)[/math], then [math]X_n=X\land n[/math] is in [math]L^2(\Omega,\mathcal{G},\p)[/math], is positive, and [math]X_n\uparrow X[/math] for [math]n\to\infty[/math]. Using the monotone convergence theorem we get
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then there exists a unique element [math]\E[Y\mid \mathcal{G}]\in L^1(\Omega,\mathcal{G},\p)[/math] such that for every [math]X[/math] bounded and [math]\mathcal{G}[/math]-measurable
- If [math]Y\geq 0[/math], then [math]\E[Y\mid\mathcal{G}]\geq 0[/math]
- The map [math]Y\mapsto \E[Y\mid\mathcal{G}][/math] is linear.
We will only prove the existence, since the rest is exactly the same as before. Write [math]Y=Y^+-Y^-[/math] with [math]Y^+,Y^-\in L^1(\Omega,\F,\p)[/math] and [math]Y^+,Y^-\geq 0[/math]. So [math]\E[Y^+\mid\mathcal{G}][/math] and [math]\E[Y^-\mid\mathcal{G}][/math] are well defined. Now we set
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then
Take equation (4) and set [math]Z=\one_\Omega[/math].
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space. Then
We can always write [math]X=X^+-X^-[/math] and also [math]\vert X\vert=X^++X^-[/math]. Therefore we get
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]Y\in L^1(\Omega,\F,\p)[/math] be a r.v. on that space and assume that [math]Y[/math] is independent of the sub [math]\sigma[/math]-Algebra [math]\mathcal{G}\subset\F[/math], i.e. [math]\sigma(Y)[/math] is independent of [math]\mathcal{G}[/math]. Then
Let [math]Z[/math] be a bounded and [math]\mathcal{G}[/math]-measurable r.v. and therefore [math]Y[/math] and [math]Z[/math] are independent. Hence we get
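The proposition can be illustrated on a finite space: below, a made-up [math]Y[/math] is chosen independent of the [math]\sigma[/math]-Algebra generated by a two-block partition, and the block averages of [math]Y[/math] (i.e. [math]\E[Y\mid\mathcal{G}][/math]) indeed all equal [math]\E[Y][/math].

```python
from fractions import Fraction as F

# Finite sketch: Omega = {0,1,2,3} with uniform P.  G is generated by the
# partition {0,1} | {2,3}, and Y is independent of G: each value of Y
# splits evenly across both blocks.  The numbers are made up.
P = F(1, 4)
Y = [1, 2, 1, 2]
blocks = [(0, 1), (2, 3)]

# check independence of sigma(Y) and G on the generating events
for y in (1, 2):
    for B in blocks:
        p_joint = sum(P for w in B if Y[w] == y)
        p_y = sum(P for w in range(4) if Y[w] == y)
        assert p_joint == p_y * F(1, 2)     # P[{Y=y} ∩ B] = P[Y=y] P[B]

# E[Y | G] is the block average, which here equals E[Y] on every block
EY = sum(Y[w] * P for w in range(4))
for B in blocks:
    assert sum(F(Y[w], len(B)) for w in B) == EY
print(EY)
```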
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X[/math] and [math]Y[/math] be two r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Assume further that at least one of these two holds:
- [math]X,Y[/math] and [math]XY[/math] are in [math]L^1(\Omega,\F,\p)[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
- [math]X\geq 0[/math], [math]Y\geq 0[/math] with [math]X[/math] being [math]\mathcal{G}[/math]-measurable.
Then
For [math](ii)[/math] assume first that [math]X,Y\geq 0[/math]. Let [math]Z[/math] be a positive and [math]\mathcal{G}[/math]-measurable r.v. Then we can obtain
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math](Y_n)_{n\geq 1}[/math] be a sequence of r.v.'s on that space and let [math]\mathcal{G}\subset\F[/math] be a sub [math]\sigma[/math]-Algebra of [math]\F[/math]. Then we have:
- (Monotone convergence) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s such that [math]\lim_{n\to\infty}\uparrow Y_n=Y[/math] a.s. Then
[[math]] \lim_{n\to\infty}\E[Y_n\mid\mathcal{G}]=\E[Y\mid \mathcal{G}]. [[/math]]
- (Fatou) Assume that [math](Y_n)_{n\geq 1}[/math] is a sequence of positive r.v.'s. Then
[[math]] \E[\liminf_n Y_n\mid\mathcal{G}]\leq\liminf_n\E[Y_n\mid\mathcal{G}]. [[/math]]
- (Dominated convergence) Assume that [math]Y_n\xrightarrow{n\to\infty}Y[/math] a.s. and that there exists [math]Z\in L^1(\Omega,\F,\p)[/math] such that [math]\vert Y_n\vert\leq Z[/math] for all [math]n[/math]. Then
[[math]] \lim_{n\to\infty}\E[Y_n\mid \mathcal{G}]=\E[Y\mid\mathcal{G}]. [[/math]]
We will only prove [math](i)[/math], since [math](ii)[/math] and [math](iii)[/math] are proved in a similar way (it's a good exercise to do the proof). Since [math](Y_n)_{n\geq 1}[/math] is an increasing sequence, it follows that
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi:\R\to\R[/math] be a real, convex function. Let [math]X \in L^1(\Omega,\F,\p)[/math] such that [math]\varphi(X)\in L^1(\Omega,\F,\p)[/math]. Then
Exercise.
Example
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\varphi(x)=x^2[/math] and let [math]X\in L^2(\Omega,\F,\p)[/math]. Then
for all sub [math]\sigma[/math]-Algebras [math]\mathcal{G}\subset \F[/math].
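On each block of a partition generating [math]\mathcal{G}[/math], conditional Jensen for [math]\varphi(x)=x^2[/math] reduces to the elementary inequality (block mean)[math]^2\leq[/math] block mean of squares. A minimal sketch with made-up conditional probabilities and values:

```python
from fractions import Fraction as F

# Conditional Jensen for phi(x) = x^2, checked on a single block of a
# partition generating G; the probabilities and values are made up.
probs = [F(1, 4), F(1, 4), F(1, 2)]   # conditional probabilities on block
vals = [F(1), F(3), F(-2)]            # values of X on the block

m1 = sum(q * x for q, x in zip(probs, vals))        # E[X | block]
m2 = sum(q * x * x for q, x in zip(probs, vals))    # E[X^2 | block]

# phi(E[X|G]) <= E[phi(X)|G] on this block
assert m1 * m1 <= m2
print(m1 * m1, m2)
```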
Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]X\in L^1(\Omega,\F,\p)[/math] be a positive r.v. on that space. Let [math]\mathcal{C}\subset\mathcal{G}\subset \F[/math] be a tower of sub [math]\sigma[/math]-Algebras of [math]\F[/math]. Then
Let [math]Z[/math] be a bounded and [math]\mathcal{C}[/math]-measurable r.v. Then we obtain
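The tower property can also be checked directly on a small finite example, taking [math]\mathcal{C}[/math] to be the trivial [math]\sigma[/math]-Algebra so that conditioning on [math]\mathcal{C}[/math] is plain expectation; all values below are made up.

```python
from fractions import Fraction as F

# Tower property on Omega = {0,1,2,3} with uniform P.  G is generated by
# the partition {0,1} | {2,3}; C = {emptyset, Omega} is the trivial
# sigma-algebra, so C ⊂ G ⊂ F.  The values of X are made up.
P = F(1, 4)
X = [F(5), F(1), F(2), F(4)]

# E[X | G]: average of X over each G-block
EG = {}
for B in [(0, 1), (2, 3)]:
    avg = sum(X[w] for w in B) / len(B)
    for w in B:
        EG[w] = avg

# Since C is trivial, E[ E[X|G] | C ] is the plain expectation of E[X|G]...
lhs = sum(EG[w] * P for w in range(4))
# ...and must equal E[X | C] = E[X].
rhs = sum(X[w] * P for w in range(4))
assert lhs == rhs
print(lhs, rhs)
```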
General references
Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].
Notes
- One can look it up for more details in the stochastics I part.
- Use the previous facts for the proof of Bayes' formula. One can also look it up in the stochastics I part.
- One also has to notice that if [math]A[/math] and [math]B[/math] are two independent events, then [math]\p[A\mid B]=\frac{\p[A\cap B]}{\p[B]}=\frac{\p[A]\p[B]}{\p[B]}=\p[A][/math]
- Recall that this means that [math]X[/math] is Poisson distributed: [math]\p[X=k]=e^{-\lambda}\frac{\lambda^k}{k!}[/math] for [math]k\in\N[/math]
- If [math]X\sim \B(p,n)[/math] then [math]\E[X]=pn[/math]. For further calculation, one can look it up in the stochastics I notes
- Since we can always consider linear combinations of [math]\one_A[/math] and then apply density theorems to it
- because for [math]Y\in L^2[/math] and [math]U\in L^2[/math] we get [math]Y\geq U\Longrightarrow Y-U\geq 0\Longrightarrow \E[Y\mid\mathcal{G}]\geq \E[U\mid\mathcal{G}][/math]
- Note that for any [math]W\in L^+[/math] with [math]\E[W] \lt \infty[/math], the set [math]E[/math] on which [math]W=\infty[/math] is a null set. For suppose not; then [math]\E[W]\geq \E[\infty \one_E]=\infty\,\p[E]=\infty[/math] since [math]\p[E] \gt 0[/math], a contradiction
- Recall the classical limit theorems for integrals. Monotone convergence: let [math](f_n)_{n\geq 1}[/math] be an increasing sequence of positive and measurable functions and let [math]f=\lim_{n\to\infty}\uparrow f_n[/math]. Then [math]\int fd\mu=\lim_{n\to\infty}\int f_nd\mu[/math]. Fatou: let [math](f_n)_{n\geq 1}[/math] be a sequence of measurable and positive functions. Then [math]\int\liminf_n f_n d\mu\leq \liminf_n \int f_nd\mu[/math]. Dominated convergence: let [math](f_n)_{n\geq 1}[/math] be a sequence of integrable functions with [math]\vert f_n\vert\leq g[/math] for all [math]n[/math] with [math]g[/math] integrable. Denote [math]f=\lim_{n\to\infty}f_n[/math]. Then [math]\lim_{n\to\infty}\int f_nd\mu=\int fd\mu[/math]