The Radon-Nikodym approach for the conditional expectation

Before stating the Radon-Nikodym theorem, we recall some definitions from measure theory. Let [math](\Omega,\B)[/math] be a measurable space. A measure [math]\nu[/math] is absolutely continuous with respect to another measure [math]\mu[/math], written [math]\nu\ll\mu[/math], if [math]d\nu=fd\mu[/math] for some finite measurable [math]f\geq 0[/math], that is, if

[[math]] \nu(B)=\int_B fd\mu [[/math]]

for all [math]B\in\B[/math]. Two measures [math]\mu[/math] and [math]\nu[/math] are singular with respect to each other if there exist disjoint measurable sets [math]A_1,A_2\subset \Omega[/math] with [math]\Omega=A_1\sqcup A_2[/math] and with [math]\nu(A_1)=0=\mu(A_2)[/math]. Finally, recall that a measure [math]\mu[/math] is [math]\sigma[/math]-finite if there is a decomposition of [math]\Omega[/math] into measurable sets,

[[math]] \Omega=\bigsqcup_{i=1}^\infty A_i [[/math]]

with [math]\mu(A_i) \lt \infty[/math].

Theorem (Radon-Nikodym)

Let [math]\mu[/math] and [math]\nu[/math] be two [math]\sigma[/math]-finite measures on a measurable space [math](\Omega,\B)[/math]. Then [math]\nu[/math] can be decomposed as

[[math]] \nu=\nu_{abs}+\nu_{sing} [[/math]]

into the sum of two [math]\sigma[/math]-finite measures, where [math]\nu_{abs}\ll\mu[/math] is absolutely continuous with respect to [math]\mu[/math], and where [math]\nu_{sing}[/math] and [math]\mu[/math] are singular to each other (written [math]\nu_{sing}\perp\mu[/math]).

The theorem gives a more practical way of checking whether a given [math]\sigma[/math]-finite measure [math]\nu[/math] is absolutely continuous with respect to another [math]\sigma[/math]-finite measure [math]\mu[/math]. If [math]\mu(N)=0[/math] implies [math]\nu(N)=0[/math] for every measurable [math]N\subset \Omega[/math], then [math]\nu=\nu_{abs}[/math] is absolutely continuous. We also note that the density function [math]f[/math] with [math]fd\mu=d\nu[/math] is called the Radon-Nikodym derivative and is often written [math]f=\frac{d\nu}{d\mu}[/math].
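One way to see this, sketched here under the assumption that [math]A_1,A_2[/math] denote the sets from the definition of singularity applied to [math]\nu_{sing}\perp\mu[/math] (so [math]\nu_{sing}(A_1)=0=\mu(A_2)[/math]): since [math]\mu(A_2)=0[/math], the hypothesis gives [math]\nu(A_2)=0[/math], and because [math]\nu_{sing}\leq\nu[/math] we get

[[math]] \nu_{sing}(\Omega)=\nu_{sing}(A_1)+\nu_{sing}(A_2)\leq 0+\nu(A_2)=0. [[/math]]

Hence [math]\nu_{sing}=0[/math] and [math]\nu=\nu_{abs}[/math].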

To prove this theorem, we need a result relating a Hilbert space to its dual space: a Hilbert space [math]\mathcal{H}[/math] can be identified with its dual space [math]\mathcal{H}^*[/math].

Lemma (Riesz-representation for Hilbert spaces)

For a Hilbert space [math]\mathcal{H}[/math], the map sending [math]h\in \mathcal{H}[/math] to [math]\phi(h)\in\mathcal{H}^*[/math] defined by

[[math]] \phi(h)(x)=\langle x,h\rangle [[/math]]

is a linear (resp. conjugate-linear in the complex case) isometric isomorphism between [math]\mathcal{H}[/math] and its dual space [math]\mathcal{H}^*[/math].


[Proof of Theorem] Suppose that [math]\mu[/math] and [math]\nu[/math] are both finite measures (the general case can be reduced to this case by using the assumption that [math]\mu[/math] and [math]\nu[/math] are both [math]\sigma[/math]-finite). We define a new measure [math]m=\mu+\nu[/math] and will work with the real Hilbert space [math]\mathcal{H}=L^2(\Omega,m)[/math]. On this Hilbert space we define a linear functional [math]\phi[/math] by

[[math]] \phi(g)=\int gd\nu [[/math]]

for [math]g\in\mathcal{H}[/math]. For [math]g[/math] a simple function on [math]\Omega[/math], this is clearly well-defined and satisfies

[[math]] \vert\phi(g)\vert=\left\vert\int gd\nu\right\vert\leq \int \vert g\vert d\nu\leq \int \vert g\vert dm\leq \| g\|_{\mathcal{H}}\|\one\|_{\mathcal{H}} [[/math]]

where we have used the fact that [math]m=\mu+\nu[/math], that [math]\mu[/math] is a positive measure, and the Cauchy-Schwarz inequality on [math]\mathcal{H}[/math]. Note that [math]\|\one\|_{\mathcal{H}}=m(\Omega)^{1/2} \lt \infty[/math] because [math]m[/math] is finite. Since the simple functions are dense in [math]\mathcal{H}[/math], the functional extends to a bounded linear functional on all of [math]\mathcal{H}[/math]. By the Riesz representation for Hilbert spaces there is some [math]k\in\mathcal{H}[/math] such that

[[math]] \begin{equation} \int gd\nu=\phi(g)=\int gkdm. \tag{7} \end{equation} [[/math]]

We claim that [math]k[/math] takes values in [math][0,1][/math] almost everywhere with respect to [math]m[/math]. Indeed, for any [math]B\in\B[/math] we have

[[math]] 0\leq \nu(B)\leq m(B), [[/math]]

so (using [math]g=\one_B[/math]),

[[math]] 0\leq \int_B kdm\leq m(B). [[/math]]

Applying this with

[[math]] B=\{\omega\in\Omega\mid k(\omega) \lt 0\} [[/math]]

and with

[[math]] B=\{\omega\in\Omega\mid k(\omega) \gt 1\} [[/math]]

shows that both sets are [math]m[/math]-null, which proves the claim that [math]k[/math] takes values in [math][0,1][/math] [math]m[/math]-almost everywhere. Since [math]m=\mu+\nu[/math], we can reformulate (7) as

[[math]] \begin{equation} \int g(1-k)d\nu=\int gkd\mu. \tag{8} \end{equation} [[/math]]

This holds by construction for all simple functions [math]g[/math], and hence for all nonnegative measurable functions by monotone convergence. Now define [math]\nu_{sing}[/math] to be [math]\nu\mid_{A}[/math], where

[[math]] A=\{\omega\in\Omega\mid k(\omega)=1\}. [[/math]]

By definition, [math]\nu_{sing}(\Omega\setminus A)=0[/math] and by (8) applied with [math]g=\one_{A}[/math] we also have [math]\mu(A)=0[/math]. Therefore

[[math]] \nu_{sing}\perp\mu. [[/math]]
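The step [math]\mu(A)=0[/math] can be spelled out as follows. This is a sketch that uses only (8) with [math]g=\one_A[/math] and the fact that [math]k=1[/math] on [math]A[/math]:

[[math]] \mu(A)=\int \one_A kd\mu=\int \one_A(1-k)d\nu=0. [[/math]]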

We also define

[[math]] \nu_{abs}=\nu\mid_{\Omega\setminus A}=\nu\mid_{\{\omega\in\Omega\mid k(\omega) \lt 1\}} [[/math]]

so that [math]\nu=\nu_{sing}+\nu_{abs}[/math]. Define the function [math]f=\frac{k}{1-k}\geq 0[/math] on [math]\Omega\setminus A[/math] and let [math]g\geq 0[/math] be measurable. Then by (8) we have

[[math]] \int_{\Omega\setminus A}gf d\mu=\int_{\Omega\setminus A}\frac{g}{1-k}kd\mu=\int_{\Omega\setminus A}\frac{g}{1-k}(1-k)d\nu=\int_{\Omega\setminus A}gd\nu_{abs}, [[/math]]

which shows that [math]d\nu_{abs}=fd\mu[/math] and so [math]\nu_{abs}\ll \mu[/math].

■
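The construction in the proof can be traced by hand on a finite sample space, where measures are just nonnegative weights on points. The following Python sketch is an illustration only: the function name `decompose`, the four-point space and the weights are hypothetical choices, not part of the source. It computes [math]m=\mu+\nu[/math], the density [math]k[/math] of [math]\nu[/math] with respect to [math]m[/math], the set [math]A=\{k=1\}[/math] and [math]f=\frac{k}{1-k}[/math].

```python
from fractions import Fraction


def decompose(mu, nu):
    """Trace the proof's construction on a finite space.

    mu, nu: dicts mapping each point of Omega to a nonnegative weight.
    Returns (k, f, nu_abs, nu_sing) as dicts indexed by points.
    """
    k, f, nu_abs, nu_sing = {}, {}, {}, {}
    for w in mu:
        m_w = Fraction(mu[w]) + Fraction(nu[w])  # m = mu + nu
        # k is the density of nu with respect to m. On m-null points its
        # value is arbitrary; we pick 0 (an assumption of this sketch).
        k[w] = Fraction(nu[w]) / m_w if m_w != 0 else Fraction(0)
        if k[w] == 1:
            # w lies in A = {k = 1}; this forces mu({w}) = 0.
            nu_sing[w], nu_abs[w] = Fraction(nu[w]), Fraction(0)
        else:
            nu_sing[w], nu_abs[w] = Fraction(0), Fraction(nu[w])
            f[w] = k[w] / (1 - k[w])  # density of nu_abs with respect to mu
    return k, f, nu_abs, nu_sing


# Hypothetical example: mu lives on {a, b}, nu lives on {a, c}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0), "d": Fraction(0)}
nu = {"a": Fraction(1, 4), "b": Fraction(0), "c": Fraction(3, 4), "d": Fraction(0)}
k, f, nu_abs, nu_sing = decompose(mu, nu)

for w in mu:
    print(w, "k =", k[w], "f =", f.get(w, "undefined"),
          "nu_abs =", nu_abs[w], "nu_sing =", nu_sing[w])
```

In this example the point `c` carries all of the singular part, since [math]\mu[/math] gives it weight zero while [math]\nu[/math] does not, and on the remaining points [math]f\cdot\mu[/math] reproduces [math]\nu_{abs}[/math].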

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\mathcal{G}\subset \F[/math] be a sub-[math]\sigma[/math]-algebra of [math]\F[/math] and let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. Then there exists a unique r.v. in [math]L^1(\Omega,\mathcal{G},\p)[/math], denoted by [math]\E[X\mid\mathcal{G}][/math], such that for all [math]B\in\mathcal{G}[/math]

[[math]] \E[X\one_B]=\E[\E[X\mid\mathcal{G}]\one_B]. [[/math]]

More generally, for every bounded and [math]\mathcal{G}[/math]-measurable r.v. [math]Z[/math] we get

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{G}]Z] [[/math]]

and if [math]X\geq 0[/math], then [math]\E[X\mid\mathcal{G}]\geq 0[/math].

Proof

The uniqueness part was already done. To show existence, assume first that [math]X[/math] is nonnegative. Define a new measure [math]\Q[/math] on [math](\Omega,\mathcal{G})[/math] by

[[math]] \Q[A]=\E[X\one_A]=\int_A X(\omega)d\p(\omega) [[/math]]

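for all [math]A\in\mathcal{G}[/math]. Since [math]X\in L^1(\Omega,\F,\p)[/math], [math]\Q[/math] is a finite measure. Now consider the measure [math]\p[/math] restricted to [math]\mathcal{G}[/math]. If [math]\p[A]=0[/math] for some [math]A\in\mathcal{G}[/math], then [math]\Q[A]=\E[X\one_A]=0[/math], and therefore

[[math]] \Q\ll\p [[/math]]

on [math]\mathcal{G}[/math]. The Radon-Nikodym theorem implies that there exists a nonnegative [math]\mathcal{G}[/math]-measurable r.v. [math]\tilde X[/math] such that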

[[math]] \Q[A]=\E[\tilde X\one_A] [[/math]]

for all [math]A\in\mathcal{G}[/math]. For [math]A\in\mathcal{G}[/math] we get that

[[math]] \E[X\one_A]=\E[\tilde X\one_A]. [[/math]]

Now taking [math]A=\Omega[/math], we get that [math]\E[X]=\E[\tilde X] \lt \infty[/math]. Therefore [math]\tilde X\in L^1(\Omega,\mathcal{G},\p)[/math], and hence [math]\tilde X=\E[X\mid\mathcal{G}][/math]. For the general case, we write [math]X=X^+-X^-[/math], apply the above to [math]X^+[/math] and [math]X^-[/math] separately, and set [math]\E[X\mid\mathcal{G}]=\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}][/math].

■
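When [math]\mathcal{G}[/math] is generated by a finite partition of a finite sample space, the Radon-Nikodym derivative [math]\frac{d\Q}{d\p}[/math] on [math]\mathcal{G}[/math] is simply the ratio [math]\Q[B]/\p[B][/math] on each atom [math]B[/math] of the partition. The Python sketch below illustrates this special case. The function name `conditional_expectation` and the die example are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def conditional_expectation(p, x, partition):
    """E[X | G] where G is generated by a finite partition of Omega.

    p: dict point -> probability, x: dict point -> value of X,
    partition: list of disjoint sets of points covering Omega.
    """
    cond = {}
    for atom in partition:
        p_atom = sum(p[w] for w in atom)         # P[atom]
        q_atom = sum(x[w] * p[w] for w in atom)  # Q[atom] = E[X 1_atom]
        # dQ/dP is constant on each atom. On a P-null atom its value is
        # arbitrary; we pick 0 (an assumption of this sketch).
        value = q_atom / p_atom if p_atom != 0 else Fraction(0)
        for w in atom:
            cond[w] = value
    return cond


# Hypothetical example: a fair die, X = face value, G = sigma(odd, even).
p = {w: Fraction(1, 6) for w in range(1, 7)}
x = {w: w for w in range(1, 7)}
odd, even = {1, 3, 5}, {2, 4, 6}
cond = conditional_expectation(p, x, [odd, even])

# Check the defining property E[X 1_B] = E[E[X | G] 1_B] on both atoms.
for B in (odd, even):
    lhs = sum(x[w] * p[w] for w in B)
    rhs = sum(cond[w] * p[w] for w in B)
    print(sorted(B), lhs, rhs)
```

Here [math]\E[X\mid\mathcal{G}][/math] equals 3 on the odd faces and 4 on the even faces. Checking the defining property on the atoms is enough in this setting, because every set in [math]\mathcal{G}[/math] is a finite disjoint union of atoms.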

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].