The Radon-Nikodym approach for the conditional expectation

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

Before stating the Radon-Nikodym theorem, we recall some definitions from measure theory. Let [math](\Omega,\B)[/math] be a measurable space. A measure [math]\nu[/math] is [math]absolutely[/math] [math]continuous[/math] with respect to another measure [math]\mu[/math], written [math]\nu\ll\mu[/math], if there exists a measurable function [math]f\colon\Omega\to[0,\infty)[/math] with [math]d\nu=fd\mu[/math], that is, with

[[math]] \nu(B)=\int_B fd\mu [[/math]]
for all [math]B\in\B[/math]. Two measures [math]\mu[/math] and [math]\nu[/math] are [math]singular[/math] with respect to each other if there exist disjoint measurable sets [math]A_1,A_2\subset \Omega[/math] with [math]\Omega=A_1\sqcup A_2[/math] and [math]\nu(A_1)=0=\mu(A_2)[/math]. Finally, recall that a measure [math]\mu[/math] is [math]\sigma[/math]-finite if there is a decomposition of [math]\Omega[/math] into countably many measurable sets,

[[math]] \Omega=\bigsqcup_{i=1}^\infty A_i [[/math]]
with [math]\mu(A_i) \lt \infty[/math] for all [math]i\in\N[/math].
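Some standard examples of this definition: Lebesgue measure [math]\lambda[/math] on [math]\R[/math] with its Borel [math]\sigma[/math]-algebra is [math]\sigma[/math]-finite, since (after enumerating [math]\mathbb{Z}[/math] as a sequence)

[[math]] \R=\bigsqcup_{n\in\mathbb{Z}}[n,n+1),\qquad \lambda([n,n+1))=1 \lt \infty. [[/math]]
By contrast, the counting measure on [math]\R[/math] is not [math]\sigma[/math]-finite: its sets of finite measure are exactly the finite sets, and a countable union of finite sets is countable, so it cannot exhaust [math]\R[/math]. Every probability measure is [math]\sigma[/math]-finite, with [math]A_1=\Omega[/math] and [math]A_i=\emptyset[/math] for [math]i\geq 2[/math].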

Theorem (Radon-Nikodym)

Let [math]\mu[/math] and [math]\nu[/math] be two [math]\sigma[/math]-finite measures on a measurable space [math](\Omega,\B)[/math]. Then [math]\nu[/math] can be decomposed as

[[math]] \nu=\nu_{abs}+\nu_{sing} [[/math]]
into the sum of two [math]\sigma[/math]-finite measures, where [math]\nu_{abs}\ll\mu[/math] is absolutely continuous with respect to [math]\mu[/math], and where [math]\nu_{sing}[/math] and [math]\mu[/math] are singular to each other (written [math]\nu_{sing}\perp\mu[/math]).
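A simple illustration: take [math]\Omega=[0,1][/math] with its Borel [math]\sigma[/math]-algebra, [math]\mu=\lambda[/math] Lebesgue measure, and [math]\nu=2\lambda+\delta_{1/2}[/math], where [math]\delta_{1/2}[/math] denotes the Dirac measure at [math]1/2[/math]. Then

[[math]] \nu_{abs}=2\lambda,\qquad \frac{d\nu_{abs}}{d\mu}=2,\qquad \nu_{sing}=\delta_{1/2}, [[/math]]
and the sets [math]A_1=[0,1]\setminus\{1/2\}[/math], [math]A_2=\{1/2\}[/math] witness [math]\nu_{sing}\perp\mu[/math], since [math]\delta_{1/2}(A_1)=0=\lambda(A_2)[/math].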

The theorem yields a more practical way of checking whether a given [math]\sigma[/math]-finite measure [math]\nu[/math] is absolutely continuous with respect to another [math]\sigma[/math]-finite measure [math]\mu[/math]: if [math]\mu(N)=0[/math] implies [math]\nu(N)=0[/math] for every measurable [math]N\subset \Omega[/math], then [math]\nu=\nu_{abs}[/math] is absolutely continuous with respect to [math]\mu[/math]. Indeed, let [math]A_1,A_2[/math] be sets witnessing [math]\nu_{sing}\perp\mu[/math]. Then [math]\mu(A_2)=0[/math], hence [math]\nu_{sing}(A_2)\leq\nu(A_2)=0[/math], and together with [math]\nu_{sing}(A_1)=0[/math] this gives [math]\nu_{sing}=0[/math]. The density function [math]f[/math] with [math]fd\mu=d\nu[/math] is called the [math]Radon[/math]-[math]Nikodym[/math] [math]derivative[/math] and is often written [math]f=\frac{d\nu}{d\mu}[/math].
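On a finite set [math]\Omega[/math], where every measure is given by its point masses, the decomposition and the Radon-Nikodym derivative can be written down explicitly: the [math]\nu[/math]-mass sitting on points where [math]\mu[/math] vanishes forms [math]\nu_{sing}[/math], and on the remaining points the density is the ratio of point masses. The Python sketch below illustrates only this finite case; the function name `lebesgue_decomposition` and the dictionary representation of measures are our own choices, not part of the source.

```python
from fractions import Fraction


def lebesgue_decomposition(mu, nu):
    """Split nu = nu_abs + nu_sing relative to mu on a finite set Omega.

    mu, nu: dicts with the same keys (the points of Omega) mapping each
    point to its nonnegative mass. Returns (nu_abs, nu_sing, f), where f
    is a density of nu_abs with respect to mu.
    """
    nu_abs, nu_sing, f = {}, {}, {}
    for w in mu:
        if mu[w] > 0:
            # On the support of mu the density is the ratio of point masses.
            f[w] = Fraction(nu[w]) / mu[w]
            nu_abs[w], nu_sing[w] = Fraction(nu[w]), Fraction(0)
        else:
            # nu-mass on the mu-null set {mu = 0} is singular to mu.
            f[w] = Fraction(0)
            nu_abs[w], nu_sing[w] = Fraction(0), Fraction(nu[w])
    return nu_abs, nu_sing, f


mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
nu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3)}
nu_abs, nu_sing, f = lebesgue_decomposition(mu, nu)
print(f)        # density of nu_abs with respect to mu
print(nu_sing)  # concentrated on {c}, which is mu-null
```

Here the point `c` is a [math]\mu[/math]-null set with [math]\nu(\{c\}) \gt 0[/math], so the null-set criterion fails and [math]\nu_{sing}\neq 0[/math]; the witnessing sets are [math]A_1=\{a,b\}[/math] and [math]A_2=\{c\}[/math].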

To prove this theorem, we use the Riesz representation theorem, which identifies a Hilbert space [math]\mathcal{H}[/math] with its dual space [math]\mathcal{H}^*[/math].

Lemma (Riesz-representation for Hilbert spaces)

For a Hilbert space [math]\mathcal{H}[/math], the map sending [math]h\in \mathcal{H}[/math] to [math]\phi(h)\in\mathcal{H}^*[/math] defined by

[[math]] \phi(h)(x)=\langle x,h\rangle [[/math]]
is an isometric isomorphism between [math]\mathcal{H}[/math] and its dual space [math]\mathcal{H}^*[/math], which is linear in the real case and conjugate-linear in the complex case.
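In finite dimensions the lemma can be checked by hand. As a sketch, assume [math]\mathcal{H}=\R^n[/math] with the weighted inner product [math]\langle x,y\rangle=\sum_i x_iy_iw_i[/math], [math]w_i \gt 0[/math], which is [math]L^2[/math] of the measure with point masses [math]w_i[/math]. A functional [math]x\mapsto\sum_i c_ix_i[/math] is then represented by [math]h_i=c_i/w_i[/math]. The helper names below are hypothetical and the example covers only the real case.

```python
from fractions import Fraction


def inner(x, y, w):
    """Weighted inner product <x, y> = sum_i x_i * y_i * w_i (all w_i > 0)."""
    return sum(xi * yi * wi for xi, yi, wi in zip(x, y, w))


def riesz_representer(c, w):
    """Return h with sum_i c_i * x_i == <x, h> for every x (real case)."""
    return [Fraction(ci) / wi for ci, wi in zip(c, w)]


def phi(x, c):
    """The linear functional x -> sum_i c_i * x_i."""
    return sum(ci * xi for ci, xi in zip(c, x))


w = [Fraction(1), Fraction(2), Fraction(1, 4)]  # point masses of the measure
c = [3, -1, 2]                                   # coefficients of phi
h = riesz_representer(c, w)

for x in ([1, 1, 1], [0, 5, -2], [Fraction(1, 3), 0, 7]):
    assert phi(x, c) == inner(x, h, w)

# Isometry: |phi(x)| <= ||x|| ||h|| with equality at x = h, so ||phi|| = ||h||.
assert phi(h, c) == inner(h, h, w)
```

The last check reflects the isometry: by Cauchy-Schwarz the supremum of [math]\vert\phi(x)\vert/\|x\|[/math] is at most [math]\|h\|[/math], and it is attained at [math]x=h[/math].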



[Proof of Theorem] Suppose that [math]\mu[/math] and [math]\nu[/math] are both finite measures (the general case can be reduced to this case by using the assumption that [math]\mu[/math] and [math]\nu[/math] are both [math]\sigma[/math]-finite). We define a new measure [math]m=\mu+\nu[/math] and will work with the real Hilbert space [math]\mathcal{H}=L^2(\Omega,m)[/math]. On this Hilbert space we define a linear functional [math]\phi[/math] by

[[math]] \phi(g)=\int gd\nu [[/math]]
for [math]g\in\mathcal{H}[/math]. For [math]g[/math] a simple function on [math]\Omega[/math], this is clearly well-defined and satisfies

[[math]] \vert\phi(g)\vert=\left\vert\int gd\nu\right\vert\leq \int \vert g\vert d\nu\leq \int \vert g\vert dm\leq \| g\|_{\mathcal{H}}\|\one\|_{\mathcal{H}} [[/math]]
where we have used that [math]m=\mu+\nu[/math] with [math]\mu[/math] a positive measure, together with the Cauchy-Schwarz inequality on [math]\mathcal{H}[/math]; note that [math]\|\one\|_{\mathcal{H}}=m(\Omega)^{1/2} \lt \infty[/math]. Since the simple functions are dense in [math]\mathcal{H}[/math], [math]\phi[/math] extends uniquely to a bounded linear functional on all of [math]\mathcal{H}[/math]. By the Riesz representation for Hilbert spaces there is some [math]k\in\mathcal{H}[/math] such that

[[math]] \begin{equation} \int gd\nu=\phi(g)=\int gkdm. \end{equation} [[/math]]
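To see the representer [math]k[/math] concretely, consider a hypothetical finite [math]\Omega=\{0,1,2,3\}[/math] with arbitrarily chosen point masses. There [math]L^2(\Omega,m)[/math] is finite-dimensional and [math]k(\omega)=\nu(\{\omega\})/m(\{\omega\})[/math] wherever [math]m(\{\omega\}) \gt 0[/math]. The sketch below (plain Python with exact fractions) checks the identity for one test function, the bound [math]\vert\phi(g)\vert\leq\|g\|_{\mathcal{H}}\|\one\|_{\mathcal{H}}[/math] in squared form, and that [math]k[/math] takes values in [math][0,1][/math].

```python
from fractions import Fraction

# Hypothetical data: point masses of mu and nu on Omega = {0, 1, 2, 3}.
mu = [Fraction(1), Fraction(2), Fraction(0), Fraction(1, 2)]
nu = [Fraction(3), Fraction(0), Fraction(5), Fraction(1, 2)]
m = [a + b for a, b in zip(mu, nu)]  # m = mu + nu

# Riesz representer of g -> int g dnu in L^2(m): k = nu({w}) / m({w}).
# Points with m({w}) = 0 are m-null, so the value chosen there is irrelevant.
k = [b / t if t > 0 else Fraction(0) for b, t in zip(nu, m)]


def integral(g, weights):
    """int g d(rho) for the measure rho with the given point masses."""
    return sum(gi * wi for gi, wi in zip(g, weights))


g = [Fraction(2), Fraction(-1), Fraction(1, 3), Fraction(4)]  # arbitrary test function
phi_g = integral(g, nu)                                      # phi(g) = int g dnu
via_k = integral([gi * ki for gi, ki in zip(g, k)], m)       # int g k dm

norm_g_sq = integral([gi * gi for gi in g], m)  # ||g||_H^2
norm_one_sq = integral([1] * len(m), m)         # ||1||_H^2 = m(Omega)

print(k, phi_g == via_k)
```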


We claim that [math]k[/math] takes values in [math][0,1][/math] almost surely with respect to [math]m[/math]. Indeed, for any [math]B\in\B[/math] we have

[[math]] 0\leq \nu(B)\leq m(B), [[/math]]
so (using [math]g=\one_B[/math]),

[[math]] 0\leq \int_B kdm\leq m(B). [[/math]]
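Now apply this to the two sets

[[math]] B=\{\omega\in\Omega\mid k(\omega) \lt 0\} [[/math]]
and

[[math]] B=\{\omega\in\Omega\mid k(\omega) \gt 1\}. [[/math]]
For the first set, [math]k \lt 0[/math] on [math]B[/math] while [math]\int_B kdm\geq 0[/math], which forces [math]m(B)=0[/math]. For the second set, [math]\int_B (k-1)dm\leq 0[/math] while [math]k-1 \gt 0[/math] on [math]B[/math], so again [math]m(B)=0[/math]. Hence [math]k[/math] takes values in [math][0,1][/math] [math]m[/math]-almost everywhere. Since [math]m=\mu+\nu[/math], we can rewrite (7) as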

[[math]] \begin{equation} \int g(1-k)d\nu=\int gkd\mu. \end{equation} [[/math]]
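Written out, the step from (7) to (8) only splits [math]dm=d\mu+d\nu[/math] and moves one term to the left. For [math]g\in\mathcal{H}[/math] all integrals involved are finite, because [math]gk\in L^1(m)[/math] by the Cauchy-Schwarz inequality and [math]\mu,\nu\leq m[/math]:

[[math]] \int gd\nu=\int gkdm=\int gkd\mu+\int gkd\nu\quad\Longrightarrow\quad\int g(1-k)d\nu=\int gkd\mu. [[/math]]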


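This holds for all simple functions [math]g[/math], since these lie in [math]\mathcal{H}[/math]. For measurable [math]g\geq 0[/math], choose simple functions [math]0\leq g_n\uparrow g[/math]; since [math]0\leq k\leq 1[/math] [math]m[/math]-almost everywhere, both integrands in (8) increase monotonically, so (8) extends to all nonnegative measurable [math]g[/math] by monotone convergence. Now define [math]\nu_{sing}=\nu\mid_{A}[/math], that is [math]\nu_{sing}(B)=\nu(B\cap A)[/math], where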

[[math]] A=\{\omega\in\Omega\mid k(\omega)=1\}. [[/math]]
By definition, [math]\nu_{sing}(\Omega\setminus A)=0[/math], and applying (8) with [math]g=\one_{A}[/math] gives [math]\mu(A)=\int_A kd\mu=\int_A(1-k)d\nu=0[/math], because [math]k=1[/math] on [math]A[/math]. Therefore

[[math]] \nu_{sing}\perp\mu. [[/math]]
We also define

[[math]] \nu_{abs}=\nu\mid_{\Omega\setminus A}=\nu\mid_{\{\omega\in\Omega\mid k(\omega) \lt 1\}} [[/math]]
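so that [math]\nu=\nu_{sing}+\nu_{abs}[/math]; the second equality holds because [math]k\leq 1[/math] [math]m[/math]-almost everywhere. Define the function [math]f=\frac{k}{1-k}\geq 0[/math] on [math]\Omega\setminus A[/math] and [math]f=0[/math] on [math]A[/math], and let [math]g\geq 0[/math] be measurable. Since [math]\mu(A)=0[/math], applying (8) to the function [math]\one_{\Omega\setminus A}\frac{g}{1-k}[/math] gives

[[math]] \int gfd\mu=\int_{\Omega\setminus A}\frac{g}{1-k}kd\mu=\int_{\Omega\setminus A}\frac{g}{1-k}(1-k)d\nu=\int_{\Omega\setminus A}gd\nu=\int gd\nu_{abs}. [[/math]]
Taking [math]g=\one_B[/math] for [math]B\in\B[/math] shows that [math]d\nu_{abs}=fd\mu[/math], and so [math]\nu_{abs}\ll \mu[/math].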
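The whole construction can be traced on a hypothetical finite [math]\Omega[/math] with arbitrarily chosen point masses. The sketch below computes [math]k[/math], the set [math]A=\{k=1\}[/math], the two parts of [math]\nu[/math] and the density [math]f=k/(1-k)[/math]; on the support of [math]\mu[/math] the result agrees with the ratio [math]\nu(\{\omega\})/\mu(\{\omega\})[/math], as it must. It only illustrates the proof in the finite case.

```python
from fractions import Fraction

# Hypothetical point masses on Omega = {0, 1, 2, 3}; point 2 carries nu-mass but no mu-mass.
mu = [Fraction(1), Fraction(2), Fraction(0), Fraction(1, 2)]
nu = [Fraction(3), Fraction(0), Fraction(5), Fraction(1, 2)]
m = [a + b for a, b in zip(mu, nu)]
k = [b / t if t > 0 else Fraction(0) for b, t in zip(nu, m)]  # k = dnu/dm, values in [0, 1]

points = range(len(mu))
A = {w for w in points if k[w] == 1}                          # A = {k = 1}
nu_sing = [nu[w] if w in A else Fraction(0) for w in points]  # nu restricted to A
nu_abs = [Fraction(0) if w in A else nu[w] for w in points]   # nu restricted to Omega \ A
# f = k / (1 - k) off A and f = 0 on A (A is mu-null, so the value there does not matter).
f = [Fraction(0) if w in A else k[w] / (1 - k[w]) for w in points]

print("A =", A, "mu(A) =", sum(mu[w] for w in A))
print("f =", f)
```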

Theorem

Let [math](\Omega,\F,\p)[/math] be a probability space. Let [math]\mathcal{G}\subset \F[/math] be a sub-[math]\sigma[/math]-algebra of [math]\F[/math] and let [math]X\in L^1(\Omega,\F,\p)[/math] be a r.v. Then there exists a r.v. in [math]L^1(\Omega,\mathcal{G},\p)[/math], unique up to [math]\p[/math]-almost sure equality and denoted by [math]\E[X\mid\mathcal{G}][/math], such that for all [math]B\in\mathcal{G}[/math]

[[math]] \E[X\one_B]=\E[\E[X\mid\mathcal{G}]\one_B]. [[/math]]
More generally, for every bounded and [math]\mathcal{G}[/math]-measurable r.v. [math]Z[/math] we get

[[math]] \E[XZ]=\E[\E[X\mid\mathcal{G}]Z] [[/math]]
and if [math]X\geq 0[/math], then [math]\E[X\mid\mathcal{G}]\geq 0[/math] almost surely.
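For a [math]\sigma[/math]-algebra generated by a single event, the conditional expectation is explicit. Let [math]B\in\F[/math] with [math]0 \lt \p[B] \lt 1[/math] and [math]\mathcal{G}=\{\emptyset,B,B^c,\Omega\}[/math]. A [math]\mathcal{G}[/math]-measurable r.v. is constant on [math]B[/math] and on [math]B^c[/math], and

[[math]] \E[X\mid\mathcal{G}]=\frac{\E[X\one_B]}{\p[B]}\one_B+\frac{\E[X\one_{B^c}]}{\p[B^c]}\one_{B^c} [[/math]]
satisfies the defining property: for instance [math]\E[\E[X\mid\mathcal{G}]\one_B]=\frac{\E[X\one_B]}{\p[B]}\p[B]=\E[X\one_B][/math], the same computation works for [math]B^c[/math], and the cases [math]\emptyset[/math] and [math]\Omega[/math] follow by additivity.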


[Proof of Theorem]

The uniqueness part was already done. To prove existence, assume first that [math]X\geq 0[/math]. Define a new measure [math]\Q[/math] on [math](\Omega,\mathcal{G})[/math] by

[[math]] \Q[A]=\E[X\one_A]=\int_A X(\omega)d\p(\omega) [[/math]]
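for all [math]A\in\mathcal{G}[/math]. Now consider the measure [math]\p[/math] restricted to [math]\mathcal{G}[/math]. Both [math]\Q[/math] and [math]\p[/math] are finite measures on [math](\Omega,\mathcal{G})[/math], since [math]\Q[\Omega]=\E[X] \lt \infty[/math], and every [math]A\in\mathcal{G}[/math] with [math]\p[A]=0[/math] satisfies [math]\Q[A]=\E[X\one_A]=0[/math]. By the criterion following the Radon-Nikodym theorem, we get that

[[math]] \Q\ll\p [[/math]]
on [math]\mathcal{G}[/math]. The Radon-Nikodym theorem therefore yields a nonnegative [math]\mathcal{G}[/math]-measurable r.v. [math]\tilde X[/math] such that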

[[math]] \Q[A]=\E[\tilde X\one_A] [[/math]]
for all [math]A\in\mathcal{G}[/math]. Comparing with the definition of [math]\Q[/math], we get for all [math]A\in\mathcal{G}[/math] that

[[math]] \E[X\one_A]=\E[\tilde X\one_A]. [[/math]]
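Taking [math]A=\Omega[/math] gives [math]\E[\tilde X]=\E[X] \lt \infty[/math]; since [math]\tilde X\geq 0[/math], this shows [math]\tilde X\in L^1(\Omega,\mathcal{G},\p)[/math], and hence [math]\tilde X=\E[X\mid\mathcal{G}][/math]. For general [math]X\in L^1(\Omega,\F,\p)[/math], write [math]X=X^+-X^-[/math] with [math]X^{\pm}=\max(\pm X,0)\geq 0[/math] integrable, apply the above to [math]X^+[/math] and [math]X^-[/math], and set [math]\E[X\mid\mathcal{G}]=\E[X^+\mid\mathcal{G}]-\E[X^-\mid\mathcal{G}][/math], which satisfies the defining property by linearity of the expectation.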
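The existence proof can be mirrored on a finite probability space where [math]\mathcal{G}[/math] is generated by a partition into atoms. A [math]\mathcal{G}[/math]-measurable function is constant on each atom, so the Radon-Nikodym density of [math]\Q[/math] with respect to [math]\p[/math] restricted to [math]\mathcal{G}[/math] is [math]\Q[B]/\p[B][/math] on each atom [math]B[/math] with [math]\p[B] \gt 0[/math]. The sketch below assumes this finite setting; the function names and the sample data are hypothetical.

```python
from fractions import Fraction


def rn_density_on_partition(Q, P, atoms):
    """Density dQ/dP on the sigma-algebra generated by a finite partition.

    A G-measurable function is constant on each atom B, so the density must
    equal Q(B) / P(B) there; P-null atoms get the (arbitrary) value 0.
    """
    density = {}
    for B in atoms:
        p_B = sum(P[w] for w in B)
        value = Q(B) / p_B if p_B > 0 else Fraction(0)
        for w in B:
            density[w] = value
    return density


def conditional_expectation(X, P, atoms):
    """E[X | G] = dQ+/dP - dQ-/dP, where Q+-(B) = E[X^+- 1_B] as in the proof."""
    X_plus = {w: max(X[w], 0) for w in X}
    X_minus = {w: max(-X[w], 0) for w in X}

    def Q_plus(B):
        return sum(X_plus[w] * P[w] for w in B)

    def Q_minus(B):
        return sum(X_minus[w] * P[w] for w in B)

    d_plus = rn_density_on_partition(Q_plus, P, atoms)
    d_minus = rn_density_on_partition(Q_minus, P, atoms)
    return {w: d_plus[w] - d_minus[w] for w in X}


P = {w: Fraction(1, 4) for w in (1, 2, 3, 4)}  # uniform probability
X = {1: 4, 2: -2, 3: 1, 4: 3}
atoms = [{1, 2}, {3, 4}]                        # G is generated by this partition
E = conditional_expectation(X, P, atoms)
print(E)
```

Splitting into [math]X^+[/math] and [math]X^-[/math] follows the proof; on a finite partition one could equally apply the atom formula to [math]X[/math] directly, since it is linear in [math]\Q[/math].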

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].