Transition Kernel and Conditional Distribution

Definition (Transition Kernel)

Let [math](E,\mathcal{E})[/math] and [math](F,\F)[/math] be two measurable spaces. A transition kernel from [math]E[/math] to [math]F[/math] is a map

[[math]] \nu:E\times \F\to [0,1], [[/math]]

such that

[math]\nu(x,\cdot)[/math] is a probability measure on [math]\F[/math] for all [math]x\in E[/math].
[math]x\mapsto \nu(x,A)[/math] is [math]\mathcal{E}[/math]-measurable for all [math]A\in\F[/math].
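
On a finite state space the definition reduces to a row-stochastic matrix. The following is a minimal sketch, assuming the hypothetical spaces E = {0, 1, 2} and F = {"a", "b"} with their power-set σ-algebras; the names `kernel`, `nu` and `is_transition_kernel` are illustrative and not part of the text. Measurability in x is automatic in this setting, so only the first condition needs to be checked.

```python
from fractions import Fraction

# Hypothetical finite example: E = {0, 1, 2}, F = {"a", "b"}.
# kernel[x][y] is the mass nu(x, {y}); each row must be a probability vector.
kernel = {
    0: {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    1: {"a": Fraction(1, 4), "b": Fraction(3, 4)},
    2: {"a": Fraction(1), "b": Fraction(0)},
}

def nu(x, A):
    """Return nu(x, A) for a subset A of F."""
    return sum(kernel[x][y] for y in A)

def is_transition_kernel(kernel):
    """Check that every row is nonnegative and sums to one."""
    return all(
        all(p >= 0 for p in row.values()) and sum(row.values()) == 1
        for row in kernel.values()
    )
```

Exact rational arithmetic is used so that the condition "each row sums to one" can be tested with equality rather than a floating-point tolerance.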

Example

Let [math]\rho[/math] be a [math]\sigma[/math]-finite measure on [math](F,\F)[/math] and let [math]f:E\times F\to\R_+[/math] be an [math]\mathcal{E}\otimes\F[/math]-measurable map such that, for all [math]x\in E[/math],

[[math]] \int_F f(x,y)d\rho(y)=1. [[/math]]

Then

[[math]] \nu(x,A)=\int_A f(x,y)d\rho(y) [[/math]]

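is a transition kernel from [math]E[/math] to [math]F[/math]. For example, with [math]E=F=\R[/math], [math]\rho[/math] the Lebesgue measure and [math]\sigma \gt 0[/math], one can take the Gaussian density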

[[math]] f(x,y)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-y)^2}{2\sigma^2}\right). [[/math]]
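
The normalization condition for this density can be checked numerically. The sketch below assumes E = F = ℝ with ρ the Lebesgue measure, and approximates ν(x, [a, b]) with a midpoint rule; the function names, the default `sigma` and the grid size `n` are illustrative choices, not part of the text.

```python
import math

def f(x, y, sigma=1.0):
    """Gaussian density in y with mean x and standard deviation sigma."""
    z = (x - y) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def nu(x, a, b, sigma=1.0, n=20_000):
    """Approximate nu(x, [a, b]) with the midpoint rule on n subintervals."""
    step = (b - a) / n
    return step * sum(f(x, a + (i + 0.5) * step, sigma) for i in range(n))

def nu_exact(x, a, b, sigma=1.0):
    """Closed form of nu(x, [a, b]) via the error function."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - x) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)
```

Integrating over an interval of ten standard deviations on each side of x, for instance `nu(0.3, -9.7, 10.3)`, should return a value very close to 1, and `nu` should agree with `nu_exact` on bounded intervals up to the quadrature error.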

Proposition

Let [math]\nu[/math] be a transition kernel from [math]E[/math] to [math]F[/math]. The following two statements hold.

Let [math]h[/math] be a nonnegative (or bounded) measurable function on [math](F,\F)[/math]. Then
[[math]] \varphi(x)=\int_F h(y)\nu(x,dy) [[/math]]
is a nonnegative (or bounded) measurable function on [math](E,\mathcal{E})[/math].
If [math]\rho[/math] is a probability measure on [math](E,\mathcal{E})[/math], then
[[math]] \mu(A)=\int_E\nu(x,A)d\rho(x),\quad A\in\F, [[/math]]
defines a probability measure on [math](F,\F)[/math].
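
Both statements can be illustrated on finite spaces, where the integrals become finite sums. The sketch below assumes a hypothetical kernel from E = {0, 1} to F = {"a", "b", "c"} and a hypothetical probability measure `rho` on E; none of these values come from the text. It also checks the identity ∫ h dμ = ∫ φ dρ, a standard consequence of the two statements.

```python
from fractions import Fraction

# Hypothetical kernel from E = {0, 1} to F = {"a", "b", "c"}.
kernel = {
    0: {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)},
    1: {"a": Fraction(0), "b": Fraction(1, 4), "c": Fraction(3, 4)},
}
rho = {0: Fraction(2, 5), 1: Fraction(3, 5)}  # probability measure on E

def phi(h, x):
    """phi(x) = integral of h(y) nu(x, dy), a finite sum here."""
    return sum(h(y) * p for y, p in kernel[x].items())

def mu(A):
    """mu(A) = integral of nu(x, A) d rho(x)."""
    return sum(rho[x] * sum(kernel[x][y] for y in A) for x in rho)

h = {"a": 1, "b": 4, "c": -2}.get  # a bounded function on F

# Integrate h against mu, and phi against rho; the two values should agree.
int_h_dmu = sum(h(y) * mu({y}) for y in ("a", "b", "c"))
int_phi_drho = sum(rho[x] * phi(h, x) for x in rho)
```

Here `mu` assigns total mass one to F, and `int_h_dmu` equals `int_phi_drho`, which is the discrete form of integrating first in y and then in x.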

Definition (Conditional Distribution)

Let [math]X[/math] and [math]Y[/math] be two random variables with values in [math](E,\mathcal{E})[/math] and [math](F,\F)[/math], respectively. A conditional distribution of [math]Y[/math] given [math]X[/math] is any transition kernel [math]\nu[/math] from [math]E[/math] to [math]F[/math] such that for all nonnegative (or bounded) measurable maps [math]h[/math] on [math](F,\F)[/math] one has

[[math]] \E[h(Y)\mid X]=\int_F h(y)\nu(X,dy)\quad\text{a.s.}, [[/math]]

where the right-hand side should be understood as [math]\phi(X)[/math], with [math]\phi[/math] the map given by

[[math]] \phi:x\mapsto \int_Fh(y)\nu(x,dy). [[/math]]

If [math]\nu[/math] is a conditional distribution of [math]Y[/math] given [math]X[/math], we get for all [math]A\in\F[/math]

[[math]] \p[Y\in A\mid X]=\nu(X,A)\quad\text{a.s.}, [[/math]]

where we have set [math]h=\one_A[/math] in the definition. If [math]\nu'[/math] is another such conditional distribution, we get

[[math]] \nu(X,A)=\nu'(X,A)\quad\text{a.s.} [[/math]]

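This implies that, for every [math]A\in\F[/math],

[[math]] \nu(x,A)=\nu'(x,A)\quad\text{for }\p_X\text{-almost every }x\in E, [[/math]]

where [math]\p_X[/math] denotes the law of [math]X[/math].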
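
In the discrete case a conditional distribution can be written down directly from the joint law, which also makes the non-uniqueness on null sets visible. The sketch below assumes a hypothetical joint probability table for (X, Y) on E = {0, 1, 2} and F = {0, 1}, with P(X = 2) = 0; the names `joint`, `make_kernel` and `defect` are illustrative. It builds two kernels that differ only at x = 2 and checks that both satisfy E[h(Y) g(X)] = E[φ(X) g(X)], the defining property of the conditional expectation.

```python
from fractions import Fraction

# Hypothetical joint law of (X, Y) on E = {0, 1, 2} and F = {0, 1}.
# The point x = 2 has P(X = 2) = 0.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
    (2, 0): Fraction(0), (2, 1): Fraction(0),
}
E, F = (0, 1, 2), (0, 1)
p_X = {x: sum(joint[x, y] for y in F) for x in E}  # law of X

def make_kernel(fallback):
    """Build nu(x, {y}) = P(Y = y | X = x); use `fallback` where P(X = x) = 0."""
    return {
        x: {y: joint[x, y] / p_X[x] for y in F} if p_X[x] > 0 else dict(fallback)
        for x in E
    }

# Two versions that differ only on the P_X-null point x = 2.
nu = make_kernel({0: Fraction(1), 1: Fraction(0)})
nu_prime = make_kernel({0: Fraction(0), 1: Fraction(1)})

def defect(kernel, h, g):
    """E[h(Y) g(X)] - E[phi(X) g(X)], which is 0 for a conditional distribution."""
    lhs = sum(h(y) * g(x) * joint[x, y] for x in E for y in F)
    rhs = sum(sum(h(y) * kernel[x][y] for y in F) * g(x) * p_X[x] for x in E)
    return lhs - rhs
```

Both `nu` and `nu_prime` give a zero defect for any test functions `h` and `g`, although they are different kernels: they agree exactly on the points x with P(X = x) > 0.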

Theorem

Assume that [math](E,\mathcal{E})[/math] and [math](F,\F)[/math] are two complete separable metric spaces endowed with their Borel [math]\sigma[/math]-algebras. Then a conditional distribution of [math]Y[/math] given [math]X[/math] exists and is a.s. unique.

Proof

The proof is omitted here.

■

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].