General Definitions

Law of a Random Variable

Definition (Random Variable)

Let [math](\Omega,\A,\p)[/math] be a probability space and let [math](E,\mathcal{E})[/math] be a measurable space. A measurable map [math]X:(\Omega,\A,\p)\to (E,\mathcal{E})[/math] is called a random variable (abbreviated r.v.) with values in [math]E[/math].

Definition (Law/Distribution)

The law or distribution of a random variable [math]X[/math] is the image measure of [math]\p[/math] under [math]X[/math], and is usually denoted [math]\p_X[/math]. It is therefore a probability measure on [math](E,\mathcal{E})[/math], given for all [math]B\in\mathcal{E}[/math] by

[[math]] \p_X[B]=\p[X^{-1}(B)]=\p[X\in B]=\p\left[\{\omega\in\Omega\mid X(\omega)\in B\}\right] [[/math]]

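Conversely, if [math]\mu[/math] is a probability measure on [math](\R^d,\B(\R^d))[/math] (or even on a more general measurable space [math](E,\mathcal{E})[/math]), there is a canonical way of constructing a r.v. [math]X[/math] with [math]\p_X=\mu[/math]: take the probability space [math](\R^d,\B(\R^d),\mu)[/math] and let [math]X[/math] be the identity map

[[math]] X:(\R^d,\B(\R^d),\mu)\to\R^d,\qquad X(x)=x, [[/math]]

so that [math]\p_X[B]=\mu(X^{-1}(B))=\mu(B)[/math] for all [math]B\in\B(\R^d)[/math].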

There are two special cases.

Discrete r.v.: Let [math]E[/math] be a countable set and [math]\mathcal{E}=\mathcal{P}(E)[/math]. The law of [math]X[/math] is given by
[[math]] \p_X=\sum_{x\in E}P(x)\delta_x, [[/math]]
where [math]P(x)=\p[X=x][/math] and [math]\delta_x[/math] is the Dirac measure of [math]x[/math], meaning that for all [math]A\subset E[/math],
[[math]] \delta_x(A)=\begin{cases}1&\text{if $x\in A$}\\ 0&\text{if $x\not\in A$}\end{cases} [[/math]]
Indeed, for all [math]B\subset E[/math], by countable additivity of [math]\p[/math],
[[math]] \p_X[B]=\p[X\in B]=\p\left[\bigcup_{x\in B}\{X=x\}\right]=\sum_{x\in B}\p[X=x]=\sum_{x\in E}P(x)\delta_x(B). [[/math]]
In particular, taking [math]B=E[/math] and using [math]\p_X[E]=1[/math], we get
[[math]] \sum_{x\in E}P(x)=\sum_{x\in E}P(x)\delta_x(E)=1. [[/math]]
Continuous r.v.: A random variable [math]X[/math] with values in [math](\R^d,\B(\R^d))[/math] is said to have a density if [math]\p_X\ll \lambda[/math], where [math]\lambda[/math] is the Lebesgue measure on [math]\R^d[/math]. In that case, the Radon-Nikodym theorem says that there exists a measurable map [math]P:\R^d\to[0,\infty)[/math] such that for all [math]B\in \B(\R^d)[/math]
[[math]] \p_X[B]=\int_BP(x)dx. [[/math]]
In particular, [math]\int_{\R^d}P(x)dx=\p_X[\R^d]=1[/math]. Moreover, the map [math]P[/math] is unique up to sets of Lebesgue measure 0, and is called the density of [math]X[/math]. If [math]d=1[/math], then for all [math]\alpha\leq\beta[/math]
[[math]] \p[\alpha\leq X\leq \beta]=\p_X[[\alpha,\beta]]=\int_\alpha^\beta P(x)dx. [[/math]]
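Both special cases can be illustrated numerically. The sketch below is only an illustration under assumed examples that are not in the notes: for the discrete case, [math]\Omega[/math] is the set of 36 outcomes of two fair dice with the uniform probability and [math]X[/math] is the sum of the faces; for the continuous case, [math]X[/math] is standard normal and [math]\p[\alpha\leq X\leq\beta][/math] is approximated by the midpoint rule. The helper names image_law, law_measure and prob_interval are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

# --- Discrete case (assumed example): Omega = two fair dice, X = sum of the faces ---

def image_law(prob, X):
    """Weights P(x) = P[X = x] of the image measure P_X on a finite space."""
    law = {}
    for omega, p in prob.items():
        x = X(omega)
        law[x] = law.get(x, 0) + p
    return law

def law_measure(law, B):
    """P_X[B] = sum_{x in E} P(x) delta_x(B) = sum_{x in B} P(x)."""
    return sum(p for x, p in law.items() if x in B)

prob = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
law = image_law(prob, lambda w: w[0] + w[1])
print(law[7], law_measure(law, {2, 3, 12}), sum(law.values()))  # 1/6 1/9 1

# --- Continuous case (assumed example): X standard normal on R ---

def normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(density, alpha, beta, n=10_000):
    """P[alpha <= X <= beta] = int_alpha^beta P(x) dx, approximated by the midpoint rule."""
    h = (beta - alpha) / n
    return h * sum(density(alpha + (k + 0.5) * h) for k in range(n))

approx = prob_interval(normal_density, -1.0, 1.0)
exact = math.erf(1 / math.sqrt(2))  # closed form of P[-1 <= X <= 1] for the standard normal
print(approx, exact)
```

Exact fractions are used in the discrete part so that the identity [math]\sum_{x\in E}P(x)=1[/math] holds exactly rather than up to rounding.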

Definition (Expected Value/Expectation)

Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] be a real valued r.v. (i.e. with values in [math]\R[/math]). The expectation of such a r.v. is defined as

[[math]] \E[X]=\int_{\Omega}X(\omega)d\p(\omega)=\int_\R xd\p_X(x), [[/math]]

which is well defined in the following two cases.

If [math]X\geq 0[/math], in which case [math]\E[X]\in[0,\infty][/math].
If [math]X[/math] is integrable, i.e. [math]\E[\vert X\vert]=\int_{\Omega}\vert X(\omega)\vert d\p(\omega) \lt \infty[/math], in which case [math]\E[X]\in\R[/math].

We extend this definition to the case of a r.v. [math]X=(X_1,...,X_d)[/math] taking values in [math]\R^d[/math] by defining

[[math]] \E[X]=(\E[X_1],...,\E[X_d]) [[/math]]

provided each [math]\E[X_i][/math] is well defined.

If [math]B\in\A[/math] and [math]X=\one_{B}[/math], then

[[math]] 0\leq \E[X]=\E[\one_B]=\p[B]\leq 1. [[/math]]

In general, [math]\E[X][/math] is interpreted as the average or the mean of the r.v. [math]X[/math]. If [math]X[/math] takes values in a countable set [math]\{x_1,...,x_n,...\}[/math], then

[[math]] \E[X]=\sum_{n=1}^\infty x_n\p[X=x_n], [[/math]]

whenever it is well defined.
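As a numerical check of the series formula, the sketch below assumes an example not taken from the notes: a geometric r.v. with [math]\p[X=n]=(1-p)^{n-1}p[/math] for [math]n\geq 1[/math], whose expectation is known to be [math]1/p[/math]. The series is truncated after finitely many terms, which is an approximation; the helper names are hypothetical.

```python
def geometric_pmf(n, p):
    """P[X = n] = (1 - p)^(n - 1) p for n = 1, 2, ... (number of trials up to the first success)."""
    return (1 - p) ** (n - 1) * p

def expectation_discrete(pmf, values):
    """E[X] = sum_n x_n P[X = x_n], summed over the given values x_n of X."""
    return sum(x * pmf(x) for x in values)

p = 0.25
# The series is truncated after 500 terms; the neglected tail is of order 0.75^500.
approx = expectation_discrete(lambda n: geometric_pmf(n, p), range(1, 501))
print(approx)  # approximately 1 / p = 4
```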

The expectation is a special case of an integral with respect to a positive measure, so the usual properties of the integral apply. In particular:

For all [math]X,Y[/math] integrable and [math]a,b\in\R[/math] we have
[[math]] \E[aX+bY]=a\E[X]+b\E[Y]. [[/math]]
If [math]X=C[/math] is constant, then
[[math]] \E[X]=\int_\Omega C\,d\p(\omega)=C\p[\Omega]=C. [[/math]]
If [math]X\geq 0[/math], then [math]\E[X]\geq 0[/math]. More generally, if [math]X\leq Y[/math] with [math]X,Y[/math] both integrable, then
[[math]] \E[X]\leq \E[Y]. [[/math]]
(Monotone convergence) If [math](X_n)_{n\geq 1}[/math] is a sequence of real valued r.v.'s, and if [math]X_n\geq 0[/math] for all [math]n\geq 1[/math] and [math]X_n\uparrow X[/math] as [math]n\to\infty[/math], then
[[math]] \E[X_n]\uparrow\E[X]\text{ as $n\to\infty$}. [[/math]]
(Fatou) If [math](X_n)_{n\geq 1}[/math] is a sequence of real valued r.v.'s with [math]X_n\geq 0[/math] for all [math]n\geq 1[/math], then
[[math]] \E\left[\liminf_{n\to\infty}X_n\right]\leq \liminf_{n\to\infty}\E[X_n]. [[/math]]
(Dominated convergence) If [math](X_n)_{n\geq 1}[/math] is a sequence of real valued r.v.'s and [math]Z[/math] is a real valued r.v. with [math]\E[Z] \lt \infty[/math] such that [math]\vert X_n\vert \leq Z[/math] for all [math]n\geq 1[/math], and if [math]X_n\xrightarrow{n\to\infty}X[/math] a.e., then
[[math]] \E[X_n]\xrightarrow{n\to\infty}\E[X]. [[/math]]
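The hypotheses in these theorems cannot simply be dropped. The sketch below uses a standard counterexample that is not part of the notes: on [math]\Omega=(0,1)[/math] with the Lebesgue measure, let [math]X_n=n\one_{(0,1/n)}[/math]. Then [math]X_n(\omega)\to 0[/math] for every [math]\omega[/math], while [math]\E[X_n]=n\cdot\frac{1}{n}=1[/math] for all [math]n[/math], so Fatou's inequality is strict ([math]0 \lt 1[/math]) and the conclusion of dominated convergence fails, since no integrable [math]Z[/math] dominates all the [math]X_n[/math]. The function names are hypothetical.

```python
from fractions import Fraction

def X(n, omega):
    """X_n(omega) = n * 1_{(0, 1/n)}(omega) on Omega = (0, 1) with Lebesgue measure."""
    return n if 0 < omega < Fraction(1, n) else 0

def expectation_X(n):
    """E[X_n] = n * lambda((0, 1/n)) = 1, computed exactly."""
    return n * Fraction(1, n)

omega = Fraction(1, 1000)
# For a fixed omega the sequence X_n(omega) is eventually 0, so liminf X_n = 0 pointwise ...
print([X(n, omega) for n in (10, 100, 999, 1000, 10**6)])  # [10, 100, 999, 0, 0]
# ... while every expectation equals 1, so E[liminf X_n] = 0 < 1 = liminf E[X_n].
print(all(expectation_X(n) == 1 for n in range(1, 1000)))  # True
```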

In probability theory one says almost surely (a.s.) rather than almost everywhere (a.e.). By [math]X_n\xrightarrow{n\to\infty}X[/math] a.s. we mean

[[math]] \p\left[\{\omega\in\Omega\mid X_n(\omega)\xrightarrow{n\to\infty}X(\omega)\}\right]=1. [[/math]]

Proposition

Let [math]X[/math] be a r.v. with values in [math](E,\mathcal{E})[/math]. If [math]f:E\to [0,\infty][/math] is measurable, then

[[math]] \E[f(X)]=\int_E f(x)d\p_X(x). [[/math]]

Similarly, if [math]f:E\to \R[/math] is measurable and such that [math]\E[\vert f(X)\vert] \lt \infty[/math], then

[[math]] \E[f(X)]=\int_E f(x)d\p_X(x). [[/math]]

Note that [math]f(X)=f\circ X[/math] is also a r.v., as a composition of measurable maps.


In the case [math]f=\one_B[/math] with [math]B\in\mathcal{E}[/math] we get that

[[math]] \E[f(X)]=\p[X\in B]=\p_X[B] [[/math]]

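from the definition of the distribution of a r.v. By linearity, the result then holds for nonnegative simple functions. For a general measurable [math]f\geq 0[/math], there exists a sequence [math](f_n)_{n\in\N}[/math] of nonnegative simple functions such that [math]f_n\uparrow f[/math] as [math]n\to\infty[/math], and applying the monotone convergence theorem on both sides gives the result. Finally, if [math]f[/math] is real valued with [math]\E[\vert f(X)\vert] \lt \infty[/math], we write [math]f=f^+-f^-[/math], apply the previous case to [math]f^+[/math] and [math]f^-[/math], and use linearity.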

■
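The proposition says that [math]\E[f(X)][/math] can be computed either as an integral over [math]\Omega[/math] or as an integral over [math]E[/math] against [math]\p_X[/math]. The sketch below checks this on an assumed toy example (not from the notes): [math]\Omega=[0,1)[/math] with the Lebesgue measure, [math]X(\omega)=\lfloor 6\omega\rfloor+1[/math], which is uniform on [math]\{1,\dots,6\}[/math], and [math]f(x)=x^2[/math]. Because [math]f(X)[/math] is piecewise constant with jumps on the grid, the midpoint rule over [math]\Omega[/math] is exact here. The grid size N is an arbitrary choice.

```python
import math
from fractions import Fraction

N = 6000  # number of grid cells used to integrate over Omega = [0, 1)

def X(omega):
    """A die roll realised on Omega = [0, 1) with Lebesgue measure: X(omega) = floor(6 omega) + 1."""
    return math.floor(6 * omega) + 1

def f(x):
    return x * x

# Left-hand side: E[f(X)] = int_Omega f(X(omega)) dP(omega), midpoint rule on [0, 1)
lhs = sum(f(X(Fraction(2 * k + 1, 2 * N))) for k in range(N)) * Fraction(1, N)

# Right-hand side: int_E f(x) dP_X(x), with P_X uniform on {1, ..., 6}
rhs = sum(f(x) * Fraction(1, 6) for x in range(1, 7))

print(lhs, rhs)  # 91/6 91/6
```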

One often uses the proposition to compute the law of a r.v. [math]X[/math]: if one is able to write [math]\E[f(X)]=\int f d\nu[/math] for a sufficiently large class of functions [math]f[/math] (for instance all nonnegative measurable [math]f[/math]), then one can deduce that [math]\p_X=\nu[/math]. Indeed, one can then take [math]f=\one_B[/math], which gives [math]\p_X[B]=\E[\one_B(X)]=\nu(B)[/math] for all [math]B\in\mathcal{E}[/math].

Example

Assume that [math]\p_X[/math] is absolutely continuous with density [math]h(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}[/math] for [math]x\in\R[/math], and let [math]Y=X^2[/math]. We want to determine the distribution of [math]Y[/math]. Let [math]f:\R\to[0,\infty][/math] be measurable. Then

[[math]] \E[f(Y)]=\E[f(X^2)]=\int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx. [[/math]]

Since the integrand is an even function of [math]x[/math], we can write

[[math]] \int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=2\int_{0}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx. [[/math]]

On [math](0,\infty)[/math] we substitute [math]y=x^2[/math], so that [math]x=\sqrt{y}[/math], [math]dy=2xdx[/math], and hence [math]dx=\frac{dy}{2\sqrt{y}}[/math]. This gives

[[math]] 2\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{2\sqrt{2\pi y}}dy=\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}dy, [[/math]]

which holds for every measurable [math]f\geq 0[/math]. By the previous remark, [math]\p_Y=\nu[/math], where

[[math]] d\nu(y)=\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y \gt 0\}}dy. [[/math]]

In other words, [math]Y[/math] has density [math]\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y \gt 0\}}[/math] on [math]\R[/math].
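The density found for [math]Y=X^2[/math] can be checked in three independent ways on an interval [math][a,b][/math] with [math]a \gt 0[/math] (chosen to avoid the integrable singularity at [math]0[/math]): by integrating the density numerically, from the standard normal law via [math]\p[a\leq X^2\leq b]=\p[\sqrt a\leq\vert X\vert\leq\sqrt b]=\operatorname{erf}(\sqrt{b/2})-\operatorname{erf}(\sqrt{a/2})[/math], and by Monte Carlo simulation. This is a sketch; the interval, the seed and the sample size are arbitrary assumptions, and chi2_1_density is a hypothetical helper name.

```python
import math
import random

def chi2_1_density(y):
    """Density of Y = X^2 derived above: e^{-y/2} / sqrt(2 pi y) for y > 0, else 0."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y) if y > 0 else 0.0

def integrate(g, a, b, n=20_000):
    """Midpoint rule for int_a^b g(y) dy."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

a, b = 0.5, 2.0
from_density = integrate(chi2_1_density, a, b)
# P[a <= X^2 <= b] = P[sqrt(a) <= |X| <= sqrt(b)] = erf(sqrt(b/2)) - erf(sqrt(a/2)) for X ~ N(0, 1)
from_normal = math.erf(math.sqrt(b / 2)) - math.erf(math.sqrt(a / 2))

# Monte Carlo: square standard normal samples and count how many land in [a, b]
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]
monte_carlo = sum(a <= y <= b for y in samples) / len(samples)

print(from_density, from_normal, monte_carlo)
```

The first two numbers should agree to many digits, while the Monte Carlo estimate agrees only up to sampling error of order [math]10^{-3}[/math].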

Proposition

Let [math]X=(X_1,...,X_d)\in\R^d[/math] be a r.v. and assume that [math]X[/math] has density [math]P(x_1,...,x_d)[/math]. Then for all [math]j\in\{1,...,d\}[/math], [math]X_j[/math] has density

[[math]] P_j(x)=\int_{\R^{d-1}}P(x_1,...,x_{j-1},x,x_{j+1},...,x_d)\,dx_1\dotsm dx_{j-1}\,dx_{j+1}\dotsm dx_d. [[/math]]

For example, if [math]d=2[/math] and [math]X=(X_1,X_2)[/math], then [math]P_1(x)=\int_\R P(x,y)dy[/math] and [math]P_2(y)=\int_\R P(x,y)dx[/math].


Let [math]\pi_j:(x_1,...,x_d)\mapsto x_j[/math] be the [math]j[/math]-th coordinate projection. For every Borel measurable [math]f:\R\to \R^+[/math], the previous proposition (applied to [math]f\circ\pi_j[/math]) and Fubini's theorem for nonnegative functions give

[[math]] \E[f(X_j)]=\E[f(\pi_j(X))]=\int_{\R^{d}}f(x_j)P(x_1,...,x_d)\,dx_1\dotsm dx_d [[/math]]

[[math]] =\int_\R f(x_j)\left(\int_{\R^{d-1}}P(x_1,...,x_{j-1},x_j,x_{j+1},...,x_d)\,dx_1\dotsm dx_{j-1}\,dx_{j+1}\dotsm dx_d\right)dx_j=\int_\R f(x_j)P_j(x_j)\,dx_j. [[/math]]

By renaming [math]x_j=y[/math], we get

[[math]] \E[f(X_j)]=\int_\R f(y)P_j(y)dy. [[/math]]

Since this holds for all such [math]f[/math], the distribution of [math]X_j[/math] has density [math]P_j[/math] on [math]\R[/math].

■
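The marginal formula [math]P_1(x)=\int_\R P(x,y)dy[/math] can be checked numerically. The sketch below assumes a joint density not discussed in the notes: the standard bivariate normal density with correlation [math]\rho[/math], whose margins are known to be standard normal. The inner integral is truncated to a bounded interval and evaluated by the midpoint rule; the value of RHO and the truncation bounds are arbitrary choices, and the function names are hypothetical.

```python
import math

RHO = 0.6  # correlation parameter of the example joint density (an arbitrary choice)

def joint_density(x, y, rho=RHO):
    """Standard bivariate normal density on R^2 with correlation rho."""
    c = 1.0 / (2 * math.pi * math.sqrt(1 - rho * rho))
    return c * math.exp(-(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho)))

def marginal_1(x, lo=-12.0, hi=12.0, n=4_000):
    """P_1(x) = int_R P(x, y) dy, truncated to [lo, hi] and computed by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(joint_density(x, lo + (k + 0.5) * h) for k in range(n))

def std_normal(x):
    """Density of the standard normal law, the expected margin."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, marginal_1(x), std_normal(x))
```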

If [math]X=(X_1,...,X_d)\in\R^d[/math] is a r.v., then the distributions [math]\p_{X_j}[/math], [math]j=1,...,d[/math], are called the margins of [math]X[/math]. The margins are determined by the joint law

[[math]] \p_X=\p_{(X_1,...,X_d)} [[/math]]

(explicitly so by the last proposition when [math]X[/math] has a density), but the converse is false. For example, let [math]Q[/math] be a density on [math]\R[/math] and observe that [math]P(x_1,x_2)=Q(x_1)Q(x_2)[/math] is a density on [math]\R^2[/math]. We have already seen that we can construct (in a canonical way) a r.v. [math]X=(X_1,X_2)\in\R^2[/math] such that [math]\p_X[/math] has [math]P(x_1,x_2)[/math] as density. The margins of [math]X[/math], namely [math]\p_{X_1}[/math] and [math]\p_{X_2}[/math], both have density [math]Q[/math]. Hence the r.v.'s [math]X=(X_1,X_2)[/math] and [math]X'=(X_1,X_1)[/math] have the same margins, but different laws: [math]\p_X[/math] is absolutely continuous with respect to the Lebesgue measure on [math]\R^2[/math], whereas [math]\p_{X'}[/math] is concentrated on the diagonal [math]\{(x,x)\mid x\in\R\}[/math], which has Lebesgue measure 0 in [math]\R^2[/math]. Therefore [math]\p_X\not=\p_{X'}[/math].
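The counterexample can be simulated. The sketch below assumes [math]Q[/math] is the standard normal density (an arbitrary choice) and builds samples of [math]X=(X_1,X_2)[/math] with independent coordinates and of [math]X'=(X_1,X_1)[/math]. The second margins of both look like [math]N(0,1)[/math], yet the empirical frequency of the diagonal event [math]\{\text{first coordinate}=\text{second coordinate}\}[/math] is [math]0[/math] for [math]X[/math] and [math]1[/math] for [math]X'[/math]. The seed, sample size and helper name on_diagonal are assumptions.

```python
import random
import statistics

rng = random.Random(42)
n = 100_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

X = list(zip(x1, x2))        # law with density Q(x1) Q(x2), Q the standard normal density
X_prime = list(zip(x1, x1))  # law concentrated on the diagonal {x1 = x2}

def on_diagonal(pairs):
    """Empirical frequency of the event {first coordinate == second coordinate}."""
    return sum(a == b for a, b in pairs) / len(pairs)

second_X = [b for _, b in X]
second_X_prime = [b for _, b in X_prime]

# Second margins: both should look like N(0, 1) (mean about 0, variance about 1)
print(statistics.fmean(second_X), statistics.pvariance(second_X))
print(statistics.fmean(second_X_prime), statistics.pvariance(second_X_prime))

# Joint laws differ: X' lives on the diagonal, while X charges it with probability 0
print(on_diagonal(X), on_diagonal(X_prime))  # 0.0 1.0
```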

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].