<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
===Law of a Random Variable===
{{definitioncard|Random Variable|Let <math>(\Omega,\A,\p)</math> be a probability space and let <math>(E,\mathcal{E})</math> be a measurable space. A measurable map <math>X:(\Omega,\A,\p)\to (E,\mathcal{E})</math> is called a random variable (abbreviated r.v.) with values in <math>E</math>.}}
{{definitioncard|Law/Distribution|The law or distribution of a random variable <math>X</math> is the image measure of <math>\p</math> under <math>X</math>, usually denoted <math>\p_X</math>. It is hence a probability measure on <math>(E,\mathcal{E})</math>, given for <math>B\in\mathcal{E}</math> by
<math display="block">
\p_X[B]=\p[X^{-1}(B)]=\p[X\in B]=\p\left[\{\omega\in\Omega\mid X(\omega)\in B\}\right].
</math>
}}
If <math>\mu</math> is a probability measure on <math>(\R^d,\B(\R^d))</math> (or even on a more general space <math>(E,\mathcal{E})</math>), there is a canonical way of constructing a r.v. <math>X</math> such that <math>\p_X=\mu</math>, namely as a map
<math display="block">
X:(\R^d,\B(\R^d),\mu)\to\R^d.
</math>
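Concretely, one may take for <math>X</math> the identity map on <math>(\R^d,\B(\R^d),\mu)</math>: then for every <math>B\in\B(\R^d)</math>
<math display="block">
\p_X[B]=\mu[X^{-1}(B)]=\mu[B],
</math>
so indeed <math>\p_X=\mu</math>.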
There are two special cases.
<ul style{{=}}"list-style-type:lower-roman"><li>''Discrete r.v.:'' Let <math>E</math> be a countable space and <math>\mathcal{E}=\mathcal{P}(E)</math>. The law of <math>X</math> is given by
<math display="block">
\p_X:=\sum_{x\in E}P(x)\delta_x,
</math>
where <math>P(x)=\p[X=x]</math> and <math>\delta_x</math> is the Dirac measure at <math>x</math>, meaning that for all <math>A\subset E</math>,
<math display="block">
\delta_x(A)=\begin{cases}1&\text{if $x\in A$}\\ 0&\text{if $x\not\in A$}\end{cases}
</math>
Since <math>\p_X[E]=1</math>, we note that
<math display="block">
\sum_{x\in E}P(x)\delta_x(E)=\sum_{x\in E}P(x)=1.
</math>
Indeed, for all <math>B\subset E</math> we have that
<math display="block">
\p_X[B]=\p[X\in B]=\p\left[\bigcup_{x\in B}\{X=x\}\right]=\sum_{x\in B}\p[X=x]=\sum_{x\in E}P(x)\delta_x(B).
</math>
A concrete example is worked out after this list.
</li>
<li>''Continuous r.v.:'' A random variable <math>X</math> with values in <math>(\R^d,\B(\R^d))</math> is said to have a density if <math>\p_X\ll \lambda</math>, where <math>\lambda</math> is the Lebesgue measure on <math>\R^d</math>. The Radon-Nikodym theorem then gives a measurable map <math>P:\R^d\to\R</math> such that for all <math>B\in \B(\R^d)</math>
<math display="block">
\p_X[B]=\int_BP(x)dx.
</math>
In particular, <math>\int_{\R^d}P(x)dx=\p_X(\R^d)=1</math>. Moreover, the map <math>P</math> is unique up to sets of Lebesgue measure 0, and is called the density of <math>X</math>. If <math>d=1</math>, then
<math display="block">
\p[\alpha\leq X\leq \beta]=\p_X[[\alpha,\beta]]=\int_\alpha^\beta P(x)dx.
</math>
</li>
</ul>
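For instance, for a fair die one takes <math>E=\{1,...,6\}</math> and <math>P(x)=\frac{1}{6}</math> for every <math>x\in E</math>, so that
<math display="block">
\p_X=\frac{1}{6}\sum_{k=1}^6\delta_k,\qquad \p_X[\{2,4,6\}]=\sum_{k\in\{2,4,6\}}\frac{1}{6}=\frac{1}{2}.
</math>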
{{definitioncard|Expected Value/Expectation|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X</math> be a real valued r.v. (i.e. with values in <math>\R</math>). The expectation of such a r.v. is defined as
<math display="block">
\E[X]=\int_{\Omega}X(\omega)d\p(\omega)=\int_\R xd\p_X(x),
</math>
which is well defined in the following two cases.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>X\geq 0</math>, in which case <math>\E[X]\in[0,\infty]</math>.
</li>
<li>If <math>\E[\vert X\vert]=\int_{\Omega}\vert X(\omega)\vert d\p(\omega) < \infty</math>.
</li>
</ul>}}
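For the fair die above, <math>X\geq 0</math> and the expectation is the familiar average
<math display="block">
\E[X]=\int_\R x\,d\p_X(x)=\sum_{k=1}^6 k\cdot\frac{1}{6}=\frac{7}{2}.
</math>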
We extend this definition to the case of a r.v. <math>X=(X_1,...,X_d)</math> taking values in <math>\R^d</math> by defining
<math display="block">
\E[X]=(\E[X_1],...,\E[X_d])
</math>
provided each <math>\E[X_i]</math> is well defined.
{{alert-info |
If <math>B\in\A</math> and <math>X=\one_{B}</math>, then
<math display="block">
0\leq \E[X]=\E[\one_B]=\p[B]\leq 1.
</math>
In general, <math>\E[X]</math> is interpreted as the average or the mean of the r.v. <math>X</math>. If <math>X</math> takes values in <math>\{x_1,...,x_n,...\}</math>, then
<math display="block">
\E[X]=\sum_{n=1}^\infty x_n\p[X=x_n],
</math>
whenever it is well defined.
}}
The expectation is a special case of an integral with respect to a positive measure. In particular,
<ul style{{=}}"list-style-type:lower-roman"><li>For all <math>X,Y</math> integrable and <math>a,b\in\R</math> we have
<math display="block">
\E[aX+bY]=a\E[X]+b\E[Y].
</math>
</li>
<li>If <math>X=C</math> is a constant, then
<math display="block">
\E[X]=\int_\Omega Cd\p(\omega)=C\p[\Omega]=C.
</math>
</li>
<li>If <math>X\geq 0</math>, then <math>\E[X]\geq 0</math>, and if <math>X\leq Y</math> with both integrable, then
<math display="block">
\E[X]\leq \E[Y].
</math>
</li>
<li>(''Monotone convergence'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>X_n\geq 0</math> for all <math>n\geq 1</math> and <math>X_n\uparrow X</math> as <math>n\to\infty</math>, then
<math display="block">
\E[X_n]\uparrow\E[X]\text{ as $n\to\infty$}.
</math>
</li>
<li>(''Fatou'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>X_n\geq 0</math> for all <math>n\geq 1</math>, then
<math display="block">
\E\left[\liminf_{n\to\infty}X_n\right]\leq \liminf_{n\to\infty}\E[X_n].
</math>
The inequality can be strict; see the example after this list.
</li>
<li>(''Dominated convergence'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>\vert X_n\vert \leq Z</math> for all <math>n\geq 1</math>, for some real valued r.v. <math>Z</math> with <math>\E[Z] < \infty</math>, and if <math>X_n\xrightarrow{n\to\infty}X</math> a.s., then
<math display="block">
\E[X_n]\xrightarrow{n\to\infty}\E[X].
</math>
</li>
</ul>
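A standard example shows that the inequality in Fatou's lemma can be strict: on <math>\Omega=(0,1)</math> with <math>\p</math> the Lebesgue measure, take <math>X_n=n\one_{(0,1/n)}</math>. Then <math>X_n(\omega)\xrightarrow{n\to\infty}0</math> for every <math>\omega\in(0,1)</math>, while <math>\E[X_n]=n\cdot\frac{1}{n}=1</math> for all <math>n</math>, so that
<math display="block">
\E\left[\liminf_{n\to\infty}X_n\right]=0 < 1=\liminf_{n\to\infty}\E[X_n].
</math>
The same example shows that the domination hypothesis cannot be dropped in the dominated convergence theorem.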
{{alert-info |
In probability theory one speaks of almost sure convergence and writes a.s., rather than almost everywhere. Namely, <math>X_n\xrightarrow{n\to\infty}X</math> a.s. means that
<math display="block">
\p\left[\{\omega\in\Omega\mid X_n(\omega)\xrightarrow{n\to\infty}X(\omega)\}\right]=1.
</math>
}}
{{proofcard|Proposition|random|Let <math>X</math> be a r.v. with values in <math>(E,\mathcal{E})</math>. If <math>f:E\to [0,\infty]</math> is measurable, then
<math display="block">
\E[f(X)]=\int_E f(x)d\p_X(x).
</math>
Similarly, if <math>f:E\to \R</math> is measurable with <math>\E[\vert f(X)\vert] < \infty</math>, then
<math display="block">
\E[f(X)]=\int_E f(x)d\p_X(x).
</math>
{{alert-info |
Note that <math>f(X)</math> is also a r.v., being a composition of measurable maps.
}}
|[Proof of [[#random |Proposition]]]
In the case <math>f=\one_B</math> with <math>B\in\mathcal{E}</math>, we get that
<math display="block">
\E[f(X)]=\p[X\in B]=\p_X[B]
</math>
from the definition of the distribution of a r.v. By linearity, the result then holds for positive simple functions. Finally, for <math>f\geq 0</math> measurable there exist positive simple functions <math>(f_n)_{n\in\N}</math> with <math>f_n\uparrow f</math> as <math>n\to\infty</math>, and we conclude with the monotone convergence theorem.}}
{{alert-info |
One often uses the proposition to compute the law of a r.v. <math>X</math>. If one is able to write <math>\E[f(X)]=\int f d\nu</math> for a sufficiently large class of functions <math>f</math>, then one can deduce that <math>\p_X=\nu</math>. The idea is to be able to take <math>f=\one_B</math>, for then <math>\E[f(X)]=\p_X[B]=\nu(B)</math>.
}}
'''Example'''
Assume that <math>\p_X</math> is absolutely continuous with density <math>h(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}</math> for <math>x\in\R</math>, and let <math>Y=X^2</math>. One can then ask for the distribution of <math>Y</math>. Let <math>f:\R\to[0,\infty]</math> be measurable. Then
<math display="block">
\E[f(Y)]=\E[f(X^2)]=\int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx.
</math>
By symmetry of the integrand, we can write
<math display="block">
\int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=2\int_{0}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx.
</math>
Now we substitute <math>y=x^2</math>. Then <math>dy=2xdx</math> and hence <math>dx=\frac{dy}{2\sqrt{y}}</math>, which gives
<math display="block">
\E[f(Y)]=2\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{2\sqrt{2\pi y}}dy=\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}dy,
</math>
that is, <math>\E[f(Y)]=\int fd\nu</math> with
<math display="block">
d\nu(y)=\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y > 0\}}dy.
</math>
By the remark above, the distribution of <math>Y</math> is therefore the measure with density <math>\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y > 0\}}</math>.
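As a consistency check, this density integrates to 1: substituting back <math>y=x^2</math>,
<math display="block">
\int_0^\infty\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}dy=2\int_0^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=1.
</math>
This law is known as the chi-squared distribution with one degree of freedom.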
{{proofcard|Proposition|random2|Let <math>X=(X_1,...,X_d)\in\R^d</math> be a r.v. Assume that <math>X</math> has density <math>P(x_1,...,x_d)</math>. Then for all <math>j\in\{1,...,d\}</math>, <math>X_j</math> has density
<math display="block">
P_j(x)=\int_{\R^{d-1}}P(x_1,...,x_{j-1},x,x_{j+1},...,x_d)dx^1\dotsm dx^{j-1}dx^{j+1}\dotsm dx^d.
</math>
{{alert-info |
Let <math>d=2</math> and <math>X=(X_1,X_2)</math>. Then <math>P_1(x)=\int_\R P(x,y)dy</math> and <math>P_2(y)=\int_\R P(x,y)dx</math>.
}}
|[Proof of [[#random2 |Proposition]]]
Let <math>\pi_j:(x_1,...,x_d)\mapsto x_j</math> denote the projection onto the <math>j</math>-th coordinate. From Fubini's theorem we get, for all Borel measurable <math>f:\R\to \R^+</math>,
<math display="block">
\E[f(X_j)]=\E[f(\pi_j(X))]=\int_{\R^{d}}f(x_j)P(x_1,...,x_d)dx^1\dotsm dx^d
</math>
<math display="block">
=\int_\R f(x_j)\underbrace{\left(\int_{\R^{d-1}}P(x_1,...,x_{j-1},x_j,x_{j+1},...,x_d)dx^1\dotsm dx^{j-1}dx^{j+1}\dotsm dx^d\right)dx^j}_{d\nu(x_j)=P_j(x_j)dx^j}
</math>
By renaming <math>x_j=y</math>, we get
<math display="block">
\E[f(X_j)]=\int_\R f(y)P_j(y)dy.
</math>
Hence the distribution of <math>X_j</math> has density <math>P_j(y)</math> on <math>\R</math>.}}
{{alert-info |
If <math>X=(X_1,...,X_d)\in\R^d</math> is a r.v., then the distributions <math>\p_{X_j}</math>, <math>j\in\{1,...,d\}</math>, are called the margins of <math>X</math>. The last proposition shows that the margins are determined by the law
<math display="block">
\p_{X}=\p_{(X_1,...,X_d)},
</math>
but the converse is wrong. For example, take <math>Q</math> to be a density on <math>\R</math> and observe that <math>P(x_1,x_2)=Q(x_1)Q(x_2)</math> is then a density on <math>\R^2</math>. We have already seen that we can construct (in a canonical way) a r.v. <math>X=(X_1,X_2)\in\R^2</math> such that <math>\p_X</math> has <math>P(x_1,x_2)</math> as density. The margins of <math>X</math>, namely <math>\p_{X_1}</math> and <math>\p_{X_2}</math>, then both have density <math>Q(x)</math>. Now observe that the r.v.'s <math>X=(X_1,X_2)</math> and <math>X'=(X_1,X_1)</math> have the same margins, but they are different: <math>\p_X</math> has support in <math>\R^2</math>, while <math>\p_{X'}</math> has support in the diagonal of <math>\R^2</math>, which is of Lebesgue measure 0 in <math>\R^2</math>. Hence <math>\p_X\not=\p_{X'}</math>.
}}
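To make the difference explicit, one can compute both laws on the diagonal <math>D=\{(x,x)\mid x\in\R\}</math>: for every measurable <math>f:\R^2\to[0,\infty]</math>,
<math display="block">
\E[f(X')]=\E[f(X_1,X_1)]=\int_\R f(x,x)Q(x)dx,
</math>
so <math>\p_{X'}[D]=1</math>, whereas <math>\p_X[D]=\int_{D}Q(x_1)Q(x_2)dx_1dx_2=0</math> since <math>\lambda(D)=0</math>.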
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}