<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
===Law of a Random Variable===
{{definitioncard|Random Variable|Let <math>(\Omega,\A,\p)</math> be a probability space and let <math>(E,\mathcal{E})</math> be a measurable space. A measurable map <math>X:(\Omega,\A,\p)\to (E,\mathcal{E})</math> is called a random variable (abbreviated r.v.) with values in <math>E</math>.}}
{{definitioncard|Law/Distribution|The law or distribution of a random variable <math>X</math> is the image measure of <math>\p</math> under <math>X</math>, usually denoted <math>\p_X</math>. It is hence a probability measure on <math>(E,\mathcal{E})</math>, given for <math>B\in\mathcal{E}</math> by
<math display="block">
\p_X[B]=\p[X^{-1}(B)]=\p[X\in B]=\p\left[\{\omega\in\Omega\mid X(\omega)\in B\}\right].
</math>
}}
If <math>\mu</math> is a probability measure on <math>(\R^d,\B(\R^d))</math> (or even on a more general space <math>(E,\mathcal{E})</math>), there is a canonical way of constructing a r.v. <math>X</math> such that <math>\p_X=\mu</math>, namely as a map
<math display="block">
X:(\R^d,\B(\R^d),\mu)\to\R^d.
</math>
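Concretely, one may take for <math>X</math> the identity map on <math>(\R^d,\B(\R^d),\mu)</math>: then for every <math>B\in\B(\R^d)</math>
<math display="block">
\p_X[B]=\mu[X^{-1}(B)]=\mu[B],
</math>
so indeed <math>\p_X=\mu</math>.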
There are two special cases.
<ul style{{=}}"list-style-type:lower-roman"><li>''Discrete r.v.:'' Let <math>E</math> be a countable space and <math>\mathcal{E}=\mathcal{P}(E)</math>. The law of <math>X</math> is given by
<math display="block">
\p_X:=\sum_{x\in E}P(x)\delta_x,
</math>
where <math>P(x)=\p[X=x]</math> and <math>\delta_x</math> is the Dirac measure at <math>x</math>, meaning that for all <math>A\subset E</math>,
<math display="block">
\delta_x(A)=\begin{cases}1&\text{if $x\in A$}\\ 0&\text{if $x\not\in A$}\end{cases}
</math>
Since <math>\p_X[E]=1</math>, we note that
<math display="block">
\sum_{x\in E}P(x)\delta_x(E)=\sum_{x\in E}P(x)=1.
</math>
Indeed, for all <math>B\subset E</math> we have that
<math display="block">
\p_X[B]=\p[X\in B]=\p\left[\bigcup_{x\in B}\{X=x\}\right]=\sum_{x\in B}\p[X=x]=\sum_{x\in E}P(x)\delta_x(B).
</math>
A concrete example is worked out after this list.
</li>
<li>''Continuous r.v.:'' A random variable <math>X</math> with values in <math>(\R^d,\B(\R^d))</math> is said to have a density if <math>\p_X\ll \lambda</math>, where <math>\lambda</math> is the Lebesgue measure on <math>\R^d</math>. The Radon-Nikodym theorem then gives a measurable map <math>P:\R^d\to\R</math> such that for all <math>B\in \B(\R^d)</math>
<math display="block">
\p_X[B]=\int_BP(x)dx.
</math>
In particular, <math>\int_{\R^d}P(x)dx=\p_X(\R^d)=1</math>. Moreover, the map <math>P</math> is unique up to sets of Lebesgue measure 0, and is called the density of <math>X</math>. If <math>d=1</math>, then
<math display="block">
\p[\alpha\leq X\leq \beta]=\p_X[[\alpha,\beta]]=\int_\alpha^\beta P(x)dx.
</math>
</li>
</ul>
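For instance, for a fair die one takes <math>E=\{1,...,6\}</math> and <math>P(x)=\frac{1}{6}</math> for every <math>x\in E</math>, so that
<math display="block">
\p_X=\frac{1}{6}\sum_{k=1}^6\delta_k,\qquad \p_X[\{2,4,6\}]=\sum_{k\in\{2,4,6\}}\frac{1}{6}=\frac{1}{2}.
</math>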
{{definitioncard|Expected Value/Expectation|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X</math> be a real valued r.v. (i.e. with values in <math>\R</math>). The expectation of such a r.v. is defined as
<math display="block">
\E[X]=\int_{\Omega}X(\omega)d\p(\omega)=\int_\R xd\p_X(x),
</math>
which is well defined in the following two cases.
<ul style{{=}}"list-style-type:lower-roman"><li>If <math>X\geq 0</math>, in which case <math>\E[X]\in[0,\infty]</math>.
</li>
<li>If <math>\E[\vert X\vert]=\int_{\Omega}\vert X(\omega)\vert d\p(\omega) < \infty</math>.
</li>
</ul>}}
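For the fair die above, <math>X\geq 0</math> and the expectation is the familiar average
<math display="block">
\E[X]=\int_\R x\,d\p_X(x)=\sum_{k=1}^6 k\cdot\frac{1}{6}=\frac{7}{2}.
</math>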
We extend this definition to the case of a r.v. <math>X=(X_1,...,X_d)</math> taking values in <math>\R^d</math> by defining
<math display="block">
\E[X]=(\E[X_1],...,\E[X_d])
</math>
provided each <math>\E[X_i]</math> is well defined.
{{alert-info |
If <math>B\in\A</math> and <math>X=\one_{B}</math>, then
<math display="block">
0\leq \E[X]=\E[\one_B]=\p[B]\leq 1.
</math>
In general, <math>\E[X]</math> is interpreted as the average or the mean of the r.v. <math>X</math>. If <math>X</math> takes values in <math>\{x_1,...,x_n,...\}</math>, then
<math display="block">
\E[X]=\sum_{n=1}^\infty x_n\p[X=x_n],
</math>
whenever it is well defined.
}}
The expectation is a special case of an integral with respect to a positive measure. In particular,
<ul style{{=}}"list-style-type:lower-roman"><li>For all <math>X,Y</math> integrable and <math>a,b\in\R</math> we have
<math display="block">
\E[aX+bY]=a\E[X]+b\E[Y].
</math>
</li>
<li>If <math>X=C</math> is a constant, then
<math display="block">
\E[X]=\int_\Omega Cd\p(\omega)=C\p[\Omega]=C.
</math>
</li>
<li>If <math>X\geq 0</math>, then <math>\E[X]\geq 0</math>, and if <math>X\leq Y</math> with both integrable, then
<math display="block">
\E[X]\leq \E[Y].
</math>
</li>
<li>(''Monotone convergence'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>X_n\geq 0</math> for all <math>n\geq 1</math> and <math>X_n\uparrow X</math> as <math>n\to\infty</math>, then
<math display="block">
\E[X_n]\uparrow\E[X]\text{ as $n\to\infty$}.
</math>
</li>
<li>(''Fatou'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>X_n\geq 0</math> for all <math>n\geq 1</math>, then
<math display="block">
\E\left[\liminf_{n\to\infty}X_n\right]\leq \liminf_{n\to\infty}\E[X_n].
</math>
The inequality can be strict; see the example after this list.
</li>
<li>(''Dominated convergence'') If <math>(X_n)_{n\geq 1}</math> is a sequence of real valued r.v.'s with <math>\vert X_n\vert \leq Z</math> for all <math>n\geq 1</math>, for some real valued r.v. <math>Z</math> with <math>\E[Z] < \infty</math>, and if <math>X_n\xrightarrow{n\to\infty}X</math> a.s., then
<math display="block">
\E[X_n]\xrightarrow{n\to\infty}\E[X].
</math>
</li>
</ul>
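A standard example shows that the inequality in Fatou's lemma can be strict: on <math>\Omega=(0,1)</math> with <math>\p</math> the Lebesgue measure, take <math>X_n=n\one_{(0,1/n)}</math>. Then <math>X_n(\omega)\xrightarrow{n\to\infty}0</math> for every <math>\omega\in(0,1)</math>, while <math>\E[X_n]=n\cdot\frac{1}{n}=1</math> for all <math>n</math>, so that
<math display="block">
\E\left[\liminf_{n\to\infty}X_n\right]=0 < 1=\liminf_{n\to\infty}\E[X_n].
</math>
The same example shows that the domination hypothesis cannot be dropped in the dominated convergence theorem.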
{{alert-info |
In probability theory one speaks of almost sure convergence and writes a.s., rather than almost everywhere. Namely, <math>X_n\xrightarrow{n\to\infty}X</math> a.s. means that
<math display="block">
\p\left[\{\omega\in\Omega\mid X_n(\omega)\xrightarrow{n\to\infty}X(\omega)\}\right]=1.
</math>
}}
{{proofcard|Proposition|random|Let <math>X</math> be a r.v. with values in <math>(E,\mathcal{E})</math>. If <math>f:E\to [0,\infty]</math> is measurable, then
<math display="block">
\E[f(X)]=\int_E f(x)d\p_X(x).
</math>
Similarly, if <math>f:E\to \R</math> is measurable with <math>\E[\vert f(X)\vert] < \infty</math>, then
<math display="block">
\E[f(X)]=\int_E f(x)d\p_X(x).
</math>
{{alert-info |
Note that <math>f(X)</math> is also a r.v., being a composition of measurable maps.
}}
|[Proof of [[#random |Proposition]]]
In the case <math>f=\one_B</math> with <math>B\in\mathcal{E}</math>, we get that
<math display="block">
\E[f(X)]=\p[X\in B]=\p_X[B]
</math>
from the definition of the distribution of a r.v. By linearity, the result then holds for positive simple functions. Finally, for <math>f\geq 0</math> measurable there exist positive simple functions <math>(f_n)_{n\in\N}</math> with <math>f_n\uparrow f</math> as <math>n\to\infty</math>, and we conclude with the monotone convergence theorem.}}
{{alert-info |
One often uses the proposition to compute the law of a r.v. <math>X</math>. If one is able to write <math>\E[f(X)]=\int f d\nu</math> for a sufficiently large class of functions <math>f</math>, then one can deduce that <math>\p_X=\nu</math>. The idea is to be able to take <math>f=\one_B</math>, for then <math>\E[f(X)]=\p_X[B]=\nu(B)</math>.
}}
'''Example'''
Assume that <math>\p_X</math> is absolutely continuous with density <math>h(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}</math> for <math>x\in\R</math>, and let <math>Y=X^2</math>. One can then ask for the distribution of <math>Y</math>. Let <math>f:\R\to[0,\infty]</math> be measurable. Then
<math display="block">
\E[f(Y)]=\E[f(X^2)]=\int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx.
</math>
By symmetry of the integrand, we can write
<math display="block">
\int_{-\infty}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=2\int_{0}^\infty f(x^2)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx.
</math>
Now we substitute <math>y=x^2</math>. Then <math>dy=2xdx</math> and hence <math>dx=\frac{dy}{2\sqrt{y}}</math>, which gives
<math display="block">
\E[f(Y)]=2\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{2\sqrt{2\pi y}}dy=\int_0^\infty f(y)\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}dy,
</math>
that is, <math>\E[f(Y)]=\int fd\nu</math> with
<math display="block">
d\nu(y)=\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y > 0\}}dy.
</math>
By the remark above, the distribution of <math>Y</math> is therefore the measure with density <math>\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}\one_{\{y > 0\}}</math>.
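As a consistency check, this density integrates to 1: substituting back <math>y=x^2</math>,
<math display="block">
\int_0^\infty\frac{e^{-\frac{y}{2}}}{\sqrt{2\pi y}}dy=2\int_0^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx=1.
</math>
This law is known as the chi-squared distribution with one degree of freedom.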
{{proofcard|Proposition|random2|Let <math>X=(X_1,...,X_d)\in\R^d</math> be a r.v. Assume that <math>X</math> has density <math>P(x_1,...,x_d)</math>. Then for all <math>j\in\{1,...,d\}</math>, <math>X_j</math> has density
<math display="block">
P_j(x)=\int_{\R^{d-1}}P(x_1,...,x_{j-1},x,x_{j+1},...,x_d)dx^1\dotsm dx^{j-1}dx^{j+1}\dotsm dx^d.
</math>
{{alert-info |
Let <math>d=2</math> and <math>X=(X_1,X_2)</math>. Then <math>P_1(x)=\int_\R P(x,y)dy</math> and <math>P_2(y)=\int_\R P(x,y)dx</math>.
}}
|[Proof of [[#random2 |Proposition]]]
Let <math>\pi_j:(x_1,...,x_d)\mapsto x_j</math> denote the projection onto the <math>j</math>-th coordinate. From Fubini's theorem we get, for all Borel measurable <math>f:\R\to \R^+</math>,
<math display="block">
\E[f(X_j)]=\E[f(\pi_j(X))]=\int_{\R^{d}}f(x_j)P(x_1,...,x_d)dx^1\dotsm dx^d
</math>
<math display="block">
=\int_\R f(x_j)\underbrace{\left(\int_{\R^{d-1}}P(x_1,...,x_{j-1},x_j,x_{j+1},...,x_d)dx^1\dotsm dx^{j-1}dx^{j+1}\dotsm dx^d\right)dx^j}_{d\nu(x_j)=P_j(x_j)dx^j}
</math>
By renaming <math>x_j=y</math>, we get
<math display="block">
\E[f(X_j)]=\int_\R f(y)P_j(y)dy.
</math>
Hence the distribution of <math>X_j</math> has density <math>P_j(y)</math> on <math>\R</math>.}}
{{alert-info |
If <math>X=(X_1,...,X_d)\in\R^d</math> is a r.v., then the distributions <math>\p_{X_j}</math>, <math>j\in\{1,...,d\}</math>, are called the margins of <math>X</math>. The last proposition shows that the margins are determined by the law
<math display="block">
\p_{X}=\p_{(X_1,...,X_d)},
</math>
but the converse is wrong. For example, take <math>Q</math> to be a density on <math>\R</math> and observe that <math>P(x_1,x_2)=Q(x_1)Q(x_2)</math> is then a density on <math>\R^2</math>. We have already seen that we can construct (in a canonical way) a r.v. <math>X=(X_1,X_2)\in\R^2</math> such that <math>\p_X</math> has <math>P(x_1,x_2)</math> as density. The margins of <math>X</math>, namely <math>\p_{X_1}</math> and <math>\p_{X_2}</math>, then both have density <math>Q(x)</math>. Now observe that the r.v.'s <math>X=(X_1,X_2)</math> and <math>X'=(X_1,X_1)</math> have the same margins, but they are different: <math>\p_X</math> has support in <math>\R^2</math>, while <math>\p_{X'}</math> has support in the diagonal of <math>\R^2</math>, which is of Lebesgue measure 0 in <math>\R^2</math>. Hence <math>\p_X\not=\p_{X'}</math>.
}}
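To make the difference explicit, one can compute both laws on the diagonal <math>D=\{(x,x)\mid x\in\R\}</math>: for every measurable <math>f:\R^2\to[0,\infty]</math>,
<math display="block">
\E[f(X')]=\E[f(X_1,X_1)]=\int_\R f(x,x)Q(x)dx,
</math>
so <math>\p_{X'}[D]=1</math>, whereas <math>\p_X[D]=\int_{D}Q(x_1)Q(x_2)dx_1dx_2=0</math> since <math>\lambda(D)=0</math>.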
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}