<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}
</math>
</div>
{{proofcard|Theorem|thm-1|Let <math>(\Omega,\F,\p)</math> be a probability space and let <math>\mathcal{G}_1\subset\F</math> and <math>\mathcal{G}_2\subset\F</math> be two sub-<math>\sigma</math>-algebras of <math>\F</math>. Then <math>\mathcal{G}_1</math> and <math>\mathcal{G}_2</math> are independent if and only if for every positive, <math>\mathcal{G}_2</math>-measurable r.v. <math>X</math> (or for <math>X\in L^1(\Omega,\mathcal{G}_2,\p)</math>, or for <math>X=\one_A</math> with <math>A\in \mathcal{G}_2</math>) we have
<math display="block">
\E[X\mid\mathcal{G}_1]=\E[X].
</math>|We only need to prove that the bracketed statement implies that <math>\mathcal{G}_1</math> and <math>\mathcal{G}_2</math> are independent. Assume that for all <math>A\in \mathcal{G}_2</math> we have that
<math display="block">
\E[\one_A\mid\mathcal{G}_1]=\E[\one_A]=\p[A].
</math>
For <math>B\in\mathcal{G}_1</math> we then get
<math display="block">
\E[\one_B\one_A]=\E[\E[\one_B\one_A\mid\mathcal{G}_1]]=\E[\one_B\E[\one_A\mid\mathcal{G}_1]].
</math>
Note that <math>\E[\one_A\mid\mathcal{G}_1]=\p[A]</math> and therefore <math>\E[\one_B\one_A]=\p[A\cap B]=\p[A]\E[\one_B]=\p[A]\p[B]</math>, hence the claim follows.}}
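As a quick numerical illustration of the indicator version of the theorem, the following sketch (our own, assuming NumPy is available) draws two independent uniforms <math>U,V</math>, takes <math>B\in\sigma(U)</math> and <math>A\in\sigma(V)</math>, and checks <math>\p[A\cap B]\approx\p[A]\p[B]</math> by Monte Carlo.
<syntaxhighlight lang="python">
# Monte Carlo sanity check (our own sketch, assuming NumPy): for two
# independent uniforms U and V, any B in sigma(U) and A in sigma(V)
# should satisfy P[A ∩ B] = P[A] P[B].
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
U = rng.uniform(size=n)   # sigma(U) plays the role of G_1
V = rng.uniform(size=n)   # sigma(V) plays the role of G_2

B = U < 0.3               # event in G_1
A = V > 0.5               # event in G_2

print((A & B).mean())           # estimate of P[A ∩ B], ≈ 0.15
print(A.mean() * B.mean())      # estimate of P[A] P[B], ≈ 0.15
</syntaxhighlight>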
{{alert-info | Let <math>Z</math> and <math>Y</math> be two real valued r.v.'s. Then <math>Z</math> and <math>Y</math> are independent if and only if for all Borel measurable <math>h</math> with <math>\E[\vert h(Z)\vert] < \infty</math> we get <math>\E[h(Z)\mid Y]=\E[h(Z)]</math>. To see this, apply the theorem with <math>\mathcal{G}_1=\sigma(Y)</math> and <math>\mathcal{G}_2=\sigma(Z)</math> and note that all r.v.'s in <math>L^1(\Omega,\mathcal{G}_2,\p)</math> are of the form <math>h(Z)</math> with <math>\E[\vert h(Z)\vert] < \infty</math>. In particular, if <math>Z\in L^1(\Omega,\F,\p)</math>, we get <math>\E[Z\mid Y]=\E[Z]</math>. Be aware that the latter equation alone does not imply that <math>Y</math> and <math>Z</math> are independent. For example, take <math>Z\sim\mathcal{N}(0,1)</math> and <math>Y=\vert Z\vert</math>. For all <math>h</math> with <math>\E[\vert h(\vert Z\vert)\vert] < \infty</math> we get <math>\E[Zh(\vert Z\vert )]=0</math>, since <math>z\mapsto zh(\vert z\vert)</math> is odd. Thus <math>\E[Z\mid \vert Z\vert]=0=\E[Z]</math>, but <math>Z</math> and <math>\vert Z\vert</math> are clearly not independent.}}
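The counterexample above is easy to check numerically; the following sketch (again assuming NumPy) estimates <math>\E[Zh(\vert Z\vert)]</math> for two choices of <math>h</math> and then exhibits the failure of independence.
<syntaxhighlight lang="python">
# Sketch (assuming NumPy): E[Z h(|Z|)] ≈ 0 for Z ~ N(0,1), yet Z and |Z|
# are far from independent.
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal(10**6)

# E[Z h(|Z|)] vanishes since z ↦ z h(|z|) is odd; try h(t) = t and h(t) = t².
print(np.mean(Z * np.abs(Z)), np.mean(Z * Z**2))   # both ≈ 0

# Independence fails: P[Z > 1, |Z| > 1] = P[Z > 1] ≈ 0.159, while
# P[Z > 1] P[|Z| > 1] ≈ 0.159 * 0.317 ≈ 0.050.
print(np.mean((Z > 1) & (np.abs(Z) > 1)))
print(np.mean(Z > 1) * np.mean(np.abs(Z) > 1))
</syntaxhighlight>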
{{proofcard|Theorem|thm-2|Let <math>(\Omega,\F,\p)</math> be a probability space. Let <math>X</math> and <math>Y</math> be two r.v.'s on that space with values in measurable spaces <math>E</math> and <math>F</math> respectively. Assume that <math>X</math> is independent of the sub-<math>\sigma</math>-algebra <math>\mathcal{G}\subset \F</math> and that <math>Y</math> is <math>\mathcal{G}</math>-measurable. Then for every measurable map <math>g:E\times F\to \R_+</math> we have
<math display="block">
\E[g(X,Y)\mid\mathcal{G}]=\phi(Y),
</math>
where
<math display="block">
\phi:y\mapsto \int_E g(x,y)d\p_X(x).
</math>|We need to show that for every positive, <math>\mathcal{G}</math>-measurable r.v. <math>Z</math> we get that
<math display="block">
\E[g(X,Y)Z]=\E[\phi(Y)Z].
</math>
Since <math>X</math> is independent of <math>\mathcal{G}</math> and <math>(Y,Z)</math> is <math>\mathcal{G}</math>-measurable, the law of <math>(X,(Y,Z))</math> is the product measure <math>\p_X\otimes\p_{(Y,Z)}</math>. Hence, by Fubini,
<math display="block">
\begin{align*}
\E[g(X,Y)Z]&=\int g(x,y)z\,d\p_X(x)d\p_{(Y,Z)}(y,z)=\int\left(\int_E g(x,y)d\p_X(x)\right)z\,d\p_{(Y,Z)}(y,z)\\
&=\int \phi(y)z\,d\p_{(Y,Z)}(y,z)=\E[\phi(Y)Z].
\end{align*}
</math>}}
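To make the theorem concrete, here is a small Monte Carlo check with data chosen for illustration only: <math>X\sim\mathrm{Exp}(1)</math> independent of <math>\mathcal{G}=\sigma(Y)</math> with <math>Y\sim\mathcal{U}(0,1)</math>, <math>g(x,y)=e^{-xy}</math>, so that <math>\phi(y)=\int_0^\infty e^{-xy}e^{-x}dx=\frac{1}{1+y}</math>, tested against the <math>\mathcal{G}</math>-measurable r.v. <math>Z=Y^2</math>.
<syntaxhighlight lang="python">
# Monte Carlo check of the theorem (toy data of our own choosing):
# X ~ Exp(1) independent of G = sigma(Y), Y ~ U(0,1), g(x,y) = exp(-x y),
# hence phi(y) = ∫_0^∞ exp(-x y) exp(-x) dx = 1/(1+y); test r.v. Z = Y².
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
X = rng.exponential(size=n)   # independent of G
Y = rng.uniform(size=n)       # G-measurable

phi = 1.0 / (1.0 + Y)         # phi(Y) = ∫ g(x, Y) dP_X(x), in closed form
Z = Y**2                      # positive and G-measurable

print(np.mean(np.exp(-X * Y) * Z))   # E[g(X,Y) Z]
print(np.mean(phi * Z))              # E[phi(Y) Z] — agrees up to MC error
</syntaxhighlight>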
==Important examples== | |||
We now take a look at two important examples.
===Variables with densities=== | |||
Let <math>(X,Y)\in \R^m\times \R^n</math>. Assume that <math>(X,Y)</math> has density <math>P(x,y)</math>, i.e. for all Borel measurable maps <math>h:\R^m\times\R^n\to \R_+</math> we have
<math display="block">
\E[h(X,Y)]=\int_{\R^m\times\R^n}h(x,y)P(x,y)dxdy.
</math>
The density of <math>Y</math> is given by
<math display="block">
Q(y)=\int_{\R^m}P(x,y)dx.
</math>
We want to compute <math>\E[h(X)\mid Y]</math> for some measurable map <math>h:\R^m\to\R_+</math>. For every Borel measurable map <math>g:\R^n\to\R_+</math> we have
<math display="block">
\begin{align*}
\E[h(X)g(Y)]&=\int_{\R^m\times\R^n}h(x)g(y)P(x,y)dxdy=\int_{\R^n}\left(\int_{\R^m} h(x)P(x,y)dx\right) g(y)dy\\
&=\int_{\R^n}\frac{1}{Q(y)}\left(\int_{\R^m}h(x)P(x,y)dx\right)g(y)Q(y)\one_{\{Q(y) > 0\}}dy=\E[\varphi(Y)g(Y)],
\end{align*}
</math>
where
<math display="block">
\varphi(y)=\one_{\{Q(y) > 0\}}\frac{1}{Q(y)}\int_{\R^m}h(x)P(x,y)dx.
</math>
Since this holds for all such <math>g</math>, it follows that <math>\E[h(X)\mid Y]=\varphi(Y)</math>. For <math>y\in\R^n</math>, let <math>\nu(y,dx)</math> be the probability measure on <math>\R^m</math> defined by
<math display="block">
\nu(y,dx)=\begin{cases}\frac{P(x,y)}{Q(y)}dx& Q(y) > 0\\ \delta_0(dx)& Q(y)=0\end{cases}
</math>
Then for all measurable maps <math>h:\R^m\to\R_+</math> we get
<math display="block">
\E[h(X)\mid Y]=\int_{\R^m}h(x)\nu(Y,dx).
</math>
{{alert-info | In the literature, one abusively writes
<math display="block">
\E[h(X)\mid Y=y]=\int_{\R^m}h(x)\nu(y,dx),
</math>
and <math>\nu(y,dx)</math> is called the ''conditional distribution'' of <math>X</math> given <math>Y=y</math> (even though in general we have <math>\p[Y=y]=0</math>).}}
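For a concrete instance (density chosen here for illustration only), take <math>P(x,y)=x+y</math> on <math>[0,1]^2</math>, so that <math>Q(y)=\tfrac{1}{2}+y</math> and <math>\E[X\mid Y=y]=\frac{1/3+y/2}{1/2+y}</math>. The following sketch compares this formula with a Monte Carlo estimate obtained by rejection sampling and binning.
<syntaxhighlight lang="python">
# Illustration with a density chosen for this sketch: P(x,y) = x + y on
# [0,1]², so Q(y) = 1/2 + y and E[X | Y = y] = (1/3 + y/2) / (1/2 + y).
import numpy as np

rng = np.random.default_rng(3)

# Rejection sampling from P(x,y) = x + y, which is bounded by 2 on [0,1]².
m = 4 * 10**6
x, y, u = rng.uniform(size=(3, m))
keep = 2.0 * u <= x + y        # accept with probability P(x,y)/2
X, Y = x[keep], y[keep]

y0 = 0.7
band = np.abs(Y - y0) < 0.01   # condition on Y ≈ y0 by binning
print(X[band].mean())                # Monte Carlo E[X | Y ≈ 0.7]
print((1/3 + y0/2) / (1/2 + y0))     # formula, ≈ 0.569
</syntaxhighlight>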
===The Gaussian case===
Let <math>(\Omega,\F,\p)</math> be a probability space and let <math>X,Y_1,...,Y_p\in L^2(\Omega,\F,\p)</math>. We saw that <math>\E[X\mid Y_1,...,Y_p]</math> is the orthogonal projection of <math>X</math> onto <math>L^2(\Omega,\sigma(Y_1,...,Y_p),\p)</math>. Since this conditional expectation is <math>\sigma(Y_1,...,Y_p)</math>-measurable, it is of the form <math>\varphi(Y_1,...,Y_p)</math>. In general, <math>L^2(\Omega,\sigma(Y_1,...,Y_p),\p)</math> is infinite dimensional, so it is hard to obtain <math>\varphi</math> explicitly. We also saw that <math>\varphi(Y_1,...,Y_p)</math> is the best approximation of <math>X</math> in the <math>L^2</math> sense by an element of <math>L^2(\Omega,\sigma(Y_1,...,Y_p),\p)</math>. Moreover, it is well known that the best <math>L^2</math>-approximation of <math>X</math> by an affine function of <math>Y_1,...,Y_p</math> is the orthogonal projection of <math>X</math> onto the finite dimensional vector space spanned by <math>\one,Y_1,...,Y_p</math>, i.e. the r.v. <math>\hat Z=\alpha_0+\sum_{j=1}^p\alpha_jY_j</math> characterized by
<math display="block">
\E[X-\hat Z]=0\quad\text{and}\quad\E[(X-\hat Z)Y_j]=0,\qquad j=1,...,p.
</math>
In general, this is different from the orthogonal projection onto <math>L^2(\Omega,\sigma(Y_1,...,Y_p),\p)</math>, except in the Gaussian case.
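In the jointly Gaussian case the two projections coincide, which can be checked numerically: in the sketch below (a toy model of our own) <math>(X,Y)</math> is jointly Gaussian and the binned conditional mean of <math>X</math> given <math>Y\approx y_0</math> matches the affine predictor <math>\E[X]+\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(Y)}(y_0-\E[Y])</math>.
<syntaxhighlight lang="python">
# Toy Gaussian model (our own): X = 2 + 0.8 Y + 0.5 ε with Y, ε i.i.d. N(0,1),
# so (X, Y) is jointly Gaussian and E[X | Y] is affine in Y.
import numpy as np

rng = np.random.default_rng(4)
n = 10**6
Y = rng.standard_normal(n)
X = 2.0 + 0.8 * Y + 0.5 * rng.standard_normal(n)

a = np.cov(X, Y)[0, 1] / np.var(Y)   # slope Cov(X,Y)/Var(Y) ≈ 0.8
b = X.mean() - a * Y.mean()          # intercept ≈ 2.0

y0 = 1.0
band = np.abs(Y - y0) < 0.02
print(X[band].mean())   # binned estimate of E[X | Y ≈ 1], ≈ 2.8
print(b + a * y0)       # affine predictor, ≈ 2.8
</syntaxhighlight>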
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}