<div class="d-none"><math>
\newcommand{\R}{\mathbb{R}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rbar}{\overline{\mathbb{R}}}
\newcommand{\Bbar}{\overline{\mathcal{B}}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\p}{\mathbb{P}}
\newcommand{\one}{\mathds{1}}
\newcommand{\0}{\mathcal{O}}
\newcommand{\mat}{\textnormal{Mat}}
\newcommand{\sign}{\textnormal{sign}}
\newcommand{\CP}{\mathcal{P}}
\newcommand{\CT}{\mathcal{T}}
\newcommand{\CY}{\mathcal{Y}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\mathds}{\mathbb}</math></div>
===Moments and Variance===
Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X</math> be a r.v. and let <math>p\geq 1</math> be an integer (or even a real number). The <math>p</math>-th moment of <math>X</math> is by definition <math>\E[X^p]</math>, which is well defined when <math>X\geq 0</math> or when <math>\E[\vert X\vert^p] < \infty</math>, which by definition means that


<math display="block">
\E[\vert X\vert^p]=\int_\Omega\vert X(\omega)\vert^pd\p(\omega) < \infty.
</math>
When <math>p=1</math>, we get the expected value. We say that <math>X</math> is ''centered'' if <math>\E[X]=0</math>. The spaces <math>L^p(\Omega,\A,\p)</math> for <math>p\in[1,\infty)</math> are defined as in the course ''Measure and Integral''. From Hölder's inequality we observe that
<math display="block">
\E[\vert XY\vert]\leq \E[\vert X\vert^p]^{\frac{1}{p}}\E[\vert Y\vert^q]^{\frac{1}{q}},
</math>
whenever <math>p,q\geq 1</math> and <math>\frac{1}{p}+\frac{1}{q}=1</math>. If we take <math>Y=1</math> above, we obtain
<math display="block">
\E[\vert X\vert]\leq \E[\vert X\vert^p]^{\frac{1}{p}},
</math>
which means <math>\|X\|_1\leq \|X\|_p</math>. More generally, <math>\|X\|_r\leq \|X\|_p</math> whenever <math>1\leq r\leq p</math>, and it follows that <math>L^p(\Omega,\A,\p)\subset L^r(\Omega,\A,\p)</math>. For <math>p=q=2</math> we recover the Cauchy-Schwarz inequality:
<math display="block">
\E[\vert XY\vert]\leq \E[\vert X\vert^2]^{\frac{1}{2}}\E[\vert Y\vert^2]^{\frac{1}{2}}.
</math>
With <math>Y=1</math> we have <math>\E[\vert X\vert]^2\leq\E[X^2]</math>.
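These inequalities are easy to illustrate numerically. The following is a minimal Monte Carlo sketch in Python/NumPy (the distributions, the sample size and the helper <code>lp_norm</code> are arbitrary choices, for illustration only) estimating the empirical <math>L^p</math> norms of a sampled r.v. and checking the norm monotonicity and the Cauchy-Schwarz inequality.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.exponential(scale=1.0, size=n)   # a nonnegative r.v. with all moments finite
Y = rng.normal(size=n)

def lp_norm(Z, p):
    """Empirical L^p norm E[|Z|^p]^(1/p)."""
    return np.mean(np.abs(Z) ** p) ** (1.0 / p)

# ||X||_1 <= ||X||_2 <= ||X||_4 (monotonicity of L^p norms on a probability space)
print(lp_norm(X, 1) <= lp_norm(X, 2) <= lp_norm(X, 4))

# Cauchy-Schwarz: E[|XY|] <= E[X^2]^(1/2) E[Y^2]^(1/2)
print(np.mean(np.abs(X * Y)) <= lp_norm(X, 2) * lp_norm(Y, 2))
</syntaxhighlight>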
{{definitioncard|Variance|Let <math>(\Omega,\A,\p)</math> be a probability space. Consider a r.v. <math>X\in L^2(\Omega,\A,\p)</math>. The variance of <math>X</math> is defined as
<math display="block">
Var(X)=\E[(X-\E[X])^2]
</math>
and the standard deviation of <math>X</math> is given by
<math display="block">
\sigma_X=\sqrt{Var(X)}.
</math>
}}
{{alert-info |
Informally, the variance represents the deviation of <math>X</math> around its mean <math>\E[X]</math>. Note that <math>Var(X)=0</math> if and only if <math>X</math> is a.s. constant.
}}
{{proofcard|Proposition|prop-1|
<math display="block">
Var(X)=\E[X^2]-\E[X]^2 \quad\text{and, for all } a\in\R, \quad \E[(X-a)^2]=Var(X)+(\E[X]-a)^2.
</math>
Consequently, we get
<math display="block">
Var(X)=\inf_{a\in\R}\E[(X-a)^2].
</math>
|
<math display="block">
Var(X)=\E[(X-\E[X])^2]=\E[X^2-2\E[X]X+\E[X]^2]=\E[X^2]-2\E[X]\E[X]+\E[X]^2=\E[X^2]-\E[X]^2.
</math>
Moreover, we have
<math display="block">
\E[(X-a)^2]=\E\left[\left((X-\E[X])+(\E[X]-a)\right)^2\right]=Var(X)+(\E[X]-a)^2,
</math>
since the cross term <math>2(\E[X]-a)\E[X-\E[X]]</math> vanishes. This implies that for all <math>a\in\R</math>
<math display="block">
\E[(X-a)^2]\geq Var(X)
</math>
and there is equality when <math>a=\E[X]</math>.}}
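One can also check the two identities of this proposition by simulation. The sketch below (a Python/NumPy illustration; the gamma distribution and the grid of candidate values of <math>a</math> are arbitrary choices) compares the two expressions for <math>Var(X)</math> and verifies that <math>\E[(X-a)^2]</math> is minimized near <math>a=\E[X]</math>.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=3.0, size=200_000)

# Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2
var_def = np.mean((X - X.mean()) ** 2)
var_alt = np.mean(X ** 2) - X.mean() ** 2
print(np.isclose(var_def, var_alt))

# E[(X - a)^2], as a function of a, is minimized at a = E[X]
a_grid = np.linspace(X.mean() - 5.0, X.mean() + 5.0, 201)
mse = [np.mean((X - a) ** 2) for a in a_grid]
print(np.isclose(a_grid[np.argmin(mse)], X.mean(), atol=0.1))
</syntaxhighlight>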
{{alert-info |
\label{random3}
It follows that if <math>X</math> is centered (i.e. <math>\E[X]=0</math>), we get <math>Var(X)=\E[X^2]</math>. Moreover, the following two simple inequalities are very often used.
<ul style{{=}}"list-style-type:lower-roman"><li>(''Markov inequality'') If <math>X\geq 0</math> and <math>a > 0</math> then
<math display="block">
\boxed{\p[X > a]\leq \frac{1}{a}\E[X]}.
</math>
</li>
<li>(''Chebyshev inequality'') For <math>a > 0</math>,
<math display="block">
\boxed{\p[\vert X-\E[X]\vert > a]\leq \frac{1}{a^2}Var(X)}.
</math>
</li>
</ul>
}}
\begin{proof}[Proof of [[#random3 |Remark]]] We want to show both inequalities. Since <math>\p[X > a]\leq \p[X\geq a]</math>, it is enough to bound the latter.
<ul style{{=}}"list-style-type:lower-roman"><li>Note that
<math display="block">
\p[X\geq a]=\E[\one_{\{X\geq a\}}]\leq \E\left[\frac{X}{a}\underbrace{\one_{\{X\geq a\}}}_{\leq 1}\right]\leq \E\left[\frac{X}{a}\right].
</math>
</li>
<li>This follows from (i) applied to the nonnegative r.v. <math>\vert X-\E[X]\vert^2</math>, since
<math display="block">
\p[\vert X-\E[X]\vert\geq a]=\p[\vert X-\E[X]\vert^2\geq a^2]\leq \frac{1}{a^2}\underbrace{\E[\vert X-\E[X]\vert^2]}_{Var(X)}.
</math>
</li>
</ul>
\end{proof}
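Both bounds are straightforward to sanity-check by simulation. The sketch below (an illustrative Monte Carlo check with an exponential r.v.; any nonnegative distribution with finite variance would do) compares empirical tail probabilities with the Markov and Chebyshev bounds.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=2.0, size=500_000)   # X >= 0 with E[X] = 2 and Var(X) = 4

for a in (1.0, 3.0, 10.0):
    markov_lhs = np.mean(X > a)                      # P[X > a]
    markov_rhs = X.mean() / a                        # E[X] / a
    cheb_lhs = np.mean(np.abs(X - X.mean()) > a)     # P[|X - E[X]| > a]
    cheb_rhs = X.var() / a ** 2                      # Var(X) / a^2
    print(markov_lhs <= markov_rhs, cheb_lhs <= cheb_rhs)
</syntaxhighlight>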
{{definitioncard|Covariance|Let <math>(\Omega,\A,\p)</math> be a probability space. Consider two r.v.'s <math>X,Y\in L^2(\Omega,\A,\p)</math>. The covariance of <math>X</math> and <math>Y</math> is defined as
<math display="block">
Cov(X,Y)=\E[(X-\E[X])(Y-\E[Y])]=\E[XY]-\E[X]\E[Y].
</math>
}}
If <math>X=(X_1,...,X_d)\in\R^d</math> is a r.v. such that <math>\forall i\in\{1,...,d\}</math>, <math>X_i\in L^2(\Omega,\A,\p)</math>, then the covariance matrix of <math>X</math> is defined as
<math display="block">
K_X=\left( Cov(X_i,X_j)\right)_{1\leq i,j\leq d}.
</math>
Informally speaking, the covariance between <math>X</math> and <math>Y</math> measures the correlation between <math>X</math> and <math>Y</math>. Note that <math>Cov(X,X)=Var(X)</math> and from Cauchy-Schwarz we get
<math display="block">
\left\vert Cov(X,Y)\right\vert\leq \sqrt{Var(X)}\cdot\sqrt{Var(Y)}.
</math>
The map <math>(X,Y)\mapsto Cov(X,Y)</math> is a bilinear form on <math>L^2(\Omega,\A,\p)</math>. We also note that <math>K_X</math> is symmetric and positive semi-definite, i.e. if <math>\lambda_1,...,\lambda_d\in\R</math> and <math>\lambda=(\lambda_1,...,\lambda_d)^T</math>, then
<math display="block">
\left\langle K_X\lambda,\lambda\right\rangle=\sum_{i,j=1}^d\lambda_i\lambda_jCov(X_i,X_j)\geq 0.
</math>
Indeed, we have
<math display="block">
\begin{align*}
\sum_{i,j=1}^d\lambda_i\lambda_jCov(X_i,X_j)&=Var\left(\sum_{j=1}^d\lambda_jX_j\right)=\E\left[\left(\sum_{j=1}^d\lambda_jX_j-\E\left[\sum_{j=1}^d\lambda_j X_j\right]\right)^2\right]\\
&=\E\left[\left(\sum_{j=1}^d\lambda_jX_j-\sum_{j=1}^d\lambda_j\E[X_j]\right)^2\right]=\E\left[\left(\sum_{j=1}^d \lambda_j(X_j-\E[X_j])\right)^2\right]\\
&=\E\left[\sum_{j=1}^d\lambda_j(X_j-\E[X_j])\sum_{i=1}^d\lambda_i(X_i-\E[X_i])\right]\\
&=\sum_{i,j=1}^d\lambda_i\lambda_j\E[(X_i-\E[X_i])(X_j-\E[X_j])]\geq 0.
\end{align*}
</math>
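This positivity can also be observed numerically: a sample covariance matrix is symmetric positive semi-definite, and the quadratic form <math>\left\langle K_X\lambda,\lambda\right\rangle</math> agrees with the variance of the linear combination <math>\sum_{j}\lambda_jX_j</math>. The following Python/NumPy sketch (the Gaussian vector and its covariance are arbitrary illustrative choices) checks both facts on simulated data.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 100_000
true_cov = np.array([[2., 1., 0., 0.],
                     [1., 2., 1., 0.],
                     [0., 1., 2., 1.],
                     [0., 0., 1., 2.]])
X = rng.multivariate_normal(mean=np.zeros(d), cov=true_cov, size=n)  # rows are samples in R^d

K_X = np.cov(X, rowvar=False, bias=True)     # empirical (Cov(X_i, X_j))_{i,j}
lam = rng.normal(size=d)

quad_form = lam @ K_X @ lam                  # <K_X lambda, lambda>
var_combo = np.var(X @ lam)                  # Var(sum_j lambda_j X_j), empirically
print(np.isclose(quad_form, var_combo))
print(np.all(np.linalg.eigvalsh(K_X) >= -1e-12))   # positive semi-definiteness
</syntaxhighlight>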
\begin{exer} If <math>A</math> is an <math>n\times d</math> matrix and <math>Y=AX</math>, prove that
<math display="block">
K_Y=AK_XA^T.
</math>
\end{exer}
{{alert-info |
Set <math>X=(X_1,...,X_d)^T</math> and <math>XX^T=(X_iX_j)_{1\leq i,j\leq d}</math>. Then, informally (this is exact when <math>X</math> is centered),
<math display="block">
K_X=\E[XX^T]=(\E[X_iX_j])_{1\leq i,j\leq d},
</math>
and for <math>Y=AX</math> we get
<math display="block">
K_Y=\E[AXX^TA^T]=A\E[XX^T]A^T=AK_XA^T.
</math>
}}
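The transformation rule <math>K_Y=AK_XA^T</math> from the exercise can likewise be verified on simulated data. The sketch below (with an arbitrary correlation structure for <math>X</math> and an arbitrary matrix <math>A</math>) applies <math>A</math> to each sample and compares the two covariance matrices.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 200_000
mixing = np.array([[1., .5, 0.],
                   [0., 1., .5],
                   [0., 0., 1.]])
X = rng.normal(size=(n, d)) @ mixing          # correlated samples of X (rows)
A = np.array([[1., 2., -1.],
              [0., 1.,  3.]])                 # any matrix, here of size 2 x 3

Y = X @ A.T                                   # Y = A X, applied sample by sample
K_X = np.cov(X, rowvar=False, bias=True)
K_Y = np.cov(Y, rowvar=False, bias=True)
print(np.allclose(K_Y, A @ K_X @ A.T))        # K_Y = A K_X A^T
</syntaxhighlight>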
===Linear Regression===
Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X,Y_1,...,Y_n</math> be r.v.'s in <math>L^2(\Omega,\A,\p)</math>. We want the best approximation of <math>X</math> as an affine function of <math>Y_1,...,Y_n</math>. More precisely we want to minimize
<math display="block">
\E[(X-(\beta_0+\beta_1Y_1+...+\beta_nY_n))^2]
</math>
over all possible choices of <math>(\beta_0,\beta_1,...,\beta_n)</math>.
{{proofcard|Proposition|prop-2|Let <math>(\Omega,\A,\p)</math> be a probability space. Let <math>X,Y_1,...,Y_n\in L^2(\Omega,\A,\p)</math> be r.v.'s. Then
<math display="block">
\inf_{(\beta_0,\beta_1,...,\beta_n)\in\R^{n+1}}\E[(X-(\beta_0+\beta_1Y_1+...+\beta_nY_n))^2]=\E[(X-Z)^2],
</math>
where <math>Z=\E[X]+\sum_{j=1}^n\alpha_j(Y_j-\E[Y_j])</math> and the <math>\alpha_j</math>'s are solutions to the system
<math display="block">
\sum_{j=1}^n\alpha_jCov(Y_j,Y_k)=Cov(X,Y_k),\qquad 1\leq k\leq n.
</math>
In particular, if <math>K_Y=\left(Cov(Y_j,Y_k)\right)_{1\leq j,k\leq n}</math> is invertible, we have <math>\alpha=K_Y^{-1}Cov(X,Y)</math>, where <math>\alpha=(\alpha_1,...,\alpha_n)^T</math> and
<math display="block">
Cov(X,Y)=\begin{pmatrix}Cov(X,Y_1)\\ \vdots \\ Cov(X,Y_n)\end{pmatrix}.
</math>
|Let <math>H</math> be the linear subspace of <math>L^2(\Omega,\A,\p)</math> spanned by <math>\{1,Y_1,...,Y_n\}</math>. Then we know that the r.v. <math>Z</math>, which minimizes
<math display="block">
\|X-U\|_2^2=\E[(X-U)^2]
</math>
for <math>U\in H</math>, is the orthogonal projection of <math>X</math> on <math>H</math>. We can thus write
<math display="block">
Z=\alpha_0+\sum_{j=1}^n\alpha_j(Y_j-\E[Y_j]).
</math>
The orthogonality of <math>X-Z</math> to <math>H</math> gives in particular <math>\E[(X-Z)\cdot 1]=0</math>. Therefore <math>\E[X]=\E[Z]=\alpha_0</math>, i.e. <math>\alpha_0=\E[X]</math>. Moreover, we get <math>\E[(X-Z)(Y_k-\E[Y_k])]=0</math> for all <math>k\in\{1,...,n\}</math>, which means <math>Cov(X,Y_k)=Cov(Z,Y_k)=\sum_{j=1}^n\alpha_jCov(Y_j,Y_k)</math>, and this is exactly the stated system.}}
{{alert-info |
When <math>n=1</math> (writing <math>Y=Y_1</math>) and <math>Var(Y) > 0</math>, we have
<math display="block">
Z=\E[X]+\frac{Cov(X,Y)}{Var(Y)}(Y-\E[Y]).
</math>
}}
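The proposition can be illustrated numerically as well. In the sketch below (the affine model, the Gaussian noise and all variable names are arbitrary illustrative choices) the coefficients <math>\alpha=K_Y^{-1}Cov(X,Y)</math> computed from sample covariances coincide with the slope coefficients of an ordinary least-squares fit with an intercept.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
n_samples, n = 200_000, 3
mixing = np.array([[1., .3, 0.],
                   [0., 1., .3],
                   [0., 0., 1.]])
Y = rng.normal(size=(n_samples, n)) @ mixing                  # predictors Y_1, ..., Y_n
X = 2.0 + Y @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n_samples)  # target r.v. X

# alpha = K_Y^{-1} Cov(X, Y), built from sample covariances
K_Y = np.cov(Y, rowvar=False, bias=True)
cov_XY = np.array([np.cov(X, Y[:, k], bias=True)[0, 1] for k in range(n)])
alpha = np.linalg.solve(K_Y, cov_XY)

# Ordinary least squares on (1, Y_1, ..., Y_n) recovers the same slopes;
# its intercept corresponds to E[X] - sum_j alpha_j E[Y_j]
design = np.column_stack([np.ones(n_samples), Y])
beta, *_ = np.linalg.lstsq(design, X, rcond=None)
print(np.allclose(alpha, beta[1:]))
</syntaxhighlight>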
==General references==
{{cite arXiv|last=Moshayedi|first=Nima|year=2020|title=Lectures on Probability Theory|eprint=2010.16280|class=math.PR}}
