
Gaussian random vectors

[math] \newcommand{\indexmark}[1]{#1\markboth{#1}{#1}} \newcommand{\red}[1]{\textcolor{red}{#1}} \newcommand{\NOTE}[1]{$^{\textcolor{red}\clubsuit}$\marginpar{\setstretch{0.5}$^{\scriptscriptstyle\textcolor{red}\clubsuit}$\textcolor{blue}{\bf\tiny #1}}} \newcommand\xoverline[2][0.75]{% \sbox{\myboxA}{$\m@th#2$}% \setbox\myboxB\null% Phantom box \ht\myboxB=\ht\myboxA% \dp\myboxB=\dp\myboxA% \wd\myboxB=#1\wd\myboxA% Scale phantom \sbox\myboxB{$\m@th\overline{\copy\myboxB}$}% Overlined phantom \setlength\mylenA{\the\wd\myboxA}% calc width diff \addtolength\mylenA{-\the\wd\myboxB}% \ifdim\wd\myboxB\lt\wd\myboxA% \rlap{\hskip 0.35\mylenA\usebox\myboxB}{\usebox\myboxA}% \else \hskip -0.5\mylenA\rlap{\usebox\myboxA}{\hskip 0.5\mylenA\usebox\myboxB}% \fi} \newcommand{\smallfrac}[2]{\scalebox{1.35}{\ensuremath{\frac{#1}{#2}}}} \newcommand{\medfrac}[2]{\scalebox{1.2}{\ensuremath{\frac{#1}{#2}}}} \newcommand{\textfrac}[2]{{\textstyle\ensuremath{\frac{#1}{#2}}}} \newcommand{\nsum}[1][1.4]{% only for \displaystyle \mathop{% \raisebox {-#1\depthofsumsign+1\depthofsumsign} \newcommand{\tr}{\operatorname{tr}} \newcommand{\e}{\operatorname{e}} \newcommand{\B}{\operatorname{B}} \newcommand{\Bbar}{\xoverline[0.75]{\operatorname{B}}} \newcommand{\pr}{\operatorname{pr}} \newcommand{\dd}{\operatorname{d}\hspace{-1pt}} \newcommand{\E}{\operatorname{E}} \newcommand{\V}{\operatorname{V}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Bigsum}[2]{\ensuremath{\mathop{\textstyle\sum}_{#1}^{#2}}} \newcommand{\ran}{\operatorname{ran}} \newcommand{\card}{\#} \newcommand{\Conv}{\mathop{\scalebox{1.1}{\raisebox{-0.08ex}{$\ast$}}}}% \usepackage{pgfplots} \newcommand{\filledsquare}{\begin{picture}(0,0)(0,0)\put(-4,1.4){$\scriptscriptstyle\text{\ding{110}}$}\end{picture}\hspace{2pt}} \newcommand{\mathds}{\mathbb}[/math]

\label{Ch-RV}

We now turn to points that are chosen at random from the whole space according to a Gaussian distribution with mean zero and unit variance. Our first result is the so-called Gaussian Annulus Theorem, which states that a point chosen at random will, with high probability, be located in a small annulus around the sphere of radius [math]\sqrt{d}[/math]. At first sight this might seem surprising, since the density function attains its largest values around zero and decays as we move away from the origin. The results of Chapter \ref{Ch-VC}, however, show that only little volume is located close to the origin, and since the probability of lying in a Borel set [math]A\subseteq\mathbb{R}^d[/math] is obtained by integrating the density function over [math]A[/math], the lack of volume around the origin makes this integral small whenever [math]A[/math] is located in the vicinity of zero. At a certain radius the concentration of volume then compensates for the smaller values of the density function. Beyond this radius, the small values of the density function dominate the integral and make it small again. \smallskip Before we can start, we need to fill in several facts about Gaussian random vectors, namely that

  • spherical Gaussian random vectors are precisely those which are independently coordinate-wise Gaussian (Proposition),
  • a linear combination of normal random variables is Gaussian with mean zero and variance given by the sum of the squares of the coefficients (Proposition).

The reader familiar with the above can skip the proofs and jump directly to Lemma which will be the final preparation for the Gaussian Annulus Theorem.
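The norm concentration behind the Gaussian Annulus Theorem is easy to observe in a quick simulation. The following is a minimal sketch in plain Python (the seed and sample sizes are arbitrary choices for illustration, not part of the text):

```python
import math
import random

random.seed(0)

def sample_norm(d):
    """Norm of one d-dimensional spherical Gaussian sample (unit variance)."""
    return math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d)))

# For growing d, the norm of a random point clusters around sqrt(d):
for d in (10, 100, 1000):
    norms = [sample_norm(d) for _ in range(2000)]
    mean_norm = sum(norms) / len(norms)
    print(d, round(mean_norm, 2), round(math.sqrt(d), 2))
```

The printed empirical means stay close to [math]\sqrt{d}[/math], while the spread around [math]\sqrt{d}[/math] does not grow with the dimension.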

\smallskip

Proposition

Let [math](\Omega,\Sigma,\P)[/math] be a probability space and [math]X\colon\Omega\rightarrow\mathbb{R}^d[/math] be a random vector with coordinate functions [math]X_1,\dots,X_d[/math]. Then [math]X[/math] is spherically Gaussian distributed with mean zero and variance [math]\sigma^2 \gt 0[/math] if and only if its coordinates are independent and

[[math]] \P[X_i\in A] = {\medfrac{1}{\sqrt{2\pi}\sigma}} \int_A\exp\bigl(-\medfrac{x^2}{2\sigma^2}\bigr)\dd\lambda(x) [[/math]]
holds for every Borel set [math]A\subseteq\mathbb{R}[/math] and every [math]i=1,\dots,d[/math].


Show Proof

Let [math]A=A_1\times\cdots\times A_d\subseteq\mathbb{R}^d[/math] be given with Borel sets [math]A_i\subseteq\mathbb{R}[/math]. By Fubini's Theorem we compute

[[math]] \begin{align*} \P[X\in A] &= {\medfrac{1}{(2\pi\sigma^2)^{d/2}}} \int_A\exp\bigl(-\medfrac{\|x\|^2}{2\sigma^2}\bigr)\dd\lambda^d(x)\\ & = {\medfrac{1}{(2\pi\sigma^2)^{d/2}}} \int_A\exp\bigl(-\medfrac{x_1^2+\cdots+x_d^2}{2\sigma^2}\bigr)\dd\lambda^d(x)\\ & = {\medfrac{1}{(2\pi\sigma^2)^{d/2}}} \int_A\exp\bigl(-\medfrac{x_1^2}{2\sigma^2}\bigr)\cdots\exp\bigl(-\medfrac{x_d^2}{2\sigma^2}\bigr)\dd\lambda^d(x)\\ & = \medfrac{1}{\sqrt{2\pi}\sigma} \int_{A_1}\exp\bigl(-\medfrac{x_1^2}{2\sigma^2}\bigr)\dd\lambda(x_1)\cdots \medfrac{1}{\sqrt{2\pi}\sigma}\int_{A_d}\exp\bigl(-\medfrac{x_d^2}{2\sigma^2}\bigr)\dd\lambda(x_d)\\ \phantom{\int}& = \P[X_1\in A_1]\cdots\P[X_d\in A_d] \end{align*} [[/math]]
where the first equality holds if [math]X[/math] is spherically Gaussian distributed with mean zero and variance [math]\sigma^2[/math], and the last if [math]X_i\sim\mathcal{N}(0,\sigma^2)[/math] for each [math]i=1,\dots,d[/math]. This computation now shows both implications. \smallskip \textquotedblleft{}[math]\Rightarrow[/math]\textquotedblright{} Let [math]B\subseteq\mathbb{R}[/math] be Borel and let [math]1\leqslant i\leqslant d[/math] be fixed. Then we have [math]\P[X_i\in B] = \P[X\in\mathbb{R}^{i-1}\times{}B\times\mathbb{R}^{d-i}][/math] and reading the computation above from the beginning until line four shows

[[math]] \P[X_i\in B] = \medfrac{1}{\sqrt{2\pi}\sigma} \int_{B}\exp\bigl(-\medfrac{x_i^2}{2\sigma^2}\bigr)\dd\lambda(x_i) [[/math]]
as all the other integrals are equal to one. \smallskip \textquotedblleft{}[math]\Leftarrow[/math]\textquotedblright{} Let [math]A=A_1\times\cdots\times A_d\subseteq\mathbb{R}^d[/math] with bounded and open intervals [math]A_i\subseteq\mathbb{R}[/math] be an open cuboid. Since the [math]X_i[/math]'s are independent, we get [math]\P[X\in A] = \P[X_1\in A_1]\cdots\P[X_d\in A_d][/math]. Reading now our initial computation backwards until the very first equality, we get

[[math]] \P[X\in A]= {\medfrac{1}{(2\pi\sigma^2)^{d/2}}} \int_A\exp\bigl(-\medfrac{\|x\|^2}{2\sigma^2}\bigr)\dd\lambda^d(x) [[/math]]
which extends to every Borel set in [math]\mathbb{R}^d[/math] since the open cuboids generate the Borel [math]\sigma[/math]-algebra.

As mentioned we need also to show that the distribution of the sum of Gaussian random variables is again Gaussian. We start with the case of two summands.

Lemma

Let [math]X,\,Y\colon\Omega\rightarrow\mathbb{R}[/math] be independent real random variables with probability density functions [math]\rho[/math] and [math]\sigma[/math]. Then the random variable [math]X+Y[/math] has a probability density function which is given by the convolution [math]\rho\Conv\sigma\colon\mathbb{R}\rightarrow\mathbb{R}[/math], [math](\rho\Conv\sigma)(s)=\int_{\mathbb{R}}\rho(s-t)\sigma(t)\dd\lambda(t)[/math].


Show Proof

{{{4}}}
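The convolution formula of the Lemma can be checked numerically for two standard Gaussian summands. The sketch below (plain Python; the grid bounds and step count are arbitrary) compares a Riemann-sum convolution of two [math]\mathcal{N}(0,1)[/math] densities against the [math]\mathcal{N}(0,2)[/math] density, anticipating the closed form derived later in this section:

```python
import math

def gauss_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolve_at(f, g, s, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule approximation of (f * g)(s) = integral of f(s - t) g(t) dt."""
    h = (hi - lo) / n
    return sum(f(s - (lo + (i + 0.5) * h)) * g(lo + (i + 0.5) * h)
               for i in range(n)) * h

rho = lambda x: gauss_pdf(x, 1.0)
# (rho * rho)(s) should agree with the N(0, 2) density:
for s in (0.0, 0.5, 1.0, 2.0):
    print(round(convolve_at(rho, rho, s), 6), round(gauss_pdf(s, 2.0), 6))
```

The two printed columns agree up to the discretization error of the Riemann sum.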

In order to extend the above to more than two summands, we need to establish first that the convolution of density functions is commutative and associative.

Lemma

Let [math]\rho,\,\sigma,\,\tau\colon\mathbb{R}\rightarrow\mathbb{R}[/math] be integrable. Then we have [math]\rho\Conv\sigma=\sigma\Conv\rho[/math] and [math](\rho\Conv\sigma)\Conv\tau=\rho\Conv(\sigma\Conv\tau)[/math].


Show Proof

By substituting [math]u:=s-t[/math] we see that

[[math]] (\rho\Conv\sigma)(s)=\int_{\mathbb{R}}\rho(s-t)\sigma(t)\dd\lambda(t)=\int_{\mathbb{R}}\rho(u)\sigma(s-u)\dd\lambda(u)=(\sigma\Conv\rho)(s) [[/math]]
holds for every [math]s\in\mathbb{R}[/math]. \smallskip The proof of Lemma shows as a byproduct that the convolution of any two density functions is again a density function and in particular integrable. We thus get

[[math]] \begin{align*} [(\rho\Conv\sigma)\Conv\tau](s)&=\int_{\mathbb{R}}(\rho\Conv\sigma)(u)\tau(s-u)\dd\lambda(u)\\ & = \int_{\mathbb{R}}\int_{\mathbb{R}}\rho(t)\sigma(u-t)\dd\lambda(t)\tau(s-u)\dd\lambda(u)\\ & = \int_{\mathbb{R}}\rho(t)\int_{\mathbb{R}}\sigma(u-t)\tau(s-u)\dd\lambda(u)\dd\lambda(t)\\ & = \int_{\mathbb{R}}\rho(t)\int_{\mathbb{R}}\sigma(s-v-t)\tau(v)\dd\lambda(v)\dd\lambda(t)\\ & = \int_{\mathbb{R}}\rho(t)(\sigma\Conv\tau)(s-t)\dd\lambda(t)\\ &=[\rho\Conv(\sigma\Conv\tau)](s) \end{align*} [[/math]]
where we substituted [math]v:=s-u[/math] in the fourth equality.
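Commutativity and associativity of the convolution can also be observed numerically. A small sketch (plain Python; the grid parameters and the three Gaussian densities are arbitrary illustrative choices) compares [math](\rho\Conv\sigma)\Conv\tau[/math] and [math]\rho\Conv(\sigma\Conv\tau)[/math]:

```python
import math

# Discretize on a uniform grid and approximate convolutions by midpoint sums.
LO, HI, N = -12.0, 12.0, 600
H = (HI - LO) / N
GRID = [LO + (i + 0.5) * H for i in range(N)]

def conv(f, g):
    """Return a function approximating the convolution (f * g)."""
    return lambda s: sum(f(s - t) * g(t) for t in GRID) * H

def gauss(var):
    """Density of N(0, var) as a function."""
    return lambda x: math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

rho, sig, tau = gauss(1.0), gauss(2.0), gauss(3.0)

s = 0.7
left = conv(conv(rho, sig), tau)(s)   # (rho * sig) * tau at s
right = conv(rho, conv(sig, tau))(s)  # rho * (sig * tau) at s
exact = gauss(6.0)(s)                 # both should match the N(0, 6) density
print(round(left, 5), round(right, 5), round(exact, 5))
```

Both bracketings agree with each other and with the [math]\mathcal{N}(0,6)[/math] density up to discretization error.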

In view of Lemma we can drop parentheses and simply write [math]\rho_1\Conv\cdots\Conv\rho_n[/math] for the convolution of [math]n[/math] density functions. Employing this, we can now extend Lemma to the sum of more than two random variables.

Proposition

For [math]i=1,\dots,d[/math] let [math]X_i\colon\Omega\rightarrow\mathbb{R}[/math] be independent Gaussian random variables with zero mean and unit variance. Then [math]X:=X_1+\cdots+X_d[/math] is a Gaussian random variable with mean zero and variance [math]d[/math].


Show Proof

{{{4}}}

Let us now show what we later really need, namely the case of a linear combination of Gaussian random variables with zero mean and unit variance.


Proposition

For [math]i=1,\dots,d[/math] let [math]X_i\sim\mathcal{N}(0,1)[/math] be independent Gaussian random variables and let [math]\lambda_i\not=0[/math] be real numbers. Then [math]X:=\lambda_1X_1+\cdots+\lambda_dX_d\sim\mathcal{N}(0,\sigma^2)[/math] with [math]\sigma^2:=\lambda_1^2+\cdots+\lambda_d^2\not=0[/math].


Show Proof

We observe that for [math]\lambda \gt 0[/math] (the case [math]\lambda \lt 0[/math] is analogous), [math]X\sim\mathcal{N}(0,1)[/math] and a Borel set [math]A[/math] the following holds

[[math]] \P\bigl[\lambda X\in A\bigr]=\P\bigl[X\in \textfrac{1}{\lambda}A\bigr]=\int_{\frac{1}{\lambda}A}\rho(x)\dd x = \int_{A}\rho(\textfrac{x}{\lambda})\textfrac{1}{\lambda}\dd x [[/math]]
where [math]\rho[/math] is the Gaussian density function. This means that the random variable [math]\lambda X[/math] has density function [math]\rho(\frac{\cdot}{\lambda})\frac{1}{\lambda}[/math]. Substituting the Gaussian density, the latter becomes

[[math]] \rho(\medfrac{t}{\lambda})\medfrac{1}{\lambda} = \medfrac{1}{\sqrt{2\pi}}\exp\bigl(-(\medfrac{t}{\lambda})^2/2\bigr)\medfrac{1}{\lambda} = \medfrac{1}{\sqrt{2\pi}\lambda}\exp\bigl(-\medfrac{t^2}{2\lambda^2}\bigr) [[/math]]
for [math]t\in\mathbb{R}[/math], which shows that [math]\lambda X\sim\mathcal{N}(0,\lambda^2)[/math]. We know now that [math]\lambda_iX_i\sim\mathcal{N}(0,\lambda_i^2)[/math] holds for [math]i=1,\dots,d[/math] and need to establish that their sum's density function is given by

[[math]] (\rho_1\Conv\cdots\Conv\rho_d)(x) = (2\pi\sigma^2)^{-1/2}\exp(-x^2/(2\sigma^2)) [[/math]]
where

[[math]] \rho_i(x)=(2\pi\sigma_i^2)^{-1/2}\exp(-x^2/(2\sigma_i^2)). [[/math]]
This can be achieved similarly to \eqref{CONV-COMP}. Indeed, let us abbreviate [math]a:=\sigma_1^2[/math] and [math]b:=\sigma_2^2[/math] and [math]c:=a+b[/math], i.e., [math](b+a)/c=1[/math]. Then we get

[[math]] \begin{align*} (\rho_1\Conv{}\rho_2)(s) & =\medfrac{1}{2\pi\sqrt{ab}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{(s-t)^2}{2a}\bigr)\exp\bigl(-\medfrac{t^2}{2b}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{2\pi\sqrt{ab}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{b(s^2-2st+t^2)+at^2}{2ab}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{2\pi\sqrt{ab}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{t^2(b+a)-2stb+bs^2}{2ab}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{2\pi\sqrt{c}\sqrt{ab/c}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{t^2(b+a)/c-2stb/c+bs^2/c}{2ab/c}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{2\pi\sqrt{c}\sqrt{ab/c}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{(t-(bs)/c)^2 +s^2(b/c) - (sb/c)^2}{2ab/c}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{\sqrt{2\pi c}}\exp\bigl(- \medfrac{s^2(b/c)-(sb/c)^2}{2ab/c}\bigr)\medfrac{1}{\sqrt{2\pi(ab/c)}}\int_{\mathbb{R}}\exp\bigl(-\medfrac{(t-(bs)/c)^2}{2ab/c}\bigr)\dd\lambda(t)\\ & =\medfrac{1}{\sqrt{2\pi c}}\exp\bigl(- \medfrac{s^2(b/c)c^2-(sb/c)^2c^2}{2abc}\bigr)\\ & =\medfrac{1}{\sqrt{2\pi c}}\exp\bigl(- \medfrac{s^2(bc-b^2)}{2abc}\bigr)\\ & =\medfrac{1}{\sqrt{2\pi c}}\exp\bigl(- \medfrac{s^2}{2c}\bigr) \end{align*} [[/math]]
where the integral can be seen to be one by the substitution [math]u:=t-(bs)/c[/math]. Iteration of this completes the proof.
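The variance formula [math]\sigma^2=\lambda_1^2+\cdots+\lambda_d^2[/math] from the Proposition is easy to test by simulation. A rough sketch in plain Python (the coefficients, seed, and sample size are arbitrary illustrative choices):

```python
import math
import random

random.seed(1)

lams = [0.5, -1.0, 2.0, 0.25]        # arbitrary nonzero coefficients
sigma2 = sum(l * l for l in lams)    # predicted variance of the combination

samples = [sum(l * random.gauss(0.0, 1.0) for l in lams) for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 3), round(var, 3), sigma2)  # mean near 0, var near 5.3125
```

The empirical mean is close to zero and the empirical variance close to [math]\lambda_1^2+\cdots+\lambda_d^2[/math].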

We mention that Proposition extends further to the case of a linear combination of arbitrary Gaussian random variables which do not necessarily have mean zero or unit variance. The proof then becomes again one step more complicated than the above, as the means [math]\mu_i[/math] appear inside the exponential functions. Since for our purposes below the result as stated above will be sufficient, we leave this last generalization to the reader as Problem.

\smallskip Our final preparation is the following formula for the integral of the Gaussian function against a polynomial, which we will need in order to estimate moments in the next proof.

Lemma

For [math]k\in\mathbb{N}[/math] the equality

[[math]] \medfrac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}t^{2k}\exp(-t^2/2)\dd t = \medfrac{(2k)!}{2^kk!} [[/math]]
holds.


Show Proof

We first use Lebesgue's Theorem to see that

[[math]] \begin{align*} \medfrac{\operatorname{d}^k}{\dd a^k}\Bigl(\int_{\mathbb{R}}\exp(-at^2)\dd t\Bigr)\Big|_{a=\frac{1}{2}} &= \Bigl(\int_{\mathbb{R}}\medfrac{\operatorname{d}^k}{\dd a^k}\exp(-at^2)\dd t\Bigr)\Big|_{a=\frac{1}{2}}\\ & = \int_{\mathbb{R}}(-t^2)^k\exp(-at^2)\dd t\Big|_{a=\frac{1}{2}}\\ & = (-1)^k\int_{\mathbb{R}}t^{2k}\exp(-t^2/2)\dd t \end{align*} [[/math]]
holds where [math]a \gt 0[/math] and [math]k\in\mathbb{N}[/math] is fixed. On the other hand substituting [math]u:=\sqrt{a}t[/math] leads to

[[math]] \int_{\mathbb{R}}\exp(-at^2)\dd t = \medfrac{1}{\sqrt{a}}\int_{\mathbb{R}}\exp(-u^2)\dd u = \sqrt{\medfrac{\pi}{a}} [[/math]]
now for every [math]a \gt 0[/math]. Combining both equations we obtain

[[math]] \int_{\mathbb{R}}t^{2k}\exp(-t^2/2)\dd t = (-1)^k\medfrac{\operatorname{d}^k}{\dd a^k}\Bigl(\int_{\mathbb{R}}\exp(-at^2)\dd t\Bigr)\Big|_{a=\frac{1}{2}}= (-1)^k\medfrac{\operatorname{d}^k}{\dd a^k}\Bigl(\medfrac{\sqrt{\pi}}{\sqrt{a}}\,\Bigr)\Big|_{a=\frac{1}{2}} [[/math]]
and thus need to compute the derivative on the right hand side. For that purpose we compute

[[math]] \begin{align*} \smallfrac{\operatorname{d}^k}{\dd a^k}(a^{-1/2}) & = (-1/2)\cdot(-1/2-1)\cdots(-1/2-(k-1))\,a^{-1/2-k}\\ & = \smallfrac{(-1)^k}{2^k}(1\cdot 3\cdots(2k-3)\cdot (2k-1))\,a^{-1/2-k}\\ & = \smallfrac{(-1)^k}{2^k}\smallfrac{1\cdot2\cdot 3\cdots(2k-3)\cdot (2k-2)\cdot(2k-1)\cdot(2k)}{2\cdot4\cdots(2k-2)\cdot(2k)}\,a^{-1/2-k}\\ & = \smallfrac{(-1)^k}{2^k}\smallfrac{(2k)!}{2^k k!}\,a^{-1/2-k}. \end{align*} [[/math]]
Putting everything together we arrive at

[[math]] \begin{align*} \smallfrac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}t^{2k}\exp(-t^2/2)\dd t & = \smallfrac{1}{\sqrt{2\pi}}(-1)^k\smallfrac{\operatorname{d}^k}{\dd a^k}\Bigl(\smallfrac{\sqrt{\pi}}{\sqrt{a}}\,\Bigr)\Big|_{a=\frac{1}{2}}\\ & = \smallfrac{(-1)^k}{\sqrt{2}}\smallfrac{(-1)^k}{2^k}\smallfrac{(2k)!}{2^k k!}\,a^{-1/2-k}\Big|_{a=\frac{1}{2}}\\ & = \smallfrac{1}{\sqrt{2}}\,\smallfrac{1}{2^k}\,\smallfrac{(2k)!}{2^k k!}2^{1/2+k} \end{align*} [[/math]]
which after cancellation reduces precisely to the expression on the right hand side in the Lemma.
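The closed form [math](2k)!/(2^kk!)[/math] can be verified against a direct numerical integration. The following is a minimal sketch (plain Python; integration range and step count are arbitrary choices that make the truncation error negligible):

```python
import math

def gaussian_even_moment(k, lo=-12.0, hi=12.0, n=40_000):
    """Midpoint-rule approximation of (2*pi)**(-1/2) * integral of t**(2k) * exp(-t**2/2) dt."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += t ** (2 * k) * math.exp(-t * t / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

for k in range(5):
    closed_form = math.factorial(2 * k) / (2 ** k * math.factorial(k))
    print(k, round(gaussian_even_moment(k), 4), closed_form)
```

For [math]k=0,\dots,4[/math] this reproduces the even moments [math]1, 1, 3, 15, 105[/math].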

Now we can prove the first main result of this chapter. Notice that the inequality is only non-trivial, i.e., the right hand side is strictly less than one, if [math]\epsilon \gt 4\sqrt{\ln2}\,(\approx3.33)[/math] holds. This means that, although we are using the variable [math]\epsilon[/math] here, the typical intuition that we can let [math]\epsilon\rightarrow0[/math] does not apply.

Theorem

(Gaussian Annulus Theorem) Let [math]x\in\mathbb{R}^d[/math] be drawn at random with respect to the spherical Gaussian distribution with zero mean and unit variance. Then

[[math]] \P\bigl[\big|\|x\|-\sqrt{d}\big|\geqslant\epsilon\bigr]\leqslant 2\exp(-c\epsilon^2). [[/math]]
holds for every [math]0\leqslant\epsilon\leqslant\sqrt{d}[/math] where [math]c=1/16[/math].


Show Proof

{{{4}}}
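The bound of the Gaussian Annulus Theorem can be illustrated empirically. A rough sketch in plain Python (seed, dimension, and sample size are arbitrary illustrative choices) estimates the probability of deviating from [math]\sqrt{d}[/math] by at least [math]\epsilon=10[/math] in dimension [math]d=100[/math]:

```python
import math
import random

random.seed(2)

d, eps, n = 100, 10.0, 5000
exceed = 0
for _ in range(n):
    norm = math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d)))
    if abs(norm - math.sqrt(d)) >= eps:
        exceed += 1

frac = exceed / n
# The theorem bounds this frequency by 2*exp(-eps**2/16), roughly 0.0039.
print(frac)
```

The empirical frequency stays (far) below the theoretical bound, which is itself already small.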

In Chapter \ref{Ch-INTRO} we put forward the intuition that a point chosen at random will have norm close to [math]\sqrt{d}[/math] with ‘a high probability.’ More precisely, Figure suggested that the expected deviation from [math]\sqrt{d}[/math] does not increase with [math]d[/math]. Theorem allows us to quantify this intuition as follows. If we pick, e.g., [math]\epsilon=10[/math], then a random point's norm will satisfy [math]\sqrt{d}-10\leqslant \|x\|\leqslant \sqrt{d}+10[/math] with probability greater or equal to [math]0.99[/math] whenever we are in a space with dimension [math]d\geqslant100[/math]. \smallskip We note the following byproduct of the proof of Theorem which gives a similar estimate, but for the square of the norm. Notice that one of our first results in Chapter \ref{Ch-INTRO} was the equality [math]\E(\|X\|^2)=d[/math] for any fixed [math]d[/math]---and that we on the other hand proved a statement without the squares, namely [math]\E(\|X\|)\approx\sqrt{d}[/math], asymptotically.

Proposition

Let [math]x\in\mathbb{R}^d[/math] be drawn at random with respect to the spherical Gaussian distribution with zero mean and unit variance. Then

[[math]] \P\bigl[\big|\|x\|^2-d\big|\geqslant\epsilon\bigr]\leqslant 2\exp\bigl(-c\min\bigl(\medfrac{\epsilon^2}{2d},\epsilon\bigr)\bigr). [[/math]]
holds for every [math]\epsilon \gt 0[/math] where [math]c=1/8[/math].


Show Proof

We define the random variables [math]Y_i[/math] as in the proof of Theorem and then proceed using the Bernstein inequality as in the last part of the previous proof. This way we obtain

[[math]] \P\bigl[\bigl|\|X\|^2-d\bigr|\geqslant\epsilon\bigr] = \P\bigl[\bigl|Y_1+\cdots+Y_d\bigr|\geqslant\medfrac{\epsilon}{2}\bigr]\leqslant 2\exp\bigl(-C\min\bigl(\smallfrac{\epsilon^2}{4d},\medfrac{\epsilon}{2}\bigr)\bigr) [[/math]]
which leads to the claimed inequality taking [math]C=1/4[/math] into account.

\smallskip In the next two theorems we divide by [math]\|x\|[/math] and [math]\|y\|[/math] where [math]x[/math] and [math]y[/math] are picked at random. Notice however that [math]x=0[/math] or [math]y=0[/math] occurs only with probability zero. To be completely precise, we could use the conditional probability [math]\P[|\langle{}x/\|x\|,y/\|y\|\rangle{}|\leqslant\epsilon\,|\,x\not=0\text{ and }y\not=0][/math]. We will instead tacitly assume [math]x\not=0[/math] and [math]y\not=0[/math]. Observe that the bound below is non-trivial if [math]\epsilon \gt 2/(\sqrt{d}-7)[/math]. In this result it is thus possible to think of [math]\epsilon\rightarrow0[/math], but only if simultaneously [math]d\rightarrow\infty[/math] suitably fast.

\smallskip

Theorem

Let [math]x,y\in\mathbb{R}^d[/math] be drawn at random with respect to the spherical Gaussian distribution with zero mean and unit variance. Then for every [math]\epsilon \gt 0[/math] and for all [math]d\geqslant1[/math] the estimate

[[math]] \P\bigl[\big|\bigl\langle{}\medfrac{x}{\|x\|},\medfrac{y}{\|y\|}\bigr\rangle{}\big|\geqslant\epsilon\bigr]\leqslant \medfrac{2/\epsilon+7}{\sqrt{d}} [[/math]]
holds.


Show Proof

Let [math]y\in\mathbb{R}^d\backslash\{0\}[/math] be fixed and let [math]X\colon\Omega\rightarrow\mathbb{R}^d[/math] be a random vector with [math]X\sim\mathcal{N}(0,1)[/math] whose coordinate functions we denote by [math]X_i[/math] for [math]i=1,\dots,d[/math]. We define

[[math]] U_y\colon\Omega\rightarrow\mathbb{R},\;U_y(\omega)=\bigl\langle{}X(\omega),\medfrac{y}{\|y\|}\bigr\rangle{}=\sum_{i=1}^d\medfrac{y_i}{\|y\|}X_i(\omega) [[/math]]
which is itself a random variable and which, by Proposition, is Gaussian distributed with [math]\mu=0[/math] and

[[math]] \sigma^2=\sum_{i=1}^d\medfrac{y_i^2}{\|y\|^2}= \medfrac{1}{\|y\|^2}\sum_{i=1}^d y_i^2 = 1. [[/math]]
This shows that [math]U_y\sim\mathcal{N}(0,1)[/math] is normally distributed. In particular, we have

[[math]] \P\bigl[|U_y|\leqslant \delta\bigr] = \medfrac{1}{\sqrt{2\pi}}\int_{-\delta}^{\delta}\exp(-t^2/2)\dd t [[/math]]
for any [math]\delta \gt 0[/math]. Observe that the right hand side is independent of [math]y[/math]. Therefore, if [math]Y\colon\Omega\rightarrow\mathbb{R}^d[/math] is any other random vector, we can consider the composite random variable [math]U_Y(X)\colon\Omega\rightarrow\mathbb{R}[/math] and get for any [math]\delta \gt 0[/math]

[[math]] \begin{align*} \P\bigl[|U_Y(X)|\leqslant \delta\bigr] & = \P\bigl(\{\omega\in\Omega\:;\:(X(\omega),Y(\omega))\in A\}\bigr)= \int_A\rho(x)\rho(y)\dd(x,y)\\ & = \int_{\mathbb{R}^d}\rho(y)\int_{A_y}\rho(x)\dd x \dd y = \int_{\mathbb{R}^d}\rho(y)\P(\{\omega\in\Omega\:;\:X(\omega)\in A_y\}) \dd y \\ & = \int_{\mathbb{R}^d}\rho(y)\underbrace{\P(\{\omega\in\Omega\:;\:|U_y(X(\omega))|\leqslant\delta\})}_{\text{independent of $y$}} \dd y = 1\cdot\P\bigl[|U_y(X)|\leqslant\delta\bigr] \\ &= \medfrac{1}{\sqrt{2\pi}}\int_{-\delta}^{\delta}\exp(-t^2/2)\dd t \geqslant 1-\medfrac{2}{\sqrt{2\pi}}\int_{\delta}^{\infty}1/t^2\dd t = 1-\medfrac{2}{\sqrt{2\pi}\delta} \end{align*} [[/math]]
where [math]\rho[/math] is the Gaussian density function on [math]\mathbb{R}^d[/math], [math]A=\{(x,y)\in\mathbb{R}^d\times\mathbb{R}^d\:;\:|U_y(x)|\leqslant\delta\}[/math] and [math]A_y=\{x\in\mathbb{R}^d\:;\:(x,y)\in A\}[/math]. Let now [math]X[/math] and [math]Y\colon\Omega\rightarrow\mathbb{R}^d[/math] both be normally distributed random vectors and let [math]\epsilon \gt 0[/math] be given. We first use the Theorem of Total Probability and drop the second summand. Then we rewrite the first term by using the random variable [math]U_Y(X)[/math] from above and the second term such that the Gaussian Annulus Theorem becomes applicable. We get

[[math]] \begin{align*} \P\bigl[\big|\bigl\langle{}\textfrac{X}{\|X\|},\textfrac{Y}{\|Y\|}\bigr\rangle{}\big|\leqslant\epsilon\bigr] & \geqslant \P\bigl[\big|\bigl\langle{}\textfrac{X}{\|X\|},\textfrac{Y}{\|Y\|}\bigr\rangle{}\big|\leqslant\epsilon\;\big|\;\|X\|\geqslant\textfrac{\sqrt{d}}{2}\bigr]\cdot\P\bigl[\|X\|\geqslant\textfrac{\sqrt{d}}{2}\bigr] \\ &= \P\bigl[\big|\bigl\langle{}X,\textfrac{Y}{\|Y\|}\rangle{}|\leqslant\epsilon\,\|X\|\;\big|\;\|X\|\geqslant\textfrac{\sqrt{d}}{2}\bigr]\cdot\P\bigl[\|X\|\geqslant\textfrac{\sqrt{d}}{2}\bigr] \\ &\geqslant \P\bigl[|U_Y(X)|\leqslant\textfrac{\epsilon\sqrt{d}}{2}\bigr]\cdot\P\bigl[|\|X\|-\sqrt{d}|\leqslant\textfrac{\sqrt{d}}{2}\bigr] \\ &\geqslant \Bigl(1-\medfrac{2}{\sqrt{2\pi}(\epsilon\sqrt{d}/2)}\Bigr)\Bigl(1-2\exp\bigl(-c(\sqrt{d}/2)^2\bigr)\Bigr) \\ &= \Bigl(1-\medfrac{4}{\sqrt{2\pi d}\epsilon}\Bigr)\Bigl(1-2\exp\bigl(-\medfrac{cd}{4}\bigr)\Bigr) \\ &\geqslant 1- \medfrac{4}{\sqrt{2\pi}\epsilon}\medfrac{1}{\sqrt{d}} -2\exp\bigl(-\medfrac{d}{64} \bigr) \end{align*} [[/math]]
since [math]c=1/16[/math]. We see already that the second summand goes to zero more slowly than the third. To get rid of the exponential term we thus only need to find the right constant. We claim that

[[math]] 2\exp\bigl(-\medfrac{d}{64}\bigr)\leqslant \medfrac{7}{\sqrt{d}} [[/math]]
holds for every [math]d\geqslant1[/math]. Indeed, the above is equivalent to [math]f(d):=\frac{49}{4}\exp\bigl(\frac{d}{32}\bigr)-d\geqslant0[/math]. Considering [math]f\colon(0,\infty)\rightarrow\mathbb{R}[/math] we get [math]f'(d)=0[/math] if and only if [math]d=32\ln(\frac{128}{49})[/math], and this point must be the global minimum of [math]f[/math]. Using a calculator one can see that [math]f(32\ln(\frac{128}{49})) \gt 0[/math]. Consequently, we get

[[math]] \P\bigl[\big|\bigl\langle{}\textfrac{X}{\|X\|},\textfrac{Y}{\|Y\|}\bigr\rangle{}\big|\geqslant\epsilon\bigr] \leqslant \medfrac{4}{\sqrt{2\pi}\epsilon}\medfrac{1}{\sqrt{d}} +\medfrac{7}{\sqrt{d}} \leqslant \bigl(\medfrac{2}{\epsilon}+7\bigr) \medfrac{1}{\sqrt{d}} [[/math]]
which concludes the proof.
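The near-orthogonality established by the Theorem can be seen in a quick simulation. A rough sketch in plain Python (dimension, threshold, seed, and sample size are arbitrary illustrative choices):

```python
import math
import random

random.seed(3)

def random_unit(d):
    """Normalize a spherical Gaussian sample to the unit sphere."""
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

d, eps, n = 2500, 0.1, 400
count = 0
for _ in range(n):
    x, y = random_unit(d), random_unit(d)
    dot = sum(a * b for a, b in zip(x, y))
    if abs(dot) >= eps:
        count += 1

# Empirical frequency of |<x/|x|, y/|y|>| >= eps vs. the bound (2/eps + 7)/sqrt(d):
print(count / n, (2 / eps + 7) / math.sqrt(d))
```

The observed frequency of large normalized scalar products stays well below the theoretical bound.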

Theorem now also quantifies our intuition that two points chosen at random will have scalar product close to zero and thus will be ‘almost orthogonal’ with ‘a high probability.’ If we pick for instance [math]\epsilon = 0.1[/math] and [math]d\geqslant 100\,000[/math], then by Theorem we get that two points drawn at random will, with probability greater or equal to [math]0.9[/math], have (normalized) scalar product less or equal to [math]0.1[/math]. This corresponds to an angle of [math]90^{\circ}\pm 6^{\circ}[/math]. \smallskip The proof of the previous result shows also the following.

Corollary

Let [math]X\sim\mathcal{N}(0,1,\mathbb{R}^d)[/math] and [math]0\not=\xi\in\mathbb{R}^d[/math] be fixed. Then we have

[[math]] \P\big[|\langle{}X,\xi\rangle{}|\geqslant\epsilon\bigr]\leqslant \medfrac{4}{\sqrt{2\pi}}\medfrac{\|\xi\|}{\epsilon\sqrt{d}} [[/math]]
for every [math]\epsilon \gt 0[/math].


Show Proof

In the notation of the proof of Theorem we calculate

[[math]] \begin{align*} \P\big[\bigl|\langle{}X,\xi\rangle{}\bigr|\leqslant\epsilon\bigr] & = \P\big[\bigl|\langle{}X,\medfrac{\xi}{\|\xi\|}\rangle{}\bigr|\leqslant\medfrac{\epsilon}{\|\xi\|}\bigr] = \P\big[\bigl|U_{\xi}(X)\bigr|\leqslant\medfrac{\epsilon}{\|\xi\|}\bigr] \geqslant 1-\medfrac{4}{\sqrt{2\pi}}\medfrac{\|\xi\|}{\epsilon\sqrt{d}} \end{align*} [[/math]]
which establishes the result.

\smallskip Since the probability that two random points are exactly orthogonal is obviously zero, we have on the other hand to expect that the probability of the scalar product being very close to zero is again small. The estimate below is non-trivial if [math]\epsilon \lt 1[/math] and [math]d \gt 16\ln(\frac{2}{1-\epsilon})[/math]. In particular, we can now let [math]\epsilon[/math] tend to zero for fixed [math]d[/math] if we wish. However, the threshold on the left hand side tends to zero for [math]d\rightarrow\infty[/math] even if we fix [math]0 \lt \epsilon \lt 1[/math], since we divide by [math]\sqrt{d}[/math] inside the square bracket.

Theorem

Let [math]x[/math], [math]y\in\mathbb{R}^d[/math] be drawn at random with respect to the spherical Gaussian distribution with zero mean and unit variance. Then for [math]\epsilon \gt 0[/math] and [math]d\geqslant1[/math] we have

[[math]] \P\bigl[\bigl|\bigl\langle{}\medfrac{x}{\|x\|},\medfrac{y}{\|y\|}\bigr\rangle\bigl|\leqslant\medfrac{\epsilon}{2\sqrt{d}}\,\bigr]\leqslant \epsilon+2\exp(-cd) [[/math]]
where [math]c=1/16[/math].


Show Proof

Let [math]X,Y\colon\Omega\rightarrow\mathbb{R}^d[/math] be random vectors with normal distribution. Let [math]U_Y(X)[/math] be defined as in the proof of Theorem. We compute

[[math]] \begin{align*} \P\bigl[\bigl|\bigl\langle{}\medfrac{X}{\|X\|},\medfrac{Y}{\|Y\|}\bigr\rangle\bigl|\geqslant\medfrac{\epsilon}{2\sqrt{d}}\,\bigr] & = 1-\P\bigl[\bigl|\bigl\langle{}{\textstyle\frac{X}{\|X\|},\frac{Y}{\|Y\|}}\bigr\rangle\bigl| \lt \medfrac{\epsilon}{2\sqrt{d}}\bigr]\\ & = 1-\P\bigl[\medfrac{1}{\|X\|}\cdot\bigl|U_Y(X)\bigl| \lt \medfrac{1}{2\sqrt{d}}\cdot\epsilon\bigr]\\ & \geqslant 1-\P\bigl[\medfrac{1}{\|X\|} \lt \medfrac{1}{2\sqrt{d}} \:\text{ or }\: \bigl|U_Y(X)\bigl| \lt \epsilon\bigr]\\ & \geqslant 1-\Bigl(\P\bigl[\|X\| \gt 2\sqrt{d}\bigr] +\P\bigl[\bigl|U_Y(X)\bigl| \lt \epsilon\bigr]\Bigr)\\ & = \P\bigl[\|X\|\leqslant 2\sqrt{d}\bigr] -\P\bigl[\bigl|U_Y(X)\bigl| \lt \epsilon\bigr]\\ & = \P\bigl[\|X\|-\sqrt{d} \leqslant \sqrt{d}\bigr] -\medfrac{1}{\sqrt{2\pi}}\int_{-\epsilon}^{\epsilon}\exp(-t^2/2)\dd t\\ & \geqslant \P\bigl[\bigl|\|X\|-\sqrt{d}\bigr| \leqslant \sqrt{d}\bigr] -\medfrac{1}{\sqrt{2\pi}}2\epsilon\\ & \geqslant 1-2\exp\Bigl(-\medfrac{1}{16}(\sqrt{d})^2\Bigr) -\sqrt{2/\pi}\,\epsilon\\ & \geqslant 1-2\exp\bigl(-\medfrac{d}{16}\bigr) -\epsilon \end{align*} [[/math]]
where we used Theorem and the formula for [math]\P\bigl[|U_Y(X)|\leqslant \delta\bigr][/math] that we established in the proof of Theorem.

If we now fix the dimension such that the exponential term is already quite small (e.g., [math]d=100[/math], then [math]2\exp(-cd) \lt 0.01[/math]) and pick, e.g., [math]\epsilon=0.1[/math], then the result shows that the (normalized) scalar product of two random points is less than [math]0.005[/math] only with a probability less than [math]0.11[/math]. \smallskip Combining the last two results we can think of the scalar products for a fixed high dimension [math]d[/math], with high probability, to be small numbers which are however bounded away from zero. \smallskip Our last result in this section quantifies our finding from Chapter \ref{Ch-INTRO} that for two points [math]x[/math] and [math]y[/math] drawn at random from a unit Gaussian we have [math]\|x-y\|\approx\sqrt{2d}[/math]. Using [math]\epsilon=18[/math] in (ii) below shows that the values possible for the dimension are [math]d \gt 324[/math]. In this case the estimate is however still trivial. Interesting cases appear only if [math]\epsilon \gt 18[/math] and [math]d[/math] so large that the sum on the right hand side of (ii) stays below one.

Theorem

{{{3}}}

Show Proof

{{{4}}}

Since we do need this later, let us formulate the following probability estimate that we established in the proof above.

Corollary

Let [math]x[/math], [math]y\in\mathbb{R}^d[/math] be drawn at random with respect to the spherical Gaussian distribution with zero mean and unit variance. Then

[[math]] \forall\:\epsilon \gt 18,\,d\geqslant\epsilon^2\colon \P\bigl[\bigl|\|x-y\|^2-2d\bigr|\geqslant\epsilon\sqrt{2d}\bigr]\leqslant \medfrac{18}{\epsilon}+\medfrac{8}{\sqrt{d}} [[/math]]
holds.\hfill\qed
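The concentration of the squared distance around [math]2d[/math] can be illustrated by simulation. A minimal sketch in plain Python (dimension, seed, and sample size are arbitrary illustrative choices):

```python
import math
import random

random.seed(4)

d, n = 500, 2000
sq_dists = []
for _ in range(n):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    y = [random.gauss(0.0, 1.0) for _ in range(d)]
    sq_dists.append(sum((a - b) ** 2 for a, b in zip(x, y)))

mean_sq = sum(sq_dists) / n
print(round(mean_sq, 1), 2 * d)  # mean of ||x - y||^2 should be close to 2d
```

Accordingly, the typical distance itself concentrates around [math]\sqrt{2d}[/math].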

\section*{Problems}


General references

Wegner, Sven-Ake (2024). "Lecture Notes on High-Dimensional Data". arXiv:2101.05841 [math.FA].