guide:D9c33cd067: Difference between revisions

From Stochiki
No edit summary
mNo edit summary
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div class="d-none"><math>
<div class="d-none"><math>
\newcommand{\indexmark}[1]{#1\markboth{#1}{#1}}
\newcommand{\smallfrac}[2]{\frac{#1}{#2}}
\newcommand{\red}[1]{\textcolor{red}{#1}}
\newcommand{\medfrac}[2]{\frac{#1}{#2}}
\newcommand{\NOTE}[1]{$^{\textcolor{red}\clubsuit}$\marginpar{\setstretch{0.5}$^{\scriptscriptstyle\textcolor{red}\clubsuit}$\textcolor{blue}{\bf\tiny #1}}}
\newcommand{\textfrac}[2]{\frac{#1}{#2}}
\newcommand\xoverline[2][0.75]{%
    \sbox{\myboxA}{$\m@th#2$}%
    \setbox\myboxB\null% Phantom box
    \ht\myboxB=\ht\myboxA%
    \dp\myboxB=\dp\myboxA%
    \wd\myboxB=#1\wd\myboxA% Scale phantom
    \sbox\myboxB{$\m@th\overline{\copy\myboxB}$}%  Overlined phantom
    \setlength\mylenA{\the\wd\myboxA}%  calc width diff
    \addtolength\mylenA{-\the\wd\myboxB}%
    \ifdim\wd\myboxB<\wd\myboxA%
      \rlap{\hskip 0.35\mylenA\usebox\myboxB}{\usebox\myboxA}%
    \else
        \hskip -0.5\mylenA\rlap{\usebox\myboxA}{\hskip 0.5\mylenA\usebox\myboxB}%
    \fi}
\newcommand{\smallfrac}[2]{\scalebox{1.35}{\ensuremath{\frac{#1}{#2}}}}
\newcommand{\medfrac}[2]{\scalebox{1.2}{\ensuremath{\frac{#1}{#2}}}}
\newcommand{\textfrac}[2]{{\textstyle\ensuremath{\frac{#1}{#2}}}}
\newcommand{\nsum}[1][1.4]{% only for \displaystyle
    \mathop{%
        \raisebox
            {-#1\depthofsumsign+1\depthofsumsign}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\e}{\operatorname{e}}
\newcommand{\e}{\operatorname{e}}
\newcommand{\B}{\operatorname{B}}
\newcommand{\B}{\operatorname{B}}
\newcommand{\Bbar}{\xoverline[0.75]{\operatorname{B}}}
\newcommand{\Bbar}{\overline{\operatorname{B}}}
\newcommand{\pr}{\operatorname{pr}}
\newcommand{\pr}{\operatorname{pr}}
\newcommand{\dd}{\operatorname{d}\hspace{-1pt}}
\newcommand{\dd}{\operatorname{d}\hspace{-1pt}}
Line 33: Line 12:
\newcommand{\V}{\operatorname{V}}
\newcommand{\V}{\operatorname{V}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\Bigsum}[2]{\ensuremath{\mathop{\textstyle\sum}_{#1}^{#2}}}
\newcommand{\Bigsum}[2]{\mathop{\textstyle\sum}_{#1}^{#2}}
\newcommand{\ran}{\operatorname{ran}}
\newcommand{\ran}{\operatorname{ran}}
\newcommand{\card}{\#}
\newcommand{\card}{\#}
\newcommand{\Conv}{\mathop{\scalebox{1.1}{\raisebox{-0.08ex}{$\ast$}}}}%
\renewcommand{\P}{\operatorname{P}}
\renewcommand{\L}{\operatorname{L}}
\newcommand{\mathds}{\mathbb}
</math>
</div>


\usepackage{pgfplots}
\newcommand{\filledsquare}{\begin{picture}(0,0)(0,0)\put(-4,1.4){$\scriptscriptstyle\text{\ding{110}}$}\end{picture}\hspace{2pt}}
\newcommand{\mathds}{\mathbb}</math></div>
label{Ch-VC}
In this chapter we denote by <math>\lambda^d</math> the <math>d</math>-dimensional Lebesgue measure on <math>\mathbb{R}^d</math>. The euclidean norm we denote by <math>\|\cdot\|</math> and balls with respect to this norm are denoted by <math>\B_r(x)</math> and <math>\Bbar_r(x)</math> respectively.
In this chapter we denote by <math>\lambda^d</math> the <math>d</math>-dimensional Lebesgue measure on <math>\mathbb{R}^d</math>. The euclidean norm we denote by <math>\|\cdot\|</math> and balls with respect to this norm are denoted by <math>\B_r(x)</math> and <math>\Bbar_r(x)</math> respectively.
\smallskip
 
We start with the following observation. Let <math>A\subseteq\mathbb{R}^d</math> be a measurable star-shaped set and assume that zero is a center. Then <math>(1-\epsilon)A\subseteq A</math> and we can interpret <math>A\backslash(1-\epsilon)A</math> as that part of <math>A</math> that is close to its surface.
We start with the following observation. Let <math>A\subseteq\mathbb{R}^d</math> be a measurable star-shaped set and assume that zero is a center. Then <math>(1-\epsilon)A\subseteq A</math> and we can interpret <math>A\backslash(1-\epsilon)A</math> as that part of <math>A</math> that is close to its surface.
\begin{center}
 
\begin{picture}(150,80)(0,0)
<div class="d-flex justify-content-center">
\put(0,0){<div class="d-flex justify-content-center">
[[File:tikzcacb22a4.png | 400px | thumb |Volume close to the surface of a ‘nice’ subset of <math>\mathbb{R}^2</math>.]]
[[File:tikzcacb22a4.png | 400px | thumb | ]]
</div>
</div>}
 
\put(9,8){<div class="d-flex justify-content-center">
[[File:tikz59584e4a.png | 400px | thumb |  ]]
</div>}
\end{picture}
\nopagebreak[4]
\begin{fig}Volume close to the surface of a ‘nice’ subset of <math>\mathbb{R}^2</math>.\end{fig}
\end{center}
If we compute the measure of the latter set, then we get
If we compute the measure of the latter set, then we get


Line 68: Line 40:
</math>
</math>
which will be close to one if <math>\epsilon</math> is small and <math>d</math> is large. This means that in this case the measure of <math>A</math> is located close to the surface of <math>A</math> whereas its center contributes only little to its volume. Our first aim in this section is to quantify the latter observation in the case that <math>A</math> is the unit ball.
which will be close to one if <math>\epsilon</math> is small and <math>d</math> is large. This means that in this case the measure of <math>A</math> is located close to the surface of <math>A</math> whereas its center contributes only little to its volume. Our first aim in this section is to quantify the latter observation in the case that <math>A</math> is the unit ball.
\smallskip
 
{{proofcard|Lemma|LEM-9-1|For <math>d\geqslant 1</math> and <math>0 < \epsilon\leqslant1</math> we have <math>(1-\epsilon)^d\leqslant \exp(-\epsilon d)</math>.
{{proofcard|Lemma|LEM-9-1|For <math>d\geqslant 1</math> and <math>0 < \epsilon\leqslant1</math> we have <math>(1-\epsilon)^d\leqslant \exp(-\epsilon d)</math>.
|Define <math>f\colon[0,1]\rightarrow\mathbb{R}</math> via <math>f(x)=\exp(-x)+x-1</math>. Then <math>f(0)=0</math> and <math>f'(x)=1-\exp(-x)\geqslant0</math> holds for all <math>x\in[0,1]</math>. Therefore <math>f(x)\geqslant0</math> and thus <math>\exp(-x)\geqslant1-x\geqslant0</math> holds for all <math>x\in[0,1]</math>. Evaluating at <math>x=\epsilon</math> and taking the <math>d</math>--th power finishes the proof.}}
|Define <math>f\colon[0,1]\rightarrow\mathbb{R}</math> via <math>f(x)=\exp(-x)+x-1</math>. Then <math>f(0)=0</math> and <math>f'(x)=1-\exp(-x)\geqslant0</math> holds for all <math>x\in[0,1]</math>. Therefore <math>f(x)\geqslant0</math> and thus <math>\exp(-x)\geqslant1-x\geqslant0</math> holds for all <math>x\in[0,1]</math>. Evaluating at <math>x=\epsilon</math> and taking the <math>d</math>--th power finishes the proof.}}
Line 79: Line 51:
</math>
</math>
i.e., at least <math>1-\exp(-\epsilon d)</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to its surface.
i.e., at least <math>1-\exp(-\epsilon d)</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to its surface.
\begin{center}
<div class{{=}}"d-flex justify-content-center">
\begin{picture}(135,90)(0,0)
[[File:tikz4262b634.png | 400px | thumb | Volume close to the surface of the unit ball. ]]
\put(0,-5){<div class="d-flex justify-content-center">
</div>
[[File:tikz4262b634.png | 400px | thumb | ]]
</div>}
\end{picture}\nopagebreak[4]
\begin{fig}Volume close to the surface of the unit ball.\end{fig}
\end{center}
|We compute
|We compute


Line 125: Line 92:
</math>
</math>
where <math>W_{\epsilon}=\bigl\{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:|x_1|\leqslant\epsilon\bigr\}</math>.
where <math>W_{\epsilon}=\bigl\{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:|x_1|\leqslant\epsilon\bigr\}</math>.
\begin{center}
<div class{{=}}"d-flex justify-content-center">
\begin{picture}(190,110)(0,0)
[[File:tikz62fda0be.png | 400px | thumb | Most of the volume is located close to the equator of the unit ball. ]]
\put(0,0){<div class="d-flex justify-content-center">
</div>|We first observe that the statement is trivial for <math>\epsilon\geqslant1</math>, so we may assume <math>\epsilon < 1</math> for the rest of the proof. We now compute the volume of that part  
[[File:tikz62fda0be.png | 400px | thumb | ]]
</div>}
\end{picture}
\nopagebreak[4]
\begin{fig}Most of the volume is located close to the equator of the unit ball.\end{fig}
\end{center}
|We first observe that the statement is trivial for <math>\epsilon\geqslant1</math>, so we may assume <math>\epsilon < 1</math> for the rest of the proof. We now compute the volume of that part  


<math display="block">
<math display="block">
Line 150: Line 110:
</math>
</math>
where the ball is taken in <math>\mathbb{R}^{d-1}</math>, compare the picture below.
where the ball is taken in <math>\mathbb{R}^{d-1}</math>, compare the picture below.
\begin{center}
 
\begin{picture}(190,75)(0,0)
<div class{{=}}"d-flex justify-content-center">
\put(0,-5){<div class="d-flex justify-content-center">
[[File:tikz53fe5393.png | 400px | thumb | Slice of the unit ball <math>\Bbar_1(0)</math>. ]]
[[File:tikz53fe5393.png | 400px | thumb | ]]
</div>
</div>}
 
\end{picture}\nopagebreak[4]
\begin{fig} Slice of the unit ball <math>\Bbar_1(0)</math>.\end{fig}
\end{center}
Now we employ Cavalieri's Principle and get
Now we employ Cavalieri's Principle and get


<span id="Caval-1"/>
<span id{{=}}"Caval-1"/>
<math display="block">
<math display="block">
\begin{equation}\label{Caval-1}
\begin{equation}\label{Caval-1}
Line 171: Line 128:
\end{equation}
\end{equation}
</math>
</math>


Next we estimate the integral at the end of \eqref{Caval-1} in two ways. Firstly, we put <math>\delta=0</math> and observe that <math>0\leqslant x_1\leqslant1/\sqrt{d-1}</math> implies <math>x_1^2\leqslant1/(d-1)</math>. This in turn leads to <math>1-1/(d-1)\leqslant 1-x_1^2</math> which implies <math>(1-1/(d-1))^{(d-1)/2}\leqslant (1-x_1^2)^{(d-1)/2}</math>. It follows
Next we estimate the integral at the end of \eqref{Caval-1} in two ways. Firstly, we put <math>\delta=0</math> and observe that <math>0\leqslant x_1\leqslant1/\sqrt{d-1}</math> implies <math>x_1^2\leqslant1/(d-1)</math>. This in turn leads to <math>1-1/(d-1)\leqslant 1-x_1^2</math> which implies <math>(1-1/(d-1))^{(d-1)/2}\leqslant (1-x_1^2)^{(d-1)/2}</math>. It follows


<span id="Caval-2"/>
<span id{{=}}"Caval-2"/>
<math display="block">
<math display="block">
\begin{equation}\label{Caval-2}
\begin{equation}\label{Caval-2}
Line 189: Line 145:
where we used Bernoulli's inequality in the last estimate as <math>(d-1)/2\geqslant1</math> holds due to our assumption <math>d\geqslant3</math>. Secondly, we estimate the integral for <math>\delta=\epsilon > 0</math> from above. We employ <math>1-x_1^2\leqslant \exp(-x_1^2)</math> from [[#LEM-9-2 |Lemma]] and <math>x_1/\epsilon\geqslant1</math> to get
where we used Bernoulli's inequality in the last estimate as <math>(d-1)/2\geqslant1</math> holds due to our assumption <math>d\geqslant3</math>. Secondly, we estimate the integral for <math>\delta=\epsilon > 0</math> from above. We employ <math>1-x_1^2\leqslant \exp(-x_1^2)</math> from [[#LEM-9-2 |Lemma]] and <math>x_1/\epsilon\geqslant1</math> to get


<span id="Caval-3"/>
<span id{{=}}"Caval-3"/>
<math display="block">
<math display="block">
\begin{equation}\label{Caval-3}
\begin{equation}\label{Caval-3}
Line 200: Line 156:
\end{equation}
\end{equation}
</math>
</math>


Combining \eqref{Caval-1}--\eqref{Caval-3} we obtain
Combining \eqref{Caval-1}--\eqref{Caval-3} we obtain
Line 223: Line 178:
|The function <math>f\colon(0,\infty)\rightarrow\mathbb{R}</math>, <math>f(a)=\frac{2}{a}\exp(-\frac{a^2}{2})</math> is strictly decreasing as both of its factors are strictly decreasing and strictly positive. We have <math>f(1)=2/\sqrt{e} > 1</math> and <math>f(2)=1/\e^2 < 1</math>. Consequently there is <math>a_0 > 0</math> such that <math>f(a_0)=1</math> and necessarily <math>1 < a_0 < 1</math>. For <math>a > a_0</math> we get <math>f(a) < 1</math> and for <math>a < a_0</math> we get <math>f(a) > 1</math>, so <math>a_0</math> is unique.}}
|The function <math>f\colon(0,\infty)\rightarrow\mathbb{R}</math>, <math>f(a)=\frac{2}{a}\exp(-\frac{a^2}{2})</math> is strictly decreasing as both of its factors are strictly decreasing and strictly positive. We have <math>f(1)=2/\sqrt{e} > 1</math> and <math>f(2)=1/\e^2 < 1</math>. Consequently there is <math>a_0 > 0</math> such that <math>f(a_0)=1</math> and necessarily <math>1 < a_0 < 1</math>. For <math>a > a_0</math> we get <math>f(a) < 1</math> and for <math>a < a_0</math> we get <math>f(a) > 1</math>, so <math>a_0</math> is unique.}}


Substituting <math>a=\epsilon\sqrt{d-1}</math> shows that the factor in [[#WC-THM |Theorem]] is strictly positive if and only if <math>\epsilon > a_0/\sqrt{d-1}</math> holds with some <math>a_0\in(1,2)</math>, compare [[#Pa0 |Problem]]. That is, as bigger as we choose the dimension, the smaller we can make <math>\epsilon > 0</math>.
Substituting <math>a=\epsilon\sqrt{d-1}</math> shows that the factor in [[#WC-THM |Theorem]] is strictly positive if and only if <math>\epsilon > a_0/\sqrt{d-1}</math> holds with some <math>a_0\in(1,2)</math>, compare [[exercise:1cae9d071a |Problem]]. That is, as bigger as we choose the dimension, the smaller we can make <math>\epsilon > 0</math>.
\smallskip
 
We however ''cannot'' let <math>\epsilon</math> tend to zero for a fixed dimension <math>d</math>. The interpretation of [[#WC-THM |Theorem]], i.e., that <math>1-\frac{2}{\epsilon\sqrt{d-1}}\exp({-\frac{\epsilon^2(d-1)}{2}})</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to the equator, is only valid if <math>\epsilon</math> is not too small. Although this might right now be fairly obvious, we point out that in contrast to the ‘typical epsilontics’, <math>\epsilon > 0</math> cannot be thought of being arbitrarily small, but what we can choose for it depends on the space's dimension to be large enough.
We however ''cannot'' let <math>\epsilon</math> tend to zero for a fixed dimension <math>d</math>. The interpretation of [[#WC-THM |Theorem]], i.e., that <math>1-\frac{2}{\epsilon\sqrt{d-1}}\exp({-\frac{\epsilon^2(d-1)}{2}})</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to the equator, is only valid if <math>\epsilon</math> is not too small. Although this might right now be fairly obvious, we point out that in contrast to the ‘typical epsilontics’, <math>\epsilon > 0</math> cannot be thought of being arbitrarily small, but what we can choose for it depends on the space's dimension to be large enough.
\smallskip
 
For later use we note the following reformulation of [[#WC-THM |Theorem]].
For later use we note the following reformulation of [[#WC-THM |Theorem]].
{{proofcard|Corollary|WC-COR|Let <math>d\geqslant3</math> and <math>\epsilon > 0</math>. Then we have
{{proofcard|Corollary|WC-COR|Let <math>d\geqslant3</math> and <math>\epsilon > 0</math>. Then we have
Line 233: Line 188:
\frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| > \epsilon\})}{\lambda^d(\Bbar_1(0))} \leqslant\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)
\frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| > \epsilon\})}{\lambda^d(\Bbar_1(0))} \leqslant\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)
</math>
</math>
for every <math>\epsilon > 0</math>.
for every <math>\epsilon > 0</math>.|In the notation of [[#WC-THM |Theorem]] we have
|In the notation of [[#WC-THM |Theorem]] we have


<math display="block">
<math display="block">
Line 281: Line 235:
holds.
holds.
|We first consider the case of two points <math>x, y\in\Bbar_1(0)</math>. As we are interested only in their scalar product it should make no difference if we draw both points at random or if we treat the first point's direction as fixed (but the length still chosen at random) and draw only the second one at random. Since <math>\lambda^d(\{0\})=0</math>, the first points will be non-zero with probability one and we can thus arrange a new coordinate system, in which the unit ball is still the same, but in which our first points is directed towards the (new) north pole of the ball. In these coordinates we can then apply the waist concentration theorem to see that the other point is likely to be located close to the new equator. This means in turn that the scalar product of both points will be close to zero.
|We first consider the case of two points <math>x, y\in\Bbar_1(0)</math>. As we are interested only in their scalar product it should make no difference if we draw both points at random or if we treat the first point's direction as fixed (but the length still chosen at random) and draw only the second one at random. Since <math>\lambda^d(\{0\})=0</math>, the first points will be non-zero with probability one and we can thus arrange a new coordinate system, in which the unit ball is still the same, but in which our first points is directed towards the (new) north pole of the ball. In these coordinates we can then apply the waist concentration theorem to see that the other point is likely to be located close to the new equator. This means in turn that the scalar product of both points will be close to zero.
\begin{center}
 
\begin{picture}(190,100)(0,0)
<div class{{=}}"d-flex justify-content-center">
\put(0,30){\rotatebox{-20}{<div class="d-flex justify-content-center">
[[File:tikz6e5b4b9b.png | 400px | thumb | We apply he Waist Concentration Theorem\\with respect to the new north pole. ]]
[[File:tikz6e5b4b9b.png | 400px | thumb | ]]
</div>
</div>}}
 
\end{picture}\nopagebreak[4]
\begin{fig} We apply he Waist Concentration Theorem\\with respect to the new north pole.\end{fig}
\end{center}


To make the above idea formal, we put <math>\epsilon=\sqrt{6\ln n}/\sqrt{d-1}</math> and consider two random variables <math>X,Y\colon\Omega\rightarrow\mathbb{R}^d</math> with <math>X,Y\sim\mathcal{U}(\Bbar_1(0))</math>. Then we can find for each <math>\omega_1\in\Omega</math> with <math>X(\omega_1)\not=0</math> an orthogonal linear map <math>T(\omega)\colon\mathbb{R}^d\rightarrow\mathbb{R}^d</math> with <math>T(\omega_1)X(\omega_1)=(\|X(\omega_1)\|,0,\dots,0)</math>. Indeed, we can put <math>f^{\scriptscriptstyle(1)}=X(\omega)/\|X(\omega)\|</math> and extend to an orthonormal basis <math>\mathcal{F}=\{ f^{\scriptscriptstyle(1)}, f^{\scriptscriptstyle(2)},\dots,f^{\scriptscriptstyle(d)}\}</math> by first applying the basis extension theorem and then the Gram-Schmidt procedure. The map <math>T(\omega_1)</math> is then defined as the linear extension of <math>T(\omega_1)f^{\scriptscriptstyle(j)}:=e^{\scriptscriptstyle(j)}</math> where the <math>e^{\scriptscriptstyle(j)}</math> are the standard unit vectors. We obtain <math>T(\omega_1)X(\omega_1)=T(\omega_1)(\|X(\omega_1)\|f^{\scriptscriptstyle(1)})=\|X(\omega_1)\|e^{\scriptscriptstyle(1)}</math> and thus
To make the above idea formal, we put <math>\epsilon=\sqrt{6\ln n}/\sqrt{d-1}</math> and consider two random variables <math>X,Y\colon\Omega\rightarrow\mathbb{R}^d</math> with <math>X,Y\sim\mathcal{U}(\Bbar_1(0))</math>. Then we can find for each <math>\omega_1\in\Omega</math> with <math>X(\omega_1)\not=0</math> an orthogonal linear map <math>T(\omega)\colon\mathbb{R}^d\rightarrow\mathbb{R}^d</math> with <math>T(\omega_1)X(\omega_1)=(\|X(\omega_1)\|,0,\dots,0)</math>. Indeed, we can put <math>f^{\scriptscriptstyle(1)}=X(\omega)/\|X(\omega)\|</math> and extend to an orthonormal basis <math>\mathcal{F}=\{ f^{\scriptscriptstyle(1)}, f^{\scriptscriptstyle(2)},\dots,f^{\scriptscriptstyle(d)}\}</math> by first applying the basis extension theorem and then the Gram-Schmidt procedure. The map <math>T(\omega_1)</math> is then defined as the linear extension of <math>T(\omega_1)f^{\scriptscriptstyle(j)}:=e^{\scriptscriptstyle(j)}</math> where the <math>e^{\scriptscriptstyle(j)}</math> are the standard unit vectors. We obtain <math>T(\omega_1)X(\omega_1)=T(\omega_1)(\|X(\omega_1)\|f^{\scriptscriptstyle(1)})=\|X(\omega_1)\|e^{\scriptscriptstyle(1)}</math> and thus
Line 327: Line 278:
</math>
</math>
as desired.}}
as desired.}}
If we apply the estimate  of [[#UO-THM |Theorem]] for only two points <math>x</math>, <math>y\in\Bbar_1(0)</math> we get only a probability of greater or equal to <math>1/2</math> for <math>\langle{}x,y\rangle{}</math> being smaller than <math>\sqrt{6\ln 2}/\sqrt{d-1}</math>. Going through the proof of of the theorem yields however that  
If we apply the estimate  of [[#UO-THM |Theorem]] for only two points <math>x</math>, <math>y\in\Bbar_1(0)</math> we get only a probability of greater or equal to <math>1/2</math> for <math>\langle{}x,y\rangle{}</math> being smaller than <math>\sqrt{6\ln 2}/\sqrt{d-1}</math>. Going through the proof of of the theorem yields however that  


Line 334: Line 286:
from where we can see that for fixed <math>\epsilon > 0</math> and <math>d</math> large enough the probability for the scalar product to be <math>\epsilon</math>--small will be close to one which means that two points picked at random from the high dimensional unit ball will be almost orthogonal with high probability.
from where we can see that for fixed <math>\epsilon > 0</math> and <math>d</math> large enough the probability for the scalar product to be <math>\epsilon</math>--small will be close to one which means that two points picked at random from the high dimensional unit ball will be almost orthogonal with high probability.


\smallskip
We remark here, that when interpreting the scalar product as a measure for how close to orthogonal two vectors are, one usually does normalize the vectors first. In the above result we did not do so, since, by [[#UA-THM |Theorem]] we know that the points considered in [[#UO-THM |Theorem]] will have norm close to one with high probability, anyway. We leave a formal proof as [[exercise:8dbb0c4f40 |Problem]].
We remark here, that when interpreting the scalar product as a measure for how close to orthogonal two vectors are, one usually does normalize the vectors first. In the above result we did not do so, since, by [[#UA-THM |Theorem]] we know that the points considered in [[#UO-THM |Theorem]] will have norm close to one with high probability, anyway. We leave a formal proof as [[#PROB-NORMALIZED-UA-THM |Problem]].
 
\section*{Problems}
 
\smallskip
 
 
 


==General references==
==General references==
{{cite arXiv|last1=Wegner|first1=Sven-Ake|year=2024|title=Lecture Notes on High-Dimensional Data|eprint=2101.05841|class=math.FA}}
{{cite arXiv|last1=Wegner|first1=Sven-Ake|year=2024|title=Lecture Notes on High-Dimensional Data|eprint=2101.05841|class=math.FA}}

Latest revision as of 00:05, 2 June 2024

[math] \newcommand{\smallfrac}[2]{\frac{#1}{#2}} \newcommand{\medfrac}[2]{\frac{#1}{#2}} \newcommand{\textfrac}[2]{\frac{#1}{#2}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\e}{\operatorname{e}} \newcommand{\B}{\operatorname{B}} \newcommand{\Bbar}{\overline{\operatorname{B}}} \newcommand{\pr}{\operatorname{pr}} \newcommand{\dd}{\operatorname{d}\hspace{-1pt}} \newcommand{\E}{\operatorname{E}} \newcommand{\V}{\operatorname{V}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Bigsum}[2]{\mathop{\textstyle\sum}_{#1}^{#2}} \newcommand{\ran}{\operatorname{ran}} \newcommand{\card}{\#} \renewcommand{\P}{\operatorname{P}} \renewcommand{\L}{\operatorname{L}} \newcommand{\mathds}{\mathbb} [/math]

In this chapter we denote by [math]\lambda^d[/math] the [math]d[/math]-dimensional Lebesgue measure on [math]\mathbb{R}^d[/math]. The euclidean norm we denote by [math]\|\cdot\|[/math] and balls with respect to this norm are denoted by [math]\B_r(x)[/math] and [math]\Bbar_r(x)[/math] respectively.

We start with the following observation. Let [math]A\subseteq\mathbb{R}^d[/math] be a measurable star-shaped set and assume that zero is a center. Then [math](1-\epsilon)A\subseteq A[/math] and we can interpret [math]A\backslash(1-\epsilon)A[/math] as that part of [math]A[/math] that is close to its surface.

Volume close to the surface of a ‘nice’ subset of [math]\mathbb{R}^2[/math].

If we compute the measure of the latter set, then we get

[[math]] \lambda^d(A\backslash(1-\epsilon)A)=\lambda^d(A)-\lambda^d((1-\epsilon)A) = \lambda^d(A)-(1-\epsilon)^d\lambda^d(A) [[/math]]

and thus

[[math]] \smallfrac{\lambda^d(A\backslash(1-\epsilon)A)}{\lambda^d(A)}=(1-\epsilon)^d [[/math]]

which will be close to one if [math]\epsilon[/math] is small and [math]d[/math] is large. This means that in this case the measure of [math]A[/math] is located close to the surface of [math]A[/math] whereas its center contributes only little to its volume. Our first aim in this section is to quantify the latter observation in the case that [math]A[/math] is the unit ball.

Lemma

For [math]d\geqslant 1[/math] and [math]0 \lt \epsilon\leqslant1[/math] we have [math](1-\epsilon)^d\leqslant \exp(-\epsilon d)[/math].


Show Proof

Define [math]f\colon[0,1]\rightarrow\mathbb{R}[/math] via [math]f(x)=\exp(-x)+x-1[/math]. Then [math]f(0)=0[/math] and [math]f'(x)=1-\exp(-x)\geqslant0[/math] holds for all [math]x\in[0,1][/math]. Therefore [math]f(x)\geqslant0[/math] and thus [math]\exp(-x)\geqslant1-x\geqslant0[/math] holds for all [math]x\in[0,1][/math]. Evaluating at [math]x=\epsilon[/math] and taking the [math]d[/math]--th power finishes the proof.

Using the lemma we can now prove that most of the unit ball's volume is located close to its surface. This is very reasonable, since if we think of the ball consisting of many thin shells all of the same depth but with different radii, then the outer ones will have (in high dimensions significantly!) larger volume then the inner ones.

Theorem

(Surface Concentration Theorem) Let [math]d\geqslant1[/math] and [math]0 \lt \epsilon\leqslant1[/math]. Then we have

[[math]] \lambda^d(\Bbar_1(0)\backslash \B_{1-\epsilon}(0))\geqslant (1-\exp(-\epsilon d))\cdot\lambda^d(\Bbar_1(0)), [[/math]]
i.e., at least [math]1-\exp(-\epsilon d)[/math] fraction of the volume of the [math]d[/math]-dimensional unit ball is located [math]\epsilon[/math]-close to its surface.

Volume close to the surface of the unit ball.


Show Proof

We compute

[[math]] \begin{align*} \frac{\lambda^d(\Bbar_1(0)\backslash{}\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_1(0))} &= \frac{\lambda^d(\Bbar_1(0))-\lambda^d(\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_1(0))} = 1-\frac{\lambda^d((1-\epsilon)\B_{1}(0))}{\lambda^d(\Bbar_1(0))}\\ & = 1-\frac{(1-\epsilon)^d\lambda^d(\B_{1}(0))}{\lambda^d(\B_1(0))} = 1-(1-\epsilon)^d\geqslant 1-\exp(-\epsilon d) \end{align*} [[/math]]
where we used Lemma for the final estimate.

For the second concentration theorem we need the following lemma.

Lemma

For [math]d\geqslant3[/math] we have [math]\lambda^d(\Bbar_1(0))=\frac{2\pi}{d}\cdot\lambda^{d-2}(\Bbar_1(0))[/math] where the unit balls are taken in [math]\mathbb{R}^d[/math] and [math]\mathbb{R}^{d-2}[/math], respectively.


Show Proof

Since [math]\B_1(0)=\{(x_1,\dots,x_d)\in\mathbb{R}^d\:;\:x_1^2+\cdots+x_d^2\leqslant1\}[/math] we may compute

[[math]] \begin{align*} \lambda^d(\Bbar_1(0)) & = \int_{\Bbar_1(0)}1\dd\lambda^d\\ & = \int_{x_1^2+\cdots+x_d^2\leqslant1}1\dd\lambda^d(x_1,\dots,x_d)\\ & = \int_{x_1^2+x_2^2\leqslant1} \Big(\int_{x_3^2+\cdots+x_d^2\leqslant1-x_1^2-x_2^2}1\dd\lambda^{d-2}(x_3,\dots,x_d)\Big)d\lambda^2(x_1,x_2)\\ & = \int_{x_1^2+x_2^2\leqslant1} \lambda^{d-2}(\Bbar_{(1-x_1^2-x_2^2)^{1/2}}(0))\dd\lambda^2(x_1,x_2)\\ & = \int_{x_1^2+x_2^2\leqslant1}(1-x_1^2-x_2^2)^{(d-2)/2}\lambda^{d-2}(\Bbar_1(0))\dd\lambda^2(x_1,x_2)\\ & = \lambda^{d-2}\bigl(\Bbar_1(0)\bigr)\cdot\int_0^{2\pi}\int_0^1(1-r^2)^{(d-2)/2}\,r\dd r \dd\varphi\\ & = \lambda^{d-2}(\Bbar_1(0))\cdot\int_0^{2\pi} \dd\varphi \cdot \int_1^0u^{(d-2)/2} (-1/2) \dd u\\ & = \lambda^{d-2}(\Bbar_1(0)) \cdot 2\pi \cdot \int_0^1u^{d/2-1}/2 \dd u\\ & = \lambda^{d-2}(\Bbar_1(0)) \cdot \pi \cdot \medfrac{u^{d/2}}{d/2}\Big|_0^1 = \medfrac{2\pi}{d}\cdot\lambda^{d-2}(\Bbar_1(0)) \end{align*} [[/math]]
where we used Fubini's Theorem for the third and transformation into polar coordinates for the sixth equality.

After we have seen that most of the volume of the unit ball is concentrated close to its surface we show in the next result that if we designate one point of the surface as the north pole, then most of the unit ball's volume will be concentrated at its waist. As with the surface concentration, this is not surprising, since if we go from the center in the direction of the north pole, then the ball becomes thinner in all of the [math]d-1[/math] remaining dimensions. The precise result is as follows.

Theorem

(Waist Concentration Theorem) Let [math]d\geqslant3[/math] and [math]\epsilon \gt 0[/math]. Then we have

[[math]] \lambda^d(W_{\epsilon})\geqslant\bigl(1-\medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\medfrac{\epsilon^2(d-1)}{2}}\bigr)\bigr)\cdot\lambda^d(\Bbar_1(0)) [[/math]]
where [math]W_{\epsilon}=\bigl\{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:|x_1|\leqslant\epsilon\bigr\}[/math].

Most of the volume is located close to the equator of the unit ball.

Show Proof

We first observe that the statement is trivial for [math]\epsilon\geqslant1[/math], so we may assume [math]\epsilon \lt 1[/math] for the rest of the proof. We now compute the volume of that part

[[math]] H^{\delta} = \{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:x_1\geqslant\delta\} [[/math]]
of the unit ball that is located above [math]\delta\geqslant0[/math] in [math]x_1[/math]-direction. We observe that for [math]x_1\in[\delta,1][/math] the corresponding section of [math]H^{\delta}[/math] equals

[[math]] \begin{align*} H_{x_1}^{\delta} &=\bigl\{(x_2,\dots,x_d)\in\mathbb{R}^{d-1}\:;\:(x_1,x_2,\dots,x_d)\in H^{\delta}\bigr\}\\ & = \bigl\{(x_2,\dots,x_d)\in\mathbb{R}^{d-1}\:;\:\|(x_1,x_2,\dots,x_d)\|\leqslant1\bigr\}\\ & =\bigl\{y\in\mathbb{R}^{d-1}\:;\:x_1^2+\|y\|^2\leqslant1\bigr\}\\ & = \Bbar_{(1-x_1^2)^{1/2}}(0) \end{align*} [[/math]]
where the ball is taken in [math]\mathbb{R}^{d-1}[/math], compare the picture below.

Slice of the unit ball [math]\Bbar_1(0)[/math].

Now we employ Cavalieri's Principle and get

[[math]] \begin{equation}\label{Caval-1} \begin{aligned} \lambda^d\bigl(H^{\delta}\bigr) & = \int_{\delta}^1\lambda^{d-1}(H^{\delta}_{x_1})\dd\lambda^1(x_1)\\ & = \int_{\delta}^1\lambda^{d-1}(\Bbar_{(1-x_1^2)^{1/2}}(0))\dd\lambda^1(x_1)\\ & = \int_{\delta}^1(1-x_1^2)^{(d-1)/2}\lambda^{d-1}(\Bbar_1(0))\dd\lambda^1(x_1)\\ & = \lambda^{d-1}(\Bbar_1(0))\cdot\int_{\delta}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1). \end{aligned} \end{equation} [[/math]]

Next we estimate the integral at the end of \eqref{Caval-1} in two ways. Firstly, we put [math]\delta=0[/math] and observe that [math]0\leqslant x_1\leqslant1/\sqrt{d-1}[/math] implies [math]x_1^2\leqslant1/(d-1)[/math]. This in turn leads to [math]1-1/(d-1)\leqslant 1-x_1^2[/math] which implies [math](1-1/(d-1))^{(d-1)/2}\leqslant (1-x_1^2)^{(d-1)/2}[/math]. It follows

[[math]] \begin{equation}\label{Caval-2} \begin{aligned} \int_{0}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1) & \geqslant \int_{0}^{1/\sqrt{d-1}}(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1)\\ & \geqslant \int_{0}^{1/\sqrt{d-1}}(1-1/(d-1))^{(d-1)/2}\dd\lambda^1(x_1)\\ & = (1-1/(d-1))^{(d-1)/2}\cdot\medfrac{1}{\sqrt{d-1}}\\ & \geqslant \left(1-\smallfrac{d-1}{2}\cdot\smallfrac{1}{d-1}\right)\cdot\medfrac{1}{\sqrt{d-1}}\\ & = \medfrac{1}{2\sqrt{d-1}} \end{aligned} \end{equation} [[/math]]
where we used Bernoulli's inequality in the last estimate as [math](d-1)/2\geqslant1[/math] holds due to our assumption [math]d\geqslant3[/math]. Secondly, we estimate the integral for [math]\delta=\epsilon \gt 0[/math] from above. We employ [math]1-x_1^2\leqslant \exp(-x_1^2)[/math] from Lemma and [math]x_1/\epsilon\geqslant1[/math] to get

[[math]] \begin{equation}\label{Caval-3} \begin{aligned} \int_{\epsilon}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1) & \leqslant \int_{\epsilon}^1(\exp(-x_1^2))^{(d-1)/2}\,(x_1/\epsilon)\,\dd\lambda^1(x_1)\\ & = \medfrac{1}{\epsilon} \cdot \int_{\epsilon}^1x_1\exp(-{\textstyle\frac{d-1}{2}}x_1^2)\dd\lambda^1(x_1)\\ & = \medfrac{1}{\epsilon}\cdot\frac{\exp({\textstyle-\frac{d-1}{2}})- \exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{-(d-1)}\\ & \leqslant \frac{\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)}. \end{aligned} \end{equation} [[/math]]

Combining \eqref{Caval-1}--\eqref{Caval-3} we obtain

[[math]] \lambda^d(H^0)\geqslant\frac{\lambda^{d-1}(\Bbar_1(0))}{2\sqrt{d-1}} \; \text{ and } \; \lambda^d(H^{\epsilon})\leqslant \lambda^{d-1}(\Bbar_1(0)) \frac{\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)}. [[/math]]

To finish the proof we observe that [math]H^0[/math] is the upper half of the unit ball and [math]H^0\backslash{}H^{\epsilon}[/math] is the upper half of [math]W_{\epsilon}[/math]. Therefore we derive from the above

[[math]] \begin{align*} \frac{\lambda^d(W_{\epsilon})}{\lambda^d(\Bbar_1(0))} & = \frac{2\cdot\lambda^d(H^0\backslash{}H^{\epsilon})}{2\cdot\lambda^d(H^0)} = \frac{\lambda^d(H^0)-\lambda^d(H^{\epsilon})}{\lambda^d(H^0)} = 1-\frac{\lambda^d(H^{\epsilon})}{\lambda^d(H^{0})}\\ & \geqslant 1-\frac{2\sqrt{d-1}\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)} = 1-\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr) \end{align*} [[/math]]
as desired.

The estimate in Theorem is only of value if the factor on the right hand side is strictly positive. The following lemma explains when this happens.

Lemma

There exists a unique [math]a_0 \gt 0[/math] such that [math]\frac{2}{a}\exp(-\frac{a^2}{2}) \lt 1[/math] holds for every [math]a \gt a_0[/math]. We have [math]a_0\in(1,2)[/math].


Show Proof

The function [math]f\colon(0,\infty)\rightarrow\mathbb{R}[/math], [math]f(a)=\frac{2}{a}\exp(-\frac{a^2}{2})[/math] is strictly decreasing as both of its factors are strictly decreasing and strictly positive. We have [math]f(1)=2/\sqrt{e} \gt 1[/math] and [math]f(2)=1/\e^2 \lt 1[/math]. Consequently there is [math]a_0 \gt 0[/math] such that [math]f(a_0)=1[/math] and necessarily [math]1 \lt a_0 \lt 1[/math]. For [math]a \gt a_0[/math] we get [math]f(a) \lt 1[/math] and for [math]a \lt a_0[/math] we get [math]f(a) \gt 1[/math], so [math]a_0[/math] is unique.

Substituting [math]a=\epsilon\sqrt{d-1}[/math] shows that the factor in Theorem is strictly positive if and only if [math]\epsilon \gt a_0/\sqrt{d-1}[/math] holds with some [math]a_0\in(1,2)[/math], compare Problem. That is, as bigger as we choose the dimension, the smaller we can make [math]\epsilon \gt 0[/math].

We however cannot let [math]\epsilon[/math] tend to zero for a fixed dimension [math]d[/math]. The interpretation of Theorem, i.e., that [math]1-\frac{2}{\epsilon\sqrt{d-1}}\exp({-\frac{\epsilon^2(d-1)}{2}})[/math] fraction of the volume of the [math]d[/math]-dimensional unit ball is located [math]\epsilon[/math]-close to the equator, is only valid if [math]\epsilon[/math] is not too small. Although this might right now be fairly obvious, we point out that in contrast to the ‘typical epsilontics’, [math]\epsilon \gt 0[/math] cannot be thought of being arbitrarily small, but what we can choose for it depends on the space's dimension to be large enough.

For later use we note the following reformulation of Theorem.

Corollary

Let [math]d\geqslant3[/math] and [math]\epsilon \gt 0[/math]. Then we have

[[math]] \frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| \gt \epsilon\})}{\lambda^d(\Bbar_1(0))} \leqslant\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr) [[/math]]
for every [math]\epsilon \gt 0[/math].

Show Proof

In the notation of Theorem we have

[[math]] \begin{align*} \frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| \gt \epsilon\})}{\lambda^d(\Bbar_1(0))} &= \frac{\lambda^d(\Bbar_1(0)\backslash W_{\epsilon})}{\lambda^d(\Bbar_1(0))}= \frac{\lambda^d(\Bbar_1(0))-\lambda^d(W_{\epsilon})}{\lambda^d(\Bbar_1(0))}\\ &\leqslant 1 - \Bigr( 1- \smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)\Bigl)\\ & = \smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\smallfrac{\epsilon^2(d-1)}{2}\bigr) \end{align*} [[/math]]
as claimed.

Now we will use the volume estimates from above in order to establish where points that we draw at random from the [math]d[/math]-dimensional unit ball will be located and what mutual angles between the points we can expect. To make our proofs formal, we will employ in the rest of the chapter a probability space [math](\Omega,\Sigma,\P)[/math] and random variables [math]X\colon\Omega\rightarrow\mathbb{R}^d[/math] with distribution [math]X\sim\mathcal{U}(\Bbar_1(0))[/math], compare Definition.

Theorem

Let [math]d\geqslant3[/math] and assume that we draw [math]n\geqslant2[/math] points [math]x^{\scriptscriptstyle(1)},\dots,x^{\scriptscriptstyle(n)}[/math] at random from the [math]d[/math]-dimensional unit ball. Then

[[math]] \P\bigl[\|x^{\scriptscriptstyle(j)}\|\geqslant 1-\medfrac{2\ln n}{d} \text{ for all }j=1,\dots,n\bigr]\geqslant 1-\medfrac{1}{n} [[/math]]
holds.


Show Proof

In view of the Surface Concentration Theorem, most of the unit ball's volume is located close to its surface and thus a point chosen uniformly at random is likely to be close to the surface. For a formal proof, we put [math]\epsilon=\frac{2\ln n}{d}[/math] and consider a random vector [math]X\sim\mathcal{U}(\Bbar_1(0))[/math]. For fixed [math]1\leqslant j\leqslant n[/math] we employ Theorem to get

[[math]] \P\bigl[\|X^{\scriptscriptstyle(j)}\| \lt 1-\epsilon\bigr]=\frac{\lambda^d(\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_{1}(0))}\leqslant\exp(-\epsilon{}d)=\exp(-2\ln n)=1/n^2. [[/math]]
This implies

[[math]] \begin{align*} \P\bigl[\forall\:j\colon\|X^{\scriptscriptstyle(j)}\|\geqslant 1-\epsilon\bigr] &=1-\P\bigl[\exists\:j\colon\|X^{\scriptscriptstyle(j)}\|\leqslant 1-\epsilon\bigr]\\ &= 1-\Bigsum{j=1}{n}1/n^2=1-\medfrac{1}{n} \end{align*} [[/math]]
as claimed.

Although we get a trivial estimate if we sub [math]n=1[/math] into the Theorem, its proof showed that

[[math]] \P\bigl[\|x\|\geqslant 1-\epsilon\bigr] \geqslant 1-\exp(-\epsilon d) [[/math]]

holds, so if [math]d[/math] is large enough and [math]\epsilon \gt 0[/math] is not too small, a single point chosen at random will be [math]\epsilon[/math]--close to the surface of [math]\Bbar_1(0)[/math] with a high probability. We continue by studying angles between points.

Theorem

Let [math]d\geqslant3[/math] and assume that we draw [math]n\geqslant2[/math] points [math]x^{\scriptscriptstyle(1)},\dots,x^{\scriptscriptstyle(n)}[/math] uniformly at random from the [math]d[/math]-dimensional unit ball. Then

[[math]] \P\bigl[\bigl|\langle{}x^{\scriptscriptstyle(j)},x^{\scriptscriptstyle(k)}\rangle{}\bigr|\leqslant\medfrac{\sqrt{6\ln n}}{\sqrt{d-1}} \text{ for all }j\not=k\bigr]\geqslant 1-\medfrac{1}{n} [[/math]]
holds.


Show Proof

We first consider the case of two points [math]x, y\in\Bbar_1(0)[/math]. As we are interested only in their scalar product it should make no difference if we draw both points at random or if we treat the first point's direction as fixed (but the length still chosen at random) and draw only the second one at random. Since [math]\lambda^d(\{0\})=0[/math], the first points will be non-zero with probability one and we can thus arrange a new coordinate system, in which the unit ball is still the same, but in which our first points is directed towards the (new) north pole of the ball. In these coordinates we can then apply the waist concentration theorem to see that the other point is likely to be located close to the new equator. This means in turn that the scalar product of both points will be close to zero.

We apply he Waist Concentration Theorem\\with respect to the new north pole.


To make the above idea formal, we put [math]\epsilon=\sqrt{6\ln n}/\sqrt{d-1}[/math] and consider two random variables [math]X,Y\colon\Omega\rightarrow\mathbb{R}^d[/math] with [math]X,Y\sim\mathcal{U}(\Bbar_1(0))[/math]. Then we can find for each [math]\omega_1\in\Omega[/math] with [math]X(\omega_1)\not=0[/math] an orthogonal linear map [math]T(\omega)\colon\mathbb{R}^d\rightarrow\mathbb{R}^d[/math] with [math]T(\omega_1)X(\omega_1)=(\|X(\omega_1)\|,0,\dots,0)[/math]. Indeed, we can put [math]f^{\scriptscriptstyle(1)}=X(\omega)/\|X(\omega)\|[/math] and extend to an orthonormal basis [math]\mathcal{F}=\{ f^{\scriptscriptstyle(1)}, f^{\scriptscriptstyle(2)},\dots,f^{\scriptscriptstyle(d)}\}[/math] by first applying the basis extension theorem and then the Gram-Schmidt procedure. The map [math]T(\omega_1)[/math] is then defined as the linear extension of [math]T(\omega_1)f^{\scriptscriptstyle(j)}:=e^{\scriptscriptstyle(j)}[/math] where the [math]e^{\scriptscriptstyle(j)}[/math] are the standard unit vectors. We obtain [math]T(\omega_1)X(\omega_1)=T(\omega_1)(\|X(\omega_1)\|f^{\scriptscriptstyle(1)})=\|X(\omega_1)\|e^{\scriptscriptstyle(1)}[/math] and thus

[[math]] |\langle{}X(\omega_1),Y(\omega_2)\rangle{}| = |\langle{}T(\omega_1)X(\omega_1),T(\omega_1)Y(\omega_2)\rangle{}| = \|X(\omega_1)\|\cdot|\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle{}| [[/math]]
for [math](\omega_1,\omega_2)\in\Omega\times\Omega[/math] with [math]X(\omega_1)\not=0[/math]. We observe that [math]T(\omega_1)Y\sim\mathcal{U}(\Bbar_1(0))[/math] holds for each fixed [math]\omega_1[/math] and that [math]\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle{}[/math] is the first coordinate of [math]T(\omega_1)Y(\omega_2)[/math]. Therefore we can apply Corollary to compute

[[math]] \begin{align*} \P\bigl[|\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle| \gt \epsilon\bigr] & = \frac{\lambda^d(\{(z_1,\dots,z_d)\in\Bbar_1(0)\:;\:|z_1| \gt \epsilon\})}{\lambda^d(\Bbar_1(0))}\\ &\leqslant \medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\medfrac{\epsilon^2(d-1)}{2}\bigr)\\ & = \medfrac{2}{\sqrt{6\ln n}}\exp\bigl(-\medfrac{6\ln n}{2}\bigr)\\ \phantom{\int}& \leqslant \medfrac{2}{n^3} \end{align*} [[/math]]
where the last estimate holds as [math]1/\sqrt{6\ln n}\leqslant1[/math] is valid for [math]n\geqslant2[/math]. After observing that in the first line below necessarily [math]X(\omega_1)\not=0[/math] holds and by using [math]\|X(\omega_1)\|\leqslant 1[/math], we get

[[math]] \begin{align*} \P\bigl[\bigl|\langle{}X,Y\rangle{}\bigr| \gt \epsilon\bigr] & = \P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\bigl|\bigl\langle{}X(\omega_1),Y(\omega_2)\bigr\rangle{}\bigr| \gt \epsilon\}\bigr)\\ \phantom{\int}& =\P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\|X(\omega)\|\cdot\bigl|\bigl\langle{}e^{\scriptscriptstyle(1)},T(\omega)Y(\omega)\bigr\rangle{}\bigr| \gt \epsilon\}\bigr)\\ \phantom{\int}& \leqslant\P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\bigl|\bigl\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\bigr\rangle{}\bigr| \gt \epsilon\}\bigr)\\ & = \int_{\Omega} \P\bigl[\bigl|((T(\omega_1)Y(\omega_2))_1\bigr| \gt \epsilon\bigr]\dd\P(\omega_1)\\ &\leqslant \int_{\Omega}1/n^3\dd\P = \medfrac{1}{n^3} \end{align*} [[/math]]
where we employed Cavalieri's Principle and the estimates from above. Let finally, random vectors [math]X^{\scriptscriptstyle(1)},\dots,X^{\scriptscriptstyle(n)}\sim\mathcal{U}(\Bbar_1(0))[/math] be given. Then

[[math]] \begin{align*} \P\bigl[\forall\:j\not=k\colon|\langle{}X^{\scriptscriptstyle(j)},X^{\scriptscriptstyle(k)}\rangle|\leqslant\epsilon\bigr] & = 1- \P\bigl[\exists\:j\not=k\colon\bigl|\langle{}X^{\scriptscriptstyle(j)},X^{\scriptscriptstyle(k)}\rangle\bigr| \gt \epsilon \bigr]\\ & = 1-{\textstyle{n\choose2}}\P\bigl[\bigl|\langle{}X,Y\rangle{}\bigr| \gt \epsilon\bigr]\\ &\geqslant 1-\medfrac{n^2-n}{2}\cdot\medfrac{2}{n^3}\\ &\geqslant 1-\medfrac{1}{n} \end{align*} [[/math]]
as desired.

If we apply the estimate of Theorem for only two points [math]x[/math], [math]y\in\Bbar_1(0)[/math] we get only a probability of greater or equal to [math]1/2[/math] for [math]\langle{}x,y\rangle{}[/math] being smaller than [math]\sqrt{6\ln 2}/\sqrt{d-1}[/math]. Going through the proof of of the theorem yields however that

[[math]] \P\bigl[|\langle{}x,y\rangle{}|\leqslant\epsilon\bigr]\geqslant 1-\medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\medfrac{\epsilon^2(d-1)}{2}\bigr) [[/math]]

from where we can see that for fixed [math]\epsilon \gt 0[/math] and [math]d[/math] large enough the probability for the scalar product to be [math]\epsilon[/math]--small will be close to one which means that two points picked at random from the high dimensional unit ball will be almost orthogonal with high probability.

We remark here, that when interpreting the scalar product as a measure for how close to orthogonal two vectors are, one usually does normalize the vectors first. In the above result we did not do so, since, by Theorem we know that the points considered in Theorem will have norm close to one with high probability, anyway. We leave a formal proof as Problem.

General references

Wegner, Sven-Ake (2024). "Lecture Notes on High-Dimensional Data". arXiv:2101.05841 [math.FA].