<div class="d-none"><math>
\newcommand{\smallfrac}[2]{\frac{#1}{#2}}
\newcommand{\medfrac}[2]{\frac{#1}{#2}}
\newcommand{\textfrac}[2]{\frac{#1}{#2}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\e}{\operatorname{e}}
\newcommand{\B}{\operatorname{B}}
\newcommand{\Bbar}{\overline{\operatorname{B}}}
\newcommand{\pr}{\operatorname{pr}}
\newcommand{\dd}{\operatorname{d}\hspace{-1pt}}
\newcommand{\E}{\operatorname{E}}
\newcommand{\V}{\operatorname{V}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\Bigsum}[2]{\mathop{\textstyle\sum}_{#1}^{#2}}
\newcommand{\ran}{\operatorname{ran}}
\newcommand{\card}{\#}
\renewcommand{\P}{\operatorname{P}}
\renewcommand{\L}{\operatorname{L}}
\newcommand{\mathds}{\mathbb}
</math>
</div>
In this chapter we denote by <math>\lambda^d</math> the <math>d</math>-dimensional Lebesgue measure on <math>\mathbb{R}^d</math>. We denote the Euclidean norm by <math>\|\cdot\|</math>, and open and closed balls with respect to this norm are denoted by <math>\B_r(x)</math> and <math>\Bbar_r(x)</math>, respectively.
We start with the following observation. Let <math>A\subseteq\mathbb{R}^d</math> be a measurable star-shaped set and assume that zero is a center. Then <math>(1-\epsilon)A\subseteq A</math> for every <math>\epsilon\in[0,1]</math>, and we can interpret <math>A\backslash(1-\epsilon)A</math> as the part of <math>A</math> that is close to its surface.
<div class="d-flex justify-content-center"> | |||
[[File:tikzcacb22a4.png | 400px | thumb |Volume close to the surface of a ‘nice’ subset of <math>\mathbb{R}^2</math>.]] | |||
</div> | |||
If we compute the measure of the latter set, then we get
<math display="block">
\lambda^d(A\backslash(1-\epsilon)A)=\lambda^d(A)-\lambda^d((1-\epsilon)A) = \lambda^d(A)-(1-\epsilon)^d\lambda^d(A)
</math>
and thus
<math display="block">
\smallfrac{\lambda^d(A\backslash(1-\epsilon)A)}{\lambda^d(A)}=1-(1-\epsilon)^d,
</math>
which will be close to one if <math>\epsilon</math> is small and <math>d</math> is large. This means that in this case the measure of <math>A</math> is located close to the surface of <math>A</math>, whereas its center contributes only little to its volume. Our first aim in this section is to quantify the latter observation in the case that <math>A</math> is the unit ball.
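The shell fraction above can be illustrated numerically. The following sketch (the function name is ours) evaluates the fraction of volume in a thin shell of depth <math>\epsilon=0.05</math> for growing dimension:

```python
# Fraction of the volume of a star-shaped set A lying in A \ (1-eps)A.
# By the scaling identity above this equals 1 - (1-eps)^d, independently of A.
def shell_fraction(eps: float, d: int) -> float:
    return 1.0 - (1.0 - eps) ** d

# Even a thin shell (eps = 0.05) carries almost all volume in high dimension.
for d in (2, 20, 200):
    print(d, shell_fraction(0.05, d))
```

Already at <math>d=200</math> the shell of depth <math>0.05</math> contains more than 99.9% of the volume.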
{{proofcard|Lemma|LEM-9-1|For <math>d\geqslant 1</math> and <math>0 < \epsilon\leqslant1</math> we have <math>(1-\epsilon)^d\leqslant \exp(-\epsilon d)</math>.
|Define <math>f\colon[0,1]\rightarrow\mathbb{R}</math> via <math>f(x)=\exp(-x)+x-1</math>. Then <math>f(0)=0</math> and <math>f'(x)=1-\exp(-x)\geqslant0</math> holds for all <math>x\in[0,1]</math>. Therefore <math>f(x)\geqslant0</math> and thus <math>\exp(-x)\geqslant1-x\geqslant0</math> holds for all <math>x\in[0,1]</math>. Evaluating at <math>x=\epsilon</math> and taking the <math>d</math>-th power finishes the proof.}}
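The inequality of the lemma can be sanity-checked on a grid of dimensions and values of <math>\epsilon</math>:

```python
import math

# Check of the lemma (1-eps)^d <= exp(-eps*d) on a grid;
# the small additive tolerance guards against floating-point rounding.
for d in (1, 5, 50, 500):
    for k in range(1, 11):
        eps = k / 10
        assert (1 - eps) ** d <= math.exp(-eps * d) + 1e-15
print("lemma verified on grid")
```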
Using the lemma we can now prove that most of the unit ball's volume is located close to its surface. This is very reasonable: if we think of the ball as consisting of many thin shells, all of the same depth but with different radii, then the outer ones will have (in high dimensions significantly!) larger volume than the inner ones.
{{proofcard|Theorem|SC-THM|(Surface Concentration Theorem) Let <math>d\geqslant1</math> and <math>0 < \epsilon\leqslant1</math>. Then we have
<math display="block">
\lambda^d(\Bbar_1(0)\backslash \B_{1-\epsilon}(0))\geqslant (1-\exp(-\epsilon d))\cdot\lambda^d(\Bbar_1(0)),
</math>
i.e., at least a <math>1-\exp(-\epsilon d)</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to its surface.
<div class{{=}}"d-flex justify-content-center">
[[File:tikz4262b634.png | 400px | thumb | Volume close to the surface of the unit ball. ]]
</div>
|We compute
<math display="block">
\begin{align*}
\frac{\lambda^d(\Bbar_1(0)\backslash{}\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_1(0))} &= \frac{\lambda^d(\Bbar_1(0))-\lambda^d(\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_1(0))} = 1-\frac{\lambda^d((1-\epsilon)\B_{1}(0))}{\lambda^d(\Bbar_1(0))}\\
& = 1-\frac{(1-\epsilon)^d\lambda^d(\B_{1}(0))}{\lambda^d(\B_1(0))} = 1-(1-\epsilon)^d\geqslant 1-\exp(-\epsilon d)
\end{align*}
</math>
where we used [[#LEM-9-1 |Lemma]] for the final estimate.}}
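The theorem can be checked by Monte Carlo sampling. The sketch below draws points uniformly from the ball via the standard recipe (Gaussian direction times a radius distributed as <math>U^{1/d}</math>); the parameter choices are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, n = 50, 0.1, 100_000

# Uniform sample from the d-dimensional unit ball: Gaussian direction
# combined with a radius distributed as U^(1/d).
g = rng.standard_normal((n, d))
directions = g / np.linalg.norm(g, axis=1, keepdims=True)
radii = rng.uniform(size=(n, 1)) ** (1 / d)
points = directions * radii

# Empirical fraction of volume eps-close to the surface.
frac = np.mean(np.linalg.norm(points, axis=1) >= 1 - eps)
bound = 1 - np.exp(-eps * d)   # Surface Concentration bound, 1 - e^(-5)
print(frac, bound)             # frac should exceed the bound
```

Here the exact fraction is <math>1-0.9^{50}\approx0.9948</math>, slightly above the guaranteed <math>1-\e^{-5}\approx0.9933</math>.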
For the second concentration theorem we need the following lemma.
{{proofcard|Lemma|LEM-9-2|For <math>d\geqslant3</math> we have <math>\lambda^d(\Bbar_1(0))=\frac{2\pi}{d}\cdot\lambda^{d-2}(\Bbar_1(0))</math> where the unit balls are taken in <math>\mathbb{R}^d</math> and <math>\mathbb{R}^{d-2}</math>, respectively.
|Since <math>\Bbar_1(0)=\{(x_1,\dots,x_d)\in\mathbb{R}^d\:;\:x_1^2+\cdots+x_d^2\leqslant1\}</math> we may compute
<math display="block">
\begin{align*}
\lambda^d(\Bbar_1(0)) & = \int_{\Bbar_1(0)}1\dd\lambda^d\\
& = \int_{x_1^2+\cdots+x_d^2\leqslant1}1\dd\lambda^d(x_1,\dots,x_d)\\
& = \int_{x_1^2+x_2^2\leqslant1} \Big(\int_{x_3^2+\cdots+x_d^2\leqslant1-x_1^2-x_2^2}1\dd\lambda^{d-2}(x_3,\dots,x_d)\Big)\dd\lambda^2(x_1,x_2)\\
& = \int_{x_1^2+x_2^2\leqslant1} \lambda^{d-2}(\Bbar_{(1-x_1^2-x_2^2)^{1/2}}(0))\dd\lambda^2(x_1,x_2)\\
& = \int_{x_1^2+x_2^2\leqslant1}(1-x_1^2-x_2^2)^{(d-2)/2}\lambda^{d-2}(\Bbar_1(0))\dd\lambda^2(x_1,x_2)\\
& = \lambda^{d-2}\bigl(\Bbar_1(0)\bigr)\cdot\int_0^{2\pi}\int_0^1(1-r^2)^{(d-2)/2}\,r\dd r \dd\varphi\\
& = \lambda^{d-2}(\Bbar_1(0))\cdot\int_0^{2\pi} \dd\varphi \cdot \int_1^0u^{(d-2)/2} (-1/2) \dd u\\
& = \lambda^{d-2}(\Bbar_1(0)) \cdot 2\pi \cdot \int_0^1u^{d/2-1}/2 \dd u\\
& = \lambda^{d-2}(\Bbar_1(0)) \cdot \pi \cdot \medfrac{u^{d/2}}{d/2}\Big|_0^1 = \medfrac{2\pi}{d}\cdot\lambda^{d-2}(\Bbar_1(0))
\end{align*}
</math>
where we used Fubini's Theorem for the third and transformation into polar coordinates for the sixth equality.}}
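The recursion of the lemma already determines all ball volumes once the one- and two-dimensional cases are known. A short sketch, cross-checked against the closed form <math>\pi^{d/2}/\Gamma(d/2+1)</math>:

```python
import math

# Ball volumes via the recursion of the lemma: V_d = (2*pi/d) * V_{d-2},
# seeded with V_1 = 2 (an interval) and V_2 = pi (a disc).
def ball_volume(d: int) -> float:
    if d == 1:
        return 2.0
    if d == 2:
        return math.pi
    return 2 * math.pi / d * ball_volume(d - 2)

# Cross-check against the closed form pi^(d/2) / Gamma(d/2 + 1).
for d in (3, 4, 10):
    closed = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    assert abs(ball_volume(d) - closed) < 1e-9
print(ball_volume(3))  # 4*pi/3
```

Note also that <math>2\pi/d < 1</math> for <math>d\geqslant7</math>, so the volume of the unit ball tends to zero as the dimension grows.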
After we have seen that most of the volume of the unit ball is concentrated close to its surface, we show in the next result that if we designate one point of the surface as the north pole, then most of the unit ball's volume will be concentrated at its waist. As with the surface concentration, this is not surprising: if we go from the center in the direction of the north pole, then the ball becomes thinner in all of the <math>d-1</math> remaining dimensions. The precise result is as follows.
{{proofcard|Theorem|WC-THM|(Waist Concentration Theorem) Let <math>d\geqslant3</math> and <math>\epsilon > 0</math>. Then we have
<math display="block">
\lambda^d(W_{\epsilon})\geqslant\bigl(1-\medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\medfrac{\epsilon^2(d-1)}{2}}\bigr)\bigr)\cdot\lambda^d(\Bbar_1(0))
</math>
where <math>W_{\epsilon}=\bigl\{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:|x_1|\leqslant\epsilon\bigr\}</math>.
<div class{{=}}"d-flex justify-content-center">
[[File:tikz62fda0be.png | 400px | thumb | Most of the volume is located close to the equator of the unit ball. ]]
</div>|We first observe that the statement is trivial for <math>\epsilon\geqslant1</math>, so we may assume <math>\epsilon < 1</math> for the rest of the proof. We now compute the volume of that part
<math display="block">
H^{\delta} = \{(x_1,\dots,x_d)\in \Bbar_1(0)\:;\:x_1\geqslant\delta\}
</math>
of the unit ball that is located above <math>\delta\geqslant0</math> in <math>x_1</math>-direction. We observe that for <math>x_1\in[\delta,1]</math> the corresponding section of <math>H^{\delta}</math> equals
<math display="block">
\begin{align*}
H_{x_1}^{\delta} &=\bigl\{(x_2,\dots,x_d)\in\mathbb{R}^{d-1}\:;\:(x_1,x_2,\dots,x_d)\in H^{\delta}\bigr\}\\
& = \bigl\{(x_2,\dots,x_d)\in\mathbb{R}^{d-1}\:;\:\|(x_1,x_2,\dots,x_d)\|\leqslant1\bigr\}\\
& =\bigl\{y\in\mathbb{R}^{d-1}\:;\:x_1^2+\|y\|^2\leqslant1\bigr\}\\
& = \Bbar_{(1-x_1^2)^{1/2}}(0)
\end{align*}
</math>
where the ball is taken in <math>\mathbb{R}^{d-1}</math>, compare the picture below.
<div class{{=}}"d-flex justify-content-center">
[[File:tikz53fe5393.png | 400px | thumb | Slice of the unit ball <math>\Bbar_1(0)</math>. ]]
</div>
Now we employ Cavalieri's Principle and get
<span id{{=}}"Caval-1"/>
<math display="block">
\begin{equation}\label{Caval-1}
\begin{aligned}
\lambda^d\bigl(H^{\delta}\bigr) & = \int_{\delta}^1\lambda^{d-1}(H^{\delta}_{x_1})\dd\lambda^1(x_1)\\
& = \int_{\delta}^1\lambda^{d-1}(\Bbar_{(1-x_1^2)^{1/2}}(0))\dd\lambda^1(x_1)\\
& = \int_{\delta}^1(1-x_1^2)^{(d-1)/2}\lambda^{d-1}(\Bbar_1(0))\dd\lambda^1(x_1)\\
& = \lambda^{d-1}(\Bbar_1(0))\cdot\int_{\delta}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1).
\end{aligned}
\end{equation}
</math>
Next we estimate the integral at the end of \eqref{Caval-1} in two ways. Firstly, we put <math>\delta=0</math> and observe that <math>0\leqslant x_1\leqslant1/\sqrt{d-1}</math> implies <math>x_1^2\leqslant1/(d-1)</math>. This in turn leads to <math>1-1/(d-1)\leqslant 1-x_1^2</math>, which implies <math>(1-1/(d-1))^{(d-1)/2}\leqslant (1-x_1^2)^{(d-1)/2}</math>. It follows
<span id{{=}}"Caval-2"/>
<math display="block">
\begin{equation}\label{Caval-2}
\begin{aligned}
\int_{0}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1) & \geqslant \int_{0}^{1/\sqrt{d-1}}(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1)\\
& \geqslant \int_{0}^{1/\sqrt{d-1}}(1-1/(d-1))^{(d-1)/2}\dd\lambda^1(x_1)\\
& = (1-1/(d-1))^{(d-1)/2}\cdot\medfrac{1}{\sqrt{d-1}}\\
& \geqslant \left(1-\smallfrac{d-1}{2}\cdot\smallfrac{1}{d-1}\right)\cdot\medfrac{1}{\sqrt{d-1}}\\
& = \medfrac{1}{2\sqrt{d-1}}
\end{aligned}
\end{equation}
</math>
where we used Bernoulli's inequality in the last estimate, as <math>(d-1)/2\geqslant1</math> holds due to our assumption <math>d\geqslant3</math>. Secondly, we estimate the integral for <math>\delta=\epsilon > 0</math> from above. We employ <math>1-x_1^2\leqslant \exp(-x_1^2)</math> from the proof of [[#LEM-9-1 |Lemma]] and <math>x_1/\epsilon\geqslant1</math> to get
<span id{{=}}"Caval-3"/>
<math display="block">
\begin{equation}\label{Caval-3}
\begin{aligned}
\int_{\epsilon}^1(1-x_1^2)^{(d-1)/2}\dd\lambda^1(x_1) & \leqslant \int_{\epsilon}^1(\exp(-x_1^2))^{(d-1)/2}\,(x_1/\epsilon)\,\dd\lambda^1(x_1)\\
& = \medfrac{1}{\epsilon} \cdot \int_{\epsilon}^1x_1\exp(-{\textstyle\frac{d-1}{2}}x_1^2)\dd\lambda^1(x_1)\\
& = \medfrac{1}{\epsilon}\cdot\frac{\exp({\textstyle-\frac{d-1}{2}})- \exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{-(d-1)}\\
& \leqslant \frac{\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)}.
\end{aligned}
\end{equation}
</math>
Combining \eqref{Caval-1}--\eqref{Caval-3} we obtain
<math display="block">
\lambda^d(H^0)\geqslant\frac{\lambda^{d-1}(\Bbar_1(0))}{2\sqrt{d-1}} \; \text{ and } \; \lambda^d(H^{\epsilon})\leqslant \lambda^{d-1}(\Bbar_1(0)) \frac{\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)}.
</math>
To finish the proof we observe that <math>H^0</math> is the upper half of the unit ball and <math>H^0\backslash{}H^{\epsilon}</math> is the upper half of <math>W_{\epsilon}</math>. Therefore we derive from the above
<math display="block">
\begin{align*}
\frac{\lambda^d(W_{\epsilon})}{\lambda^d(\Bbar_1(0))} & = \frac{2\cdot\lambda^d(H^0\backslash{}H^{\epsilon})}{2\cdot\lambda^d(H^0)} = \frac{\lambda^d(H^0)-\lambda^d(H^{\epsilon})}{\lambda^d(H^0)} = 1-\frac{\lambda^d(H^{\epsilon})}{\lambda^d(H^{0})}\\
& \geqslant 1-\frac{2\sqrt{d-1}\exp({\textstyle-\frac{\epsilon^2(d-1)}{2}})}{\epsilon(d-1)} = 1-\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)
\end{align*}
</math>
as desired.}}
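A Monte Carlo check of the waist bound, again sampling uniformly from the ball via a Gaussian direction and a <math>U^{1/d}</math> radius (parameter choices are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 100, 100_000
eps = 2 / np.sqrt(d - 1)       # eps must not be too small, see below

# Uniform sample from the unit ball in R^d.
g = rng.standard_normal((n, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)
points *= rng.uniform(size=(n, 1)) ** (1 / d)

# Fraction of volume in the waist W_eps = {|x_1| <= eps}.
frac = np.mean(np.abs(points[:, 0]) <= eps)
bound = 1 - 2 / (eps * np.sqrt(d - 1)) * np.exp(-eps**2 * (d - 1) / 2)
print(frac, bound)   # frac should exceed the bound 1 - exp(-2)
```

With this choice <math>\epsilon\sqrt{d-1}=2</math>, so the guaranteed fraction is <math>1-\e^{-2}\approx0.86</math>; the empirical fraction is noticeably larger, as the bound is not tight.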
The estimate in [[#WC-THM |Theorem]] is only of value if the factor on the right-hand side is strictly positive. The following lemma explains when this happens.
{{proofcard|Lemma|WC-LEM|There exists a unique <math>a_0 > 0</math> such that <math>\frac{2}{a}\exp(-\frac{a^2}{2}) < 1</math> holds for every <math>a > a_0</math>. We have <math>a_0\in(1,2)</math>.
|The function <math>f\colon(0,\infty)\rightarrow\mathbb{R}</math>, <math>f(a)=\frac{2}{a}\exp(-\frac{a^2}{2})</math> is strictly decreasing as both of its factors are strictly decreasing and strictly positive. We have <math>f(1)=2/\sqrt{\e} > 1</math> and <math>f(2)=1/\e^2 < 1</math>. Consequently, by the Intermediate Value Theorem, there is <math>a_0 > 0</math> such that <math>f(a_0)=1</math> and necessarily <math>1 < a_0 < 2</math>. For <math>a > a_0</math> we get <math>f(a) < 1</math> and for <math>a < a_0</math> we get <math>f(a) > 1</math>, so <math>a_0</math> is unique.}}
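The threshold <math>a_0</math> can be located numerically by bisection, exploiting that <math>f</math> is strictly decreasing:

```python
import math

# f(a) = (2/a) * exp(-a^2/2) is strictly decreasing on (0, inf);
# locate the unique root a0 of f(a) = 1 by bisection on [1, 2].
def f(a: float) -> float:
    return 2 / a * math.exp(-a * a / 2)

lo, hi = 1.0, 2.0          # f(1) = 2/sqrt(e) > 1,  f(2) = 1/e^2 < 1
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
a0 = (lo + hi) / 2
print(a0)   # approximately 1.1, indeed in (1, 2)
```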
Substituting <math>a=\epsilon\sqrt{d-1}</math> shows that the factor in [[#WC-THM |Theorem]] is strictly positive if and only if <math>\epsilon > a_0/\sqrt{d-1}</math> holds with some <math>a_0\in(1,2)</math>, compare [[exercise:1cae9d071a |Problem]]. That is, the larger we choose the dimension, the smaller we can make <math>\epsilon > 0</math>.
We however ''cannot'' let <math>\epsilon</math> tend to zero for a fixed dimension <math>d</math>. The interpretation of [[#WC-THM |Theorem]], i.e., that a <math>1-\frac{2}{\epsilon\sqrt{d-1}}\exp({-\frac{\epsilon^2(d-1)}{2}})</math> fraction of the volume of the <math>d</math>-dimensional unit ball is located <math>\epsilon</math>-close to the equator, is only valid if <math>\epsilon</math> is not too small. Although this might by now be fairly obvious, we point out that, in contrast to ‘typical epsilontics’, <math>\epsilon > 0</math> cannot be thought of as arbitrarily small; rather, how small we can choose it depends on the dimension of the space being large enough.
For later use we note the following reformulation of [[#WC-THM |Theorem]].
{{proofcard|Corollary|WC-COR|Let <math>d\geqslant3</math>. Then we have
<math display="block">
\frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| > \epsilon\})}{\lambda^d(\Bbar_1(0))} \leqslant\smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)
</math>
for every <math>\epsilon > 0</math>.|In the notation of [[#WC-THM |Theorem]] we have
<math display="block">
\begin{align*}
\frac{\lambda^d(\{(x_1,\dots,x_d)\in\Bbar_1(0)\:;\:|x_1| > \epsilon\})}{\lambda^d(\Bbar_1(0))} &= \frac{\lambda^d(\Bbar_1(0)\backslash W_{\epsilon})}{\lambda^d(\Bbar_1(0))}= \frac{\lambda^d(\Bbar_1(0))-\lambda^d(W_{\epsilon})}{\lambda^d(\Bbar_1(0))}\\
&\leqslant 1 - \Bigl( 1- \smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl({-\smallfrac{\epsilon^2(d-1)}{2}}\bigr)\Bigr)\\
& = \smallfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\smallfrac{\epsilon^2(d-1)}{2}\bigr)
\end{align*}
</math>
as claimed.}}
Now we will use the volume estimates from above in order to establish where points that we draw at random from the <math>d</math>-dimensional unit ball will be located and what mutual angles between the points we can expect. To make our proofs formal, we will employ in the rest of the chapter a probability space <math>(\Omega,\Sigma,\P)</math> and random variables <math>X\colon\Omega\rightarrow\mathbb{R}^d</math> with distribution <math>X\sim\mathcal{U}(\Bbar_1(0))</math>, compare [[guide:4ed4d3330a#DFN-DISTR |Definition]].
{{proofcard|Theorem|UA-THM|Let <math>d\geqslant3</math> and assume that we draw <math>n\geqslant2</math> points <math>x^{\scriptscriptstyle(1)},\dots,x^{\scriptscriptstyle(n)}</math> uniformly at random from the <math>d</math>-dimensional unit ball. Then
<math display="block">
\P\bigl[\|x^{\scriptscriptstyle(j)}\|\geqslant 1-\medfrac{2\ln n}{d} \text{ for all }j=1,\dots,n\bigr]\geqslant 1-\medfrac{1}{n}
</math>
holds.
|In view of the Surface Concentration Theorem, most of the unit ball's volume is located close to its surface and thus a point chosen uniformly at random is likely to be close to the surface. For a formal proof, we put <math>\epsilon=\frac{2\ln n}{d}</math> and consider random vectors <math>X^{\scriptscriptstyle(1)},\dots,X^{\scriptscriptstyle(n)}\sim\mathcal{U}(\Bbar_1(0))</math>. For fixed <math>1\leqslant j\leqslant n</math> we employ [[#SC-THM |Theorem]] to get
<math display="block">
\P\bigl[\|X^{\scriptscriptstyle(j)}\| < 1-\epsilon\bigr]=\frac{\lambda^d(\B_{1-\epsilon}(0))}{\lambda^d(\Bbar_{1}(0))}\leqslant\exp(-\epsilon{}d)=\exp(-2\ln n)=1/n^2.
</math>
This implies
<math display="block">
\begin{align*}
\P\bigl[\forall\:j\colon\|X^{\scriptscriptstyle(j)}\|\geqslant 1-\epsilon\bigr] &=1-\P\bigl[\exists\:j\colon\|X^{\scriptscriptstyle(j)}\| < 1-\epsilon\bigr]\\
&\geqslant 1-\Bigsum{j=1}{n}1/n^2=1-\medfrac{1}{n}
\end{align*}
</math>
as claimed.}}
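The theorem can be tested empirically by repeating the experiment many times and counting how often all <math>n</math> norms exceed the threshold <math>1-\frac{2\ln n}{d}</math>; the parameter and trial choices are ours, and the assertion allows some Monte Carlo slack below the guaranteed rate <math>1-1/n</math>:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trials = 200, 10, 400
threshold = 1 - 2 * np.log(n) / d

successes = 0
for _ in range(trials):
    # Draw n points uniformly from the unit ball in R^d.
    g = rng.standard_normal((n, d))
    pts = g / np.linalg.norm(g, axis=1, keepdims=True)
    pts *= rng.uniform(size=(n, 1)) ** (1 / d)
    if np.min(np.linalg.norm(pts, axis=1)) >= threshold:
        successes += 1

rate = successes / trials
print(rate)   # close to the guaranteed 1 - 1/n = 0.9
```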
Although we get a trivial estimate if we substitute <math>n=1</math> into the Theorem, its proof showed that
<math display="block">
\P\bigl[\|X\|\geqslant 1-\epsilon\bigr] \geqslant 1-\exp(-\epsilon d)
</math>
holds, so if <math>d</math> is large enough and <math>\epsilon > 0</math> is not too small, a single point chosen at random will be <math>\epsilon</math>-close to the surface of <math>\Bbar_1(0)</math> with high probability. We continue by studying angles between points.
{{proofcard|Theorem|UO-THM|Let <math>d\geqslant3</math> and assume that we draw <math>n\geqslant2</math> points <math>x^{\scriptscriptstyle(1)},\dots,x^{\scriptscriptstyle(n)}</math> uniformly at random from the <math>d</math>-dimensional unit ball. Then
<math display="block">
\P\bigl[\bigl|\langle{}x^{\scriptscriptstyle(j)},x^{\scriptscriptstyle(k)}\rangle{}\bigr|\leqslant\medfrac{\sqrt{6\ln n}}{\sqrt{d-1}} \text{ for all }j\not=k\bigr]\geqslant 1-\medfrac{1}{n}
</math>
holds.
|We first consider the case of two points <math>x, y\in\Bbar_1(0)</math>. As we are interested only in their scalar product, it should make no difference if we draw both points at random or if we treat the first point's direction as fixed (but the length still chosen at random) and draw only the second one at random. Since <math>\lambda^d(\{0\})=0</math>, the first point will be non-zero with probability one and we can thus arrange a new coordinate system, in which the unit ball is still the same, but in which our first point is directed towards the (new) north pole of the ball. In these coordinates we can then apply the Waist Concentration Theorem to see that the other point is likely to be located close to the new equator. This means in turn that the scalar product of both points will be close to zero.
<div class{{=}}"d-flex justify-content-center">
[[File:tikz6e5b4b9b.png | 400px | thumb | We apply the Waist Concentration Theorem with respect to the new north pole. ]]
</div>
To make the above idea formal, we put <math>\epsilon=\sqrt{6\ln n}/\sqrt{d-1}</math> and consider two random variables <math>X,Y\colon\Omega\rightarrow\mathbb{R}^d</math> with <math>X,Y\sim\mathcal{U}(\Bbar_1(0))</math>. Then we can find for each <math>\omega_1\in\Omega</math> with <math>X(\omega_1)\not=0</math> an orthogonal linear map <math>T(\omega_1)\colon\mathbb{R}^d\rightarrow\mathbb{R}^d</math> with <math>T(\omega_1)X(\omega_1)=(\|X(\omega_1)\|,0,\dots,0)</math>. Indeed, we can put <math>f^{\scriptscriptstyle(1)}=X(\omega_1)/\|X(\omega_1)\|</math> and extend to an orthonormal basis <math>\mathcal{F}=\{ f^{\scriptscriptstyle(1)}, f^{\scriptscriptstyle(2)},\dots,f^{\scriptscriptstyle(d)}\}</math> by first applying the basis extension theorem and then the Gram-Schmidt procedure. The map <math>T(\omega_1)</math> is then defined as the linear extension of <math>T(\omega_1)f^{\scriptscriptstyle(j)}:=e^{\scriptscriptstyle(j)}</math> where the <math>e^{\scriptscriptstyle(j)}</math> are the standard unit vectors. We obtain <math>T(\omega_1)X(\omega_1)=T(\omega_1)(\|X(\omega_1)\|f^{\scriptscriptstyle(1)})=\|X(\omega_1)\|e^{\scriptscriptstyle(1)}</math> and thus
<math display="block">
|\langle{}X(\omega_1),Y(\omega_2)\rangle{}| = |\langle{}T(\omega_1)X(\omega_1),T(\omega_1)Y(\omega_2)\rangle{}| = \|X(\omega_1)\|\cdot|\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle{}|
</math>
for <math>(\omega_1,\omega_2)\in\Omega\times\Omega</math> with <math>X(\omega_1)\not=0</math>. We observe that <math>T(\omega_1)Y\sim\mathcal{U}(\Bbar_1(0))</math> holds for each fixed <math>\omega_1</math> and that <math>\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle{}</math> is the first coordinate of <math>T(\omega_1)Y(\omega_2)</math>. Therefore we can apply [[#WC-COR |Corollary]] to compute
<math display="block">
\begin{align*}
\P\bigl[|\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\rangle| > \epsilon\bigr] & = \frac{\lambda^d(\{(z_1,\dots,z_d)\in\Bbar_1(0)\:;\:|z_1| > \epsilon\})}{\lambda^d(\Bbar_1(0))}\\
&\leqslant \medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\medfrac{\epsilon^2(d-1)}{2}\bigr)\\
& = \medfrac{2}{\sqrt{6\ln n}}\exp\bigl(-\medfrac{6\ln n}{2}\bigr)\\
& \leqslant \medfrac{2}{n^3}
\end{align*}
</math>
where the last estimate holds as <math>1/\sqrt{6\ln n}\leqslant1</math> is valid for <math>n\geqslant2</math>. After observing that in the first line below necessarily <math>X(\omega_1)\not=0</math> holds and by using <math>\|X(\omega_1)\|\leqslant 1</math>, we get
<math display="block">
\begin{align*}
\P\bigl[\bigl|\langle{}X,Y\rangle{}\bigr| > \epsilon\bigr] & = \P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\bigl|\bigl\langle{}X(\omega_1),Y(\omega_2)\bigr\rangle{}\bigr| > \epsilon\}\bigr)\\
& =\P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\|X(\omega_1)\|\cdot\bigl|\bigl\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\bigr\rangle{}\bigr| > \epsilon\}\bigr)\\
& \leqslant\P\bigl(\{(\omega_1,\omega_2)\in\Omega^2\:;\:\bigl|\bigl\langle{}e^{\scriptscriptstyle(1)},T(\omega_1)Y(\omega_2)\bigr\rangle{}\bigr| > \epsilon\}\bigr)\\
& = \int_{\Omega} \P\bigl[\bigl|(T(\omega_1)Y(\omega_2))_1\bigr| > \epsilon\bigr]\dd\P(\omega_1)\\
&\leqslant \int_{\Omega}2/n^3\dd\P = \medfrac{2}{n^3}
\end{align*}
</math>
where we employed Cavalieri's Principle and the estimates from above. Finally, let random vectors <math>X^{\scriptscriptstyle(1)},\dots,X^{\scriptscriptstyle(n)}\sim\mathcal{U}(\Bbar_1(0))</math> be given. Then
<math display="block">
\begin{align*}
\P\bigl[\forall\:j\not=k\colon|\langle{}X^{\scriptscriptstyle(j)},X^{\scriptscriptstyle(k)}\rangle|\leqslant\epsilon\bigr] & = 1- \P\bigl[\exists\:j\not=k\colon\bigl|\langle{}X^{\scriptscriptstyle(j)},X^{\scriptscriptstyle(k)}\rangle\bigr| > \epsilon \bigr]\\
& \geqslant 1-{\textstyle{n\choose2}}\P\bigl[\bigl|\langle{}X,Y\rangle{}\bigr| > \epsilon\bigr]\\
&\geqslant 1-\medfrac{n^2-n}{2}\cdot\medfrac{2}{n^3}\\
&\geqslant 1-\medfrac{1}{n}
\end{align*}
</math>
as desired.}}
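The near-orthogonality of random points can be observed directly by inspecting the Gram matrix of a sample; the parameter choices below are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 1000, 30
eps = np.sqrt(6 * np.log(n)) / np.sqrt(d - 1)   # threshold of the theorem

# n points drawn uniformly from the unit ball of R^d.
g = rng.standard_normal((n, d))
pts = g / np.linalg.norm(g, axis=1, keepdims=True)
pts *= rng.uniform(size=(n, 1)) ** (1 / d)

# Off-diagonal entries of the Gram matrix are the pairwise scalar products.
gram = pts @ pts.T
off_diag = np.abs(gram[~np.eye(n, dtype=bool)])
print(off_diag.max(), eps)
# With probability >= 1 - 1/n every pairwise product lies below eps.
```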
If we apply the estimate of [[#UO-THM |Theorem]] to only two points <math>x</math>, <math>y\in\Bbar_1(0)</math>, we only get a probability of at least <math>1/2</math> for <math>|\langle{}x,y\rangle{}|</math> being smaller than <math>\sqrt{6\ln 2}/\sqrt{d-1}</math>. Going through the proof of the theorem yields however that
<math display="block">
\P\bigl[|\langle{}x,y\rangle{}|\leqslant\epsilon\bigr]\geqslant 1-\medfrac{2}{\epsilon\sqrt{d-1}}\exp\bigl(-\medfrac{\epsilon^2(d-1)}{2}\bigr)
</math>
holds, from which we see that for fixed <math>\epsilon > 0</math> and <math>d</math> large enough the probability for the scalar product to be <math>\epsilon</math>-small will be close to one. This means that two points picked at random from the high-dimensional unit ball will be almost orthogonal with high probability.
We remark here that, when interpreting the scalar product as a measure of how close to orthogonal two vectors are, one usually normalizes the vectors first. In the above result we did not do so since, by [[#UA-THM |Theorem]], we know that the points considered in [[#UO-THM |Theorem]] will have norm close to one with high probability anyway. We leave a formal proof as [[exercise:8dbb0c4f40 |Problem]].
==General references==
{{cite arXiv|last1=Wegner|first1=Sven-Ake|year=2024|title=Lecture Notes on High-Dimensional Data|eprint=2101.05841|class=math.FA}}