guide:B846f441d7: Difference between revisions

From Stochiki
No edit summary
 
mNo edit summary
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<div class="d-none"><math>
\newcommand{\smallfrac}[2]{\frac{#1}{#2}}
\newcommand{\medfrac}[2]{\frac{#1}{#2}}
\newcommand{\textfrac}[2]{\frac{#1}{#2}}
\newcommand{\tr}{\operatorname{tr}}
\newcommand{\e}{\operatorname{e}}
\newcommand{\B}{\operatorname{B}}
\newcommand{\Bbar}{\overline{\operatorname{B}}}
\newcommand{\pr}{\operatorname{pr}}
\newcommand{\dd}{\operatorname{d}\hspace{-1pt}}
\newcommand{\E}{\operatorname{E}}
\newcommand{\V}{\operatorname{V}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\Bigsum}[2]{\mathop{\textstyle\sum}_{#1}^{#2}}
\newcommand{\ran}{\operatorname{ran}}
\newcommand{\card}{\#}
\renewcommand{\P}{\operatorname{P}}
\renewcommand{\L}{\operatorname{L}}
\newcommand{\mathds}{\mathbb}</math></div>


We start this chapter by explaining a general technique to obtain bounds  for <math>\P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr]</math> where the <math>Y_i</math> are independent random variables. A bound that arises from this ‘recipe’ is often referred to as a Chernoff bound. The basic idea is given in the following [[#CHERNOFF-RECEIPE |Lemma]] which establishes the ‘generic’ Chernoff bound.
{{proofcard|Lemma|CHERNOFF-RECEIPE|Let <math>(\Omega,\Sigma,\P)</math> be a probability space and let <math>Y_1,\dots,Y_n\colon\Omega\rightarrow\mathbb{R}</math> be mutually independent random variables. Then
<math display="block">
\P\bigl[Y_1+\cdots+Y_n\geqslant a\bigr]\leqslant \inf_{t > 0}\exp(-ta)\prod_{i=1}^n\E(\exp(tY_i))
</math>
holds for every <math>a > 0</math>.
|For fixed <math>t > 0</math> the function <math>\exp(t\cdot -)</math> is increasing. We put <math>Y=Y_1+\cdots+Y_n</math> and get
<math display="block">
\P[Y\geqslant a] = \P[\exp(tY)\geqslant \exp(ta)]\leqslant\frac{\E(\exp(tY))}{\exp(ta)}
</math>
by applying the Markov bound. Now we take the infimum over <math>t > 0</math> and employ the independence to get
<math display="block">
\P[Y\geqslant a]\leqslant \inf_{t > 0}\exp(-ta)\E(\exp(t\Bigsum{i=1}{n} tY_i))=\inf_{t > 0}\exp(-ta)\prod_{i=1}^n\E(\exp(tY_i))
</math>
as desired.}}
The classical Chernoff bound recipe is now to compute or estimate the right hand side by exploiting additional information on the <math>Y_i</math> that is given in concrete situations. Readers unfamiliar with Chernoff bounds can find a classical example on how the recipe works in [[exercise:Dfadfac430 |Problem]] where the <math>Y_i</math> are independent Bernoulli random variables. In [[#MTB-THM |Theorem]] below the additional knowledge comes from the assumption that the moments of each <math>Y_i</math> grow at most like factorial.
{{proofcard|Theorem|MTB-THM|Let <math>(\Omega,\Sigma,\P)</math> be a probability space and let <math>Y_1,\dots,Y_d\colon\Omega\rightarrow\mathbb{R}</math> be mutually independent random variables with expectation zero and <math>|\E(Y_i^k)|\leqslant k!/2</math> for <math>i=1,\dots,d</math> and <math>k\geqslant2</math>. Then we have
<math display="block">
\P\bigl[|Y_1+\cdots+Y_d|\geqslant a\bigr]\leqslant 2\exp\bigl(-C\min\bigl(\medfrac{a^2}{d},a\bigr)\bigr)
</math>
for every <math>a > 0</math> and with <math>C=1/4</math>.
|Let <math>Y</math> be one of the random variables <math>\pm Y_1,\dots,\pm Y_d</math> and <math>0 < t\leqslant 1/2</math>. Then we estimate
<math display="block">
\begin{align*}
\E(\exp(tY)) & = \bigl|\E\bigl(1+tY +\Bigsum{k=2}{\infty}\medfrac{(tY)^k}{k!}\bigr)\bigr| \leqslant 1 + \Bigsum{k=2}{\infty}\medfrac{t^k|\E(Y^k)|}{k!}\\
& \leqslant 1 + \Bigsum{k=2}{\infty}\medfrac{t^k}{2} = 1 + \smallfrac{1}{2}\bigl(\medfrac{1}{1-t} - (t+1)\bigr)\\
& = 1 + \medfrac{1}{2}\medfrac{t^2}{1-t}\leqslant 1+t^2\leqslant \exp(t^2)
\end{align*}
</math>
where the last equality can easily be checked by means of calculus. Now we use [[#CHERNOFF-RECEIPE |Lemma]] and get
<math display="block">
\begin{align*}
\P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr] & \leqslant \inf_{t > 0}\exp(-ta)\prod_{i=1}^d\E(\exp(tY_i))\\
& \leqslant \inf_{0 < t\leqslant1/2}\exp(-ta)\prod_{i=1}^d\exp(t^2)\\
& \leqslant \inf_{0 < t\leqslant1/2}\exp(-ta+dt^2).
\end{align*}
</math>
By solving <math>\frac{\dd}{\dd t}(-ta+dt^2)=-a+2dt=0</math> for <math>t</math> and considering the constraint <math>t\leqslant1/2</math> this leads to <math>t=\min(\frac{a}{2d},\frac{1}{2})</math>, and further to
<math display="block">
-ta+dt^2 \leqslant -\medfrac{1}{2}ta -\medfrac{1}{2}ta + dt\medfrac{a}{2d}=-\medfrac{1}{2}ta \leqslant -\medfrac{1}{2}\min\bigl(\medfrac{a}{2d},\medfrac{1}{2}\bigr)a = -\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr).
</math>
This implies
<math display="block">
\P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr] \leqslant \exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr)
</math>
and similiarly we get
<math display="block">
\P\bigl[Y_1+\cdots+Y_d\leqslant -a\bigr] = \P\bigl[-Y_1-\cdots-Y_d\geqslant a\bigr] \leqslant \exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr).
</math>
Consequently,
<math display="block">
\P\bigl[\bigl|Y_1+\cdots+Y_d\bigr|\geqslant a\bigr] \leqslant 2\exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr)
</math>
as desired.}}
==General references==
{{cite arXiv|last1=Wegner|first1=Sven-Ake|year=2024|title=Lecture Notes on High-Dimensional Data|eprint=2101.05841|class=math.FA}}

Latest revision as of 00:43, 2 June 2024

[math] \newcommand{\smallfrac}[2]{\frac{#1}{#2}} \newcommand{\medfrac}[2]{\frac{#1}{#2}} \newcommand{\textfrac}[2]{\frac{#1}{#2}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\e}{\operatorname{e}} \newcommand{\B}{\operatorname{B}} \newcommand{\Bbar}{\overline{\operatorname{B}}} \newcommand{\pr}{\operatorname{pr}} \newcommand{\dd}{\operatorname{d}\hspace{-1pt}} \newcommand{\E}{\operatorname{E}} \newcommand{\V}{\operatorname{V}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Bigsum}[2]{\mathop{\textstyle\sum}_{#1}^{#2}} \newcommand{\ran}{\operatorname{ran}} \newcommand{\card}{\#} \renewcommand{\P}{\operatorname{P}} \renewcommand{\L}{\operatorname{L}} \newcommand{\mathds}{\mathbb}[/math]

We start this chapter by explaining a general technique to obtain bounds for [math]\P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr][/math] where the [math]Y_i[/math] are independent random variables. A bound that arises from this ‘recipe’ is often referred to as a Chernoff bound. The basic idea is given in the following Lemma which establishes the ‘generic’ Chernoff bound.

Lemma

Let [math](\Omega,\Sigma,\P)[/math] be a probability space and let [math]Y_1,\dots,Y_n\colon\Omega\rightarrow\mathbb{R}[/math] be mutually independent random variables. Then

[[math]] \P\bigl[Y_1+\cdots+Y_n\geqslant a\bigr]\leqslant \inf_{t \gt 0}\exp(-ta)\prod_{i=1}^n\E(\exp(tY_i)) [[/math]]
holds for every [math]a \gt 0[/math].


Show Proof

For fixed [math]t \gt 0[/math] the function [math]\exp(t\cdot -)[/math] is increasing. We put [math]Y=Y_1+\cdots+Y_n[/math] and get

[[math]] \P[Y\geqslant a] = \P[\exp(tY)\geqslant \exp(ta)]\leqslant\frac{\E(\exp(tY))}{\exp(ta)} [[/math]]
by applying the Markov bound. Now we take the infimum over [math]t \gt 0[/math] and employ the independence to get

[[math]] \P[Y\geqslant a]\leqslant \inf_{t \gt 0}\exp(-ta)\E(\exp(t\Bigsum{i=1}{n} tY_i))=\inf_{t \gt 0}\exp(-ta)\prod_{i=1}^n\E(\exp(tY_i)) [[/math]]
as desired.

The classical Chernoff bound recipe is now to compute or estimate the right hand side by exploiting additional information on the [math]Y_i[/math] that is given in concrete situations. Readers unfamiliar with Chernoff bounds can find a classical example on how the recipe works in Problem where the [math]Y_i[/math] are independent Bernoulli random variables. In Theorem below the additional knowledge comes from the assumption that the moments of each [math]Y_i[/math] grow at most like factorial.

Theorem

Let [math](\Omega,\Sigma,\P)[/math] be a probability space and let [math]Y_1,\dots,Y_d\colon\Omega\rightarrow\mathbb{R}[/math] be mutually independent random variables with expectation zero and [math]|\E(Y_i^k)|\leqslant k!/2[/math] for [math]i=1,\dots,d[/math] and [math]k\geqslant2[/math]. Then we have

[[math]] \P\bigl[|Y_1+\cdots+Y_d|\geqslant a\bigr]\leqslant 2\exp\bigl(-C\min\bigl(\medfrac{a^2}{d},a\bigr)\bigr) [[/math]]
for every [math]a \gt 0[/math] and with [math]C=1/4[/math].


Show Proof

Let [math]Y[/math] be one of the random variables [math]\pm Y_1,\dots,\pm Y_d[/math] and [math]0 \lt t\leqslant 1/2[/math]. Then we estimate

[[math]] \begin{align*} \E(\exp(tY)) & = \bigl|\E\bigl(1+tY +\Bigsum{k=2}{\infty}\medfrac{(tY)^k}{k!}\bigr)\bigr| \leqslant 1 + \Bigsum{k=2}{\infty}\medfrac{t^k|\E(Y^k)|}{k!}\\ & \leqslant 1 + \Bigsum{k=2}{\infty}\medfrac{t^k}{2} = 1 + \smallfrac{1}{2}\bigl(\medfrac{1}{1-t} - (t+1)\bigr)\\ & = 1 + \medfrac{1}{2}\medfrac{t^2}{1-t}\leqslant 1+t^2\leqslant \exp(t^2) \end{align*} [[/math]]
where the last equality can easily be checked by means of calculus. Now we use Lemma and get

[[math]] \begin{align*} \P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr] & \leqslant \inf_{t \gt 0}\exp(-ta)\prod_{i=1}^d\E(\exp(tY_i))\\ & \leqslant \inf_{0 \lt t\leqslant1/2}\exp(-ta)\prod_{i=1}^d\exp(t^2)\\ & \leqslant \inf_{0 \lt t\leqslant1/2}\exp(-ta+dt^2). \end{align*} [[/math]]
By solving [math]\frac{\dd}{\dd t}(-ta+dt^2)=-a+2dt=0[/math] for [math]t[/math] and considering the constraint [math]t\leqslant1/2[/math] this leads to [math]t=\min(\frac{a}{2d},\frac{1}{2})[/math], and further to

[[math]] -ta+dt^2 \leqslant -\medfrac{1}{2}ta -\medfrac{1}{2}ta + dt\medfrac{a}{2d}=-\medfrac{1}{2}ta \leqslant -\medfrac{1}{2}\min\bigl(\medfrac{a}{2d},\medfrac{1}{2}\bigr)a = -\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr). [[/math]]

This implies

[[math]] \P\bigl[Y_1+\cdots+Y_d\geqslant a\bigr] \leqslant \exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr) [[/math]]
and similiarly we get

[[math]] \P\bigl[Y_1+\cdots+Y_d\leqslant -a\bigr] = \P\bigl[-Y_1-\cdots-Y_d\geqslant a\bigr] \leqslant \exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr). [[/math]]
Consequently,

[[math]] \P\bigl[\bigl|Y_1+\cdots+Y_d\bigr|\geqslant a\bigr] \leqslant 2\exp\Bigl(-\min\bigl(\medfrac{a^2}{4d},\medfrac{a}{4}\bigr)\Bigr) [[/math]]
as desired.

General references

Wegner, Sven-Ake (2024). "Lecture Notes on High-Dimensional Data". arXiv:2101.05841 [math.FA].