In [[wikipedia:probability theory|probability theory]], the '''central limit theorem''' ('''CLT''') establishes that, in many situations, when [[wikipedia:Statistical independence|independent random variables]] are summed up, their properly [[wikipedia:Normalization (statistics)|normalized]] sum tends toward a [[wikipedia:normal distribution|normal distribution]] (informally a ''bell curve'') even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and [[wikipedia:Statistics|statistical]] methods that work for normal distributions can be applicable to many problems involving other types of distributions. This theorem has seen many changes during the formal development of probability theory. Previous versions of the theorem date back to 1811, but in its modern general form, this fundamental result in probability theory was precisely stated as late as 1920,<ref>{{cite web |last1=Fischer |first1=Hans |title=A history of the central limit theorem |url=http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/GaussianModel/HistoryCentralLimitTheorem.pdf |publisher=Springer New York Dordrecht Heidelberg London |access-date=29 April 2021}}</ref> thereby serving as a bridge between classical and modern probability theory.
The central limit theorem has several variants. In its common form, the random variables must be identically distributed. In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations, if they comply with certain conditions.
The earliest version of this theorem, that the normal distribution may be used as an approximation to the [[wikipedia:binomial distribution|binomial distribution]], is the [[wikipedia:de Moivre–Laplace theorem|de Moivre–Laplace theorem]].
==Classical CLT== | |||
Let <math display="inline">\{X_1, \ldots, X_n\}</math> be a [[wikipedia:random sample|random sample]] of size <math display="inline">n</math> — that is, a sequence of [[wikipedia:independent and identically distributed|independent and identically distributed]] (i.i.d.) random variables drawn from a distribution of [[wikipedia:expected value|expected value]] given by <math display="inline">\mu</math> and finite [[wikipedia:variance|variance]] given by {{nowrap|<math display="inline">\sigma^2</math>.}} Suppose we are interested in the [[wikipedia:sample mean|sample average]]
<math display="block">\bar{X}_n \equiv \frac{X_1 + \cdots + X_n}{n}</math>
of these random variables. By the [[wikipedia:law of large numbers|law of large numbers]], the sample averages [[wikipedia:Almost sure convergence|converge almost surely]] (and therefore also [[wikipedia:Convergence in probability|converge in probability]]) to the expected value <math display="inline">\mu</math> as {{nowrap|<math display="inline">n\to\infty</math>.}} The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number <math display="inline">\mu</math> during this convergence. More precisely, it states that as <math display="inline">n</math> gets larger, the distribution of the difference between the sample average <math display="inline">\bar{X}_n</math> and its limit {{nowrap|<math display="inline">\mu</math>,}} when multiplied by the factor <math display="inline">\sqrt{n}</math> {{nowrap|<big>(</big>that is, <math display="inline">\sqrt{n}(\bar{X}_n - \mu)</math><big>)</big>,}} approximates the [[wikipedia:normal distribution|normal distribution]] with mean 0 and variance {{nowrap|<math display="inline">\sigma^2</math>.}} For large enough {{mvar|n}}, the distribution of <math display="inline">\bar{X}_n</math> is close to the normal distribution with mean <math display="inline">\mu</math> and variance {{nowrap|<math display="inline">\sigma^2/n</math>.}} The usefulness of the theorem is that the distribution of <math display="inline">\sqrt{n}(\bar{X}_n - \mu)</math> approaches normality regardless of the shape of the distribution of the individual {{nowrap|<math display="inline">X_i</math>.}} Formally, the theorem can be stated as follows:
<blockquote>'''Lindeberg–Lévy CLT.''' Suppose <math display="inline">\{X_1, \ldots, X_n\}</math> is a sequence of [[wikipedia:independent and identically distributed|i.i.d.]] random variables with <math>\mathbb{E}[X_i] = \mu</math> and {{nowrap|<math display="inline">\operatorname{Var}[X_i] = \sigma^2 < \infty</math>.}} Then as <math display="inline">n</math> approaches infinity, the random variables <math display="inline">\sqrt{n}(\bar{X}_n - \mu)</math> [[wikipedia:convergence in distribution|converge in distribution]] to a [[wikipedia:normal distribution|normal]] {{nowrap|<math display="inline">\mathcal{N}(0, \sigma^2)</math>:}}<ref>Billingsley (1995, p. 357)</ref> | |||
<math>\sqrt{n}\left(\bar{X}_n - \mu\right)\ \xrightarrow{d}\ \mathcal{N}\left(0,\sigma^2\right) .</math></blockquote> | |||
In the case {{nowrap|<math display="inline">\sigma > 0</math>,}} convergence in distribution means that the [[wikipedia:cumulative distribution function|cumulative distribution function]]s of <math display="inline">\sqrt{n}(\bar{X}_n - \mu)</math> converge pointwise to the cdf of the <math display="inline">\mathcal{N}(0, \sigma^2)</math> distribution: for every real {{nowrap|number <math display="inline">z</math>,}}
<math display="block">\lim_{n\to\infty} \operatorname{P}\left[\sqrt{n}(\bar{X}_n-\mu) \le z\right] = \lim_{n\to\infty} \operatorname{P}\left[\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \le \frac{z}{\sigma}\right] = \Phi\left(\frac{z}{\sigma}\right) ,</math>
where <math display="inline">\Phi(z)</math> is the standard normal cdf evaluated {{nowrap|at <math display="inline">z</math>.}} The convergence is uniform in <math display="inline">z</math> in the sense that
<math display="block">\lim_{n\to\infty}\;\sup_{z\in \mathbb{R}}\;\left|\operatorname{P}\left[\sqrt{n}(\bar{X}_n-\mu) \le z\right] - \Phi\left(\frac{z}{\sigma}\right)\right| = 0~,</math> | |||
where <math>\sup</math> denotes the least upper bound (or [[wikipedia:supremum|supremum]]) of the set.<ref>Bauer (2001, Theorem 30.13, p. 199)</ref>
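The convergence described above can be checked by direct simulation. The following minimal sketch (assuming NumPy and SciPy are available; the function name <code>sup_gap</code> and all constants are illustrative choices, not taken from the sources cited above) draws many samples from an exponential distribution, which is strongly non-normal, and estimates the supremum gap between the distribution of <math display="inline">\sqrt{n}(\bar{X}_n - \mu)</math> and the limiting normal cdf:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0  # mean and standard deviation of the Exponential(1) distribution

def sup_gap(n, replications=10_000):
    """Estimate sup_z |P[sqrt(n)(X_bar - mu) <= z] - Phi(z / sigma)|
    from the empirical cdf of many simulated sample means."""
    samples = rng.exponential(scale=mu, size=(replications, n))
    z = np.sort(np.sqrt(n) * (samples.mean(axis=1) - mu))
    ecdf = np.arange(1, replications + 1) / replications
    return np.max(np.abs(ecdf - norm.cdf(z, scale=sigma)))

for n in (1, 10, 100, 1000):
    print(n, sup_gap(n))  # the estimated gap shrinks toward 0 as n grows
</syntaxhighlight>
Up to Monte Carlo noise of order <math display="inline">1/\sqrt{10000}</math>, the printed gaps decrease as <math display="inline">n</math> grows, in line with the uniform convergence statement above.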
==Convergence to the limit== | |||
The central limit theorem applies in particular to sums of independent and identically distributed [[wikipedia:discrete random variable|discrete random variable]]s. A sum of discrete random variables is still a discrete random variable, so that we are confronted with a sequence of discrete random variables whose cumulative probability distribution function converges towards a cumulative probability distribution function corresponding to a continuous variable (namely that of the [[wikipedia:normal distribution|normal distribution]]). This means that if we build a [[wikipedia:histogram|histogram]] of the realizations of the sum of {{mvar|n}} independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the histogram converges toward a Gaussian curve as {{mvar|n}} approaches infinity; this relation is known as the [[wikipedia:de Moivre–Laplace theorem|de Moivre–Laplace theorem]]. The [[wikipedia:binomial distribution|binomial distribution]] article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.
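A minimal numerical sketch of this approximation (assuming NumPy and SciPy; the parameter values are arbitrary illustrative choices) compares the exact binomial probabilities with the density of the approximating normal <math display="inline">\mathcal{N}(np, np(1-p))</math>:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3  # number of trials and success probability (illustrative values)
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)  # exact Binomial(n, p) probabilities
approx = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))

print(np.max(np.abs(pmf - approx)))  # already small at n = 100
</syntaxhighlight>
Since a binomial variable is a sum of <math display="inline">n</math> independent Bernoulli variables, this is precisely the two-valued special case mentioned above.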
==Density functions== | |||
The [[wikipedia:probability density function|density]] of the sum of two or more independent variables is the [[wikipedia:convolution|convolution]] of their densities (if these densities exist). Thus the central limit theorem can be interpreted as a statement about the properties of density functions under convolution: the convolution of a number of density functions tends to the normal density as the number of density functions increases without bound. These theorems require stronger hypotheses than the forms of the central limit theorem given above. Theorems of this type are often called local limit theorems. See Petrov<ref>{{Cite book|last=Petrov|first=V. V.|title=Sums of Independent Random Variables|year=1976|publisher=Springer-Verlag|location=New York-Heidelberg|isbn=9783642658099|at=ch. 7|url=https://books.google.com/books?id=zSDqCAAAQBAJ}}</ref> for a particular local limit theorem for sums of [[wikipedia:independent and identically distributed random variables|independent and identically distributed random variables]].
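The statement about convolutions can be observed numerically. A minimal sketch (assuming NumPy; the grid resolution and the number of summands are arbitrary choices) approximates the density of a sum of eight independent Uniform(0, 1) variables by repeated discrete convolution:
<syntaxhighlight lang="python">
import numpy as np

dx = 0.001
density = np.ones(int(1 / dx))  # Uniform(0, 1) density sampled on a grid
summed = density.copy()
for _ in range(7):  # after 7 convolutions: density of a sum of 8 uniforms
    summed = np.convolve(summed, density) * dx

# The result is already close to a normal density with mean 4 and
# variance 8/12, as the first two moments confirm:
x = np.arange(len(summed)) * dx
mean = np.sum(x * summed) * dx               # approximately 4.0
var = np.sum((x - mean) ** 2 * summed) * dx  # approximately 8/12
print(mean, var)
</syntaxhighlight>
Plotting <code>summed</code> against the corresponding normal density makes the bell shape visible after only a handful of convolutions.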
==Applications and examples== | |||
===Simple example=== | |||
A simple example of the central limit theorem is rolling many identical, unbiased dice. The distribution of the sum (or average) of the rolled numbers will be well approximated by a normal distribution. Since real-world quantities are often the balanced sum of many unobserved random events, the central limit theorem also provides a partial explanation for the prevalence of the normal probability distribution. It also justifies the approximation of large-sample [[wikipedia:statistic|statistic]]s to the normal distribution in controlled experiments.
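A quick simulation of the dice example (a sketch assuming NumPy; the numbers of dice and rolls are arbitrary) shows how closely the simulated totals match the normal parameters predicted by the theorem:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n_dice, rolls = 30, 100_000
totals = rng.integers(1, 7, size=(rolls, n_dice)).sum(axis=1)

# A single fair die has mean 3.5 and variance 35/12, so the CLT
# predicts totals approximately N(3.5 * n_dice, n_dice * 35/12).
print(totals.mean(), 3.5 * n_dice)              # simulated vs. predicted mean
print(totals.std(), np.sqrt(n_dice * 35 / 12))  # simulated vs. predicted std
</syntaxhighlight>
A histogram of <code>totals</code> is, for this many dice, visually indistinguishable from the corresponding normal curve.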
===Real applications=== | |||
Published literature contains a number of useful and interesting examples and applications relating to the central limit theorem.<ref>Dinov, Christou & Sánchez (2008)</ref> One source<ref>{{cite web|url=http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_GCLT_Applications |title=SOCR EduMaterials Activities GCLT Applications - Socr |website=Wiki.stat.ucla.edu |date=2010-05-24 |access-date=2017-01-23}}</ref> states the following examples:
*The probability distribution for total distance covered in a [[wikipedia:random walk|random walk]] (biased or unbiased) will tend toward a [[wikipedia:normal distribution|normal distribution]] (see the sketch after this list).
*Flipping many coins will result in a normal distribution for the total number of heads (or equivalently total number of tails).
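Both items reduce to sums of i.i.d. steps, so the classical theorem applies directly. A minimal simulation sketch for the unbiased walk (assuming NumPy; all names and counts are illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
steps, walks = 1000, 10_000
# Each step is +1 or -1 with equal probability; a coin-flip count is
# the same kind of sum with steps valued 0 or 1 instead.
positions = rng.choice([-1, 1], size=(walks, steps)).sum(axis=1)

# CLT prediction: the final position is approximately N(0, steps),
# since each step has mean 0 and variance 1.
print(positions.mean(), positions.std(), np.sqrt(steps))
</syntaxhighlight>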
From another viewpoint, the central limit theorem explains the common appearance of the "bell curve" in [[wikipedia:density estimation|density estimates]] applied to real-world data. In cases like electronic noise, examination grades, and so on, we can often regard a single measured value as the weighted average of many small effects. Using generalisations of the central limit theorem, we can then see that this would often (though not always) produce a final distribution that is approximately normal.
In general, the more a measurement is like the sum of independent variables with equal influence on the result, the more normality it exhibits. This justifies the common use of this distribution to stand in for the effects of unobserved variables in models like the [[wikipedia:linear model|linear model]].
==Notes== | |||
{{reflist}} | |||
==References== | |||
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Central_limit_theorem&oldid=1053264438 |title= Central limit theorem |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} |