guide:B5ab48c211: Difference between revisions
==Binomial== | ==Binomial== | ||
The '''binomial distribution''' with parameters <math>n</math> and <math>p</math> is the [[ | The '''binomial distribution''' with parameters <math>n</math> and <math>p</math> is the [[guide:82d603b116#Discrete_probability_distribution|discrete probability distribution]] of the number of successes in a sequence of <math>n</math> [[guide:Af39987afc|independent]] yes/no experiments, each of which yields success with probability <math>p</math>. | ||
A success/failure experiment is also called a Bernoulli experiment or [[ | A success/failure experiment is also called a Bernoulli experiment or [[Bernoulli trial|Bernoulli trial]]; when <math>n = 1</math>, the binomial distribution is a [[#Bernoulli_distribution|Bernoulli distribution]]. The binomial distribution is the basis for the popular [[binomial test|binomial test]] of [[statistical significance|statistical significance]]. | ||
The binomial distribution is frequently used to model the number of successes in a sample of size <math>n</math> drawn [[ | The binomial distribution is frequently used to model the number of successes in a sample of size <math>n</math> drawn [[Sampling (statistics)#Replacement of selected units|with replacement]] from a population of size <math>N</math>. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a [[hypergeometric distribution|hypergeometric distribution]], not a binomial one. However, for <math>N</math> much larger than <math>n</math>, the binomial distribution is a good approximation, and widely used. | ||
===Specification=== | ===Specification=== | ||
Line 10: | Line 10: | ||
====Probability mass function==== | ====Probability mass function==== | ||
In general, if the random variable <math>X</math> follows the binomial distribution with parameters <math>n</math> ∈ ℕ and <math>p</math> ∈ [0,1], we write <math>X \sim B(n,p)</math>. The probability of getting exactly <math>k</math> successes in <math>n</math> trials is given by the [[ | In general, if the random variable <math>X</math> follows the binomial distribution with parameters <math>n</math> ∈ ℕ and <math>p</math> ∈ [0,1], we write <math>X \sim B(n,p)</math>. The probability of getting exactly <math>k</math> successes in <math>n</math> trials is given by the [[guide:82d603b116#Probability_Mass_Function|probability mass function]]: | ||
<math display="block"> f(k;n,p) = \operatorname{P}(X = k) = \binom n k p^k(1-p)^{n-k}</math> | <math display="block"> f(k;n,p) = \operatorname{P}(X = k) = \binom n k p^k(1-p)^{n-k}</math> | ||
Line 18: | Line 18: | ||
<math display="block">\binom n k =\frac{n!}{k!(n-k)!}</math> | <math display="block">\binom n k =\frac{n!}{k!(n-k)!}</math> | ||
is the [[binomial coefficient|binomial coefficient]], hence the name of the distribution. The formula can be understood as follows: we want exactly <math>k</math> successes (<math>p^k</math>) and <math>n-k</math> failures (<math>(1-p)^{n-k}</math>). However, the <math>k</math> successes can occur anywhere among the <math>n</math> trials, and there are <math>\binom{n}{k}</math> different ways of distributing <math>k</math> successes in a sequence of <math>n</math> trials.
In creating reference tables for binomial distribution probability, usually the table is filled in up to <math>n/2</math> values. This is because for <math>k \gt n/2</math>, the probability can be calculated by its complement as | In creating reference tables for binomial distribution probability, usually the table is filled in up to <math>n/2</math> values. This is because for <math>k \gt n/2</math>, the probability can be calculated by its complement as | ||
Line 24: | Line 24: | ||
<math display="block">f(k,n,p)=f(n-k,n,1-p). </math> | <math display="block">f(k,n,p)=f(n-k,n,1-p). </math> | ||
The probability mass function satisfies the following [[ | The probability mass function satisfies the following [[recurrence relation|recurrence relation]], for every <math>n,p</math>:<math display="block">\left\{\begin{array}{l} | ||
p (n-k) f(k,n,p) = (k+1) (1-p) | p (n-k) f(k,n,p) = (k+1) (1-p) | ||
f(k+1,n,p), \\[10pt] | f(k+1,n,p), \\[10pt] | ||
Line 31: | Line 31: | ||
Looking at the expression <math>f(k,n,p)</math> as a function of <math>k</math>, there is a <math>k</math> value that maximizes it. This <math>k</math> value can be found by calculating<math display="block"> \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} </math> | Looking at the expression <math>f(k,n,p)</math> as a function of <math>k</math>, there is a <math>k</math> value that maximizes it. This <math>k</math> value can be found by calculating<math display="block"> \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} </math> | ||
and comparing it to 1. There is always an integer | and comparing it to 1. There is always an integer <math>M</math> that satisfies<math display="block">(n+1)p-1 \leq M < (n+1)p.</math> | ||
<math>f(k,n,p)</math> is monotone increasing for <math>k < M</math> and monotone decreasing for <math>k \gt M</math>, with the exception of the case where <math>(n+1)p</math> is an integer. In this case, there are two values for which <math>f</math> is maximal: <math>(n+1)p</math> and <math>(n+1)p-1</math>. <math>M</math> is the ''most probable'' (''most likely'') outcome of the Bernoulli trials and is called the [[ | <math>f(k,n,p)</math> is monotone increasing for <math>k < M</math> and monotone decreasing for <math>k \gt M</math>, with the exception of the case where <math>(n+1)p</math> is an integer. In this case, there are two values for which <math>f</math> is maximal: <math>(n+1)p</math> and <math>(n+1)p-1</math>. <math>M</math> is the ''most probable'' (''most likely'') outcome of the Bernoulli trials and is called the [[Mode (statistics)|mode]]. Note that the probability of it occurring can be fairly small. | ||
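The probability mass function, the ratio of consecutive terms, and the location of the mode can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names <code>binom_pmf</code> and <code>binom_mode</code> and the values <code>n = 10</code>, <code>p = 0.3</code> are illustrative choices, not part of the original text.

```python
from math import comb, floor


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), straight from the closed-form expression."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_mode(n: int, p: float) -> int:
    """floor((n + 1) p), capped at n; the cap only matters when p = 1.

    When (n + 1) p is an integer this returns the larger of the two modes
    (subject to floating-point rounding of the product).
    """
    return min(n, floor((n + 1) * p))


# Illustrative parameters (assumed for this example).
n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mode = binom_mode(n, p)

# The ratio f(k+1)/f(k) exceeds 1 while k is below the mode and drops below 1 after it.
ratios = [(n - k) * p / ((k + 1) * (1 - p)) for k in range(n)]
print("mode:", mode, "probability at mode:", pmf[mode])
```

Printing <code>pmf[mode]</code> also illustrates the remark above: even the most likely outcome can carry a fairly small probability.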
====Cumulative distribution function==== | ====Cumulative distribution function==== | ||
The [[ | The [[guide:82d603b116#Cumulative_distribution_function|cumulative distribution function]] can be expressed as:<math display="block">F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}</math> | ||
where <math>\scriptstyle \lfloor k\rfloor\,</math> is the "floor" under <math>k</math>, i.e. the [[ | where <math>\scriptstyle \lfloor k\rfloor\,</math> is the "floor" under <math>k</math>, i.e. the [[greatest integer|greatest integer]] less than or equal to <math>k</math>. | ||
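As a sketch of how the sum above translates into code, the function below evaluates the cumulative distribution function by direct summation. The name <code>binom_cdf</code> is a hypothetical helper introduced here for illustration; direct summation is adequate for small <math>n</math> but is not a recommendation for large-scale numerical work.

```python
from math import comb, floor


def binom_cdf(k: float, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p); k may be any real number."""
    top = min(n, floor(k))  # summation stops at floor(k), and never beyond n
    if top < 0:
        return 0.0  # no mass below zero
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(top + 1))


# Non-integer arguments are floored, so these two calls agree.
print(binom_cdf(2.7, 4, 0.5), binom_cdf(2, 4, 0.5))
```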
===Mean and Variance=== | ===Mean and Variance=== | ||
If <math>X \sim B(n,p)</math>, that is, <math>X</math> is a binomially distributed random variable, <math>n</math> being the total number of experiments and <math>p</math> the probability of each experiment yielding a successful result, then the [[guide:82d603b116|expected value]] of <math>X</math> is <math>np</math> and the variance is <math>np(1-p)</math>. This follows directly from the fact that <math>X</math> is equal in distribution to the sum of <math>n</math> independent [[#Bernoulli_distribution|Bernoulli]] random variables, each having success probability <math>p</math> and therefore mean <math>p</math> and variance <math>p(1-p)</math> (see [[#Bernoulli_distribution|below]]).
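The two moments can be confirmed by summing over the probability mass function directly. This is only a numerical sanity check under assumed example values (<code>n = 12</code>, <code>p = 0.25</code>), not a proof.

```python
from math import comb

# Illustrative parameters (assumed for this example).
n, p = 12, 0.25
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# First moment and second central moment computed from the definition.
mean = sum(k * f for k, f in enumerate(pmf))
variance = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))

print(mean, n * p)                 # both should agree up to rounding
print(variance, n * p * (1 - p))   # both should agree up to rounding
```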
===Mode=== | ===Mode=== | ||
Usually the [[mode (statistics)|mode]] of a binomial <math>B(n,p)</math> distribution is equal to <math>\lfloor (n+1)p\rfloor</math>, where <math>\lfloor\cdot\rfloor</math> is the [[floor function|floor function]]. However, when <math>(n+1)p</math> is an integer and <math>p</math> is neither 0 nor 1, the distribution has two modes: <math>(n+1)p</math> and <math>(n+1)p -1 </math>. When <math>p</math> is equal to 0 or 1, the mode is 0 or <math>n</math>, respectively. These cases can be summarized as follows:
<math display="block">
\text{mode} =
\begin{cases}
\lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\
(n+1)\,p\ \text{ and }\ (n+1)\,p - 1 & \text{if }(n+1)p\in\{1,\dots,n\}, \\
n & \text{if }(n+1)p = n + 1.
\end{cases}</math>
===Median=== | ===Median=== | ||
In general, there is no single formula to find the [[ | In general, there is no single formula to find the [[median|median]] for a binomial distribution, and it may even be non-unique. However several special results have been established: | ||
* If <math>np</math> is an integer, then the mean, median, and mode coincide and equal <math>np</math>.<ref>{{cite journal|last=Neumann|first=P.|year=1966|title=Über den Median der Binomial- and Poissonverteilung|journal=Wissenschaftliche Zeitschrift der Technischen Universität Dresden|volume=19|pages=29–33|language=German}}</ref><ref>Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", [[ | * If <math>np</math> is an integer, then the mean, median, and mode coincide and equal <math>np</math>.<ref>{{cite journal|last=Neumann|first=P.|year=1966|title=Über den Median der Binomial- and Poissonverteilung|journal=Wissenschaftliche Zeitschrift der Technischen Universität Dresden|volume=19|pages=29–33|language=German}}</ref><ref>Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", [[The Mathematical Gazette|The Mathematical Gazette]] 94, 331-332.</ref> | ||
* Any median <math>m</math> must lie within the interval ⌊<math>np</math>⌋ ≤ <math>m</math> ≤ ⌈<math>np</math>⌉.<ref name="KaasBuhrman">{{cite journal|first1=R.|last1=Kaas|first2=J.M.|last2=Buhrman|title=Mean, Median and Mode in Binomial Distributions|journal=Statistica Neerlandica|year=1980|volume=34|issue=1|pages=13–18|doi=10.1111/j.1467-9574.1980.tb00681.x}}</ref> | * Any median <math>m</math> must lie within the interval ⌊<math>np</math>⌋ ≤ <math>m</math> ≤ ⌈<math>np</math>⌉.<ref name="KaasBuhrman">{{cite journal|first1=R.|last1=Kaas|first2=J.M.|last2=Buhrman|title=Mean, Median and Mode in Binomial Distributions|journal=Statistica Neerlandica|year=1980|volume=34|issue=1|pages=13–18|doi=10.1111/j.1467-9574.1980.tb00681.x}}</ref> | ||
* A median <math>m</math> cannot lie too far away from the mean: <math>|m-np| \leq \min\{\ln 2,\max\{p,1-p\}\}</math>.<ref name="Hamza">{{Cite journal | * A median <math>m</math> cannot lie too far away from the mean: <math>|m-np| \leq \min\{\ln 2,\max\{p,1-p\}\}</math>.<ref name="Hamza">{{Cite journal | ||
Line 74: | Line 72: | ||
| pmc = | | pmc = | ||
}}</ref> | }}</ref> | ||
* The median is unique and equal to <math>m=</math>[[ | * The median is unique and equal to <math>m=</math>[[Rounding|round]](<math>np</math>) in cases when either <math>p\leq 1-\ln(2)</math> or <math>p\geq \ln(2)</math> or <math>|m-np| \leq \textrm{min}\{p, 1-p\}</math> (except for the case when <math>p = 1/2 </math> and <math>n</math> is odd).<ref name="KaasBuhrman"/><ref name="Hamza"/> | ||
* When <math>p=1/2</math> and <math>n</math> is odd, any number <math>m</math> in the interval <math>[(n-1)/2,(n+1)/2]</math> is a median of the binomial distribution. If <math>p = 1/2</math> and <math>n</math> is even, then <math>m = n/2</math> is the unique median. | * When <math>p=1/2</math> and <math>n</math> is odd, any number <math>m</math> in the interval <math>[(n-1)/2,(n+1)/2]</math> is a median of the binomial distribution. If <math>p = 1/2</math> and <math>n</math> is even, then <math>m = n/2</math> is the unique median. | ||
Line 85: | Line 83: | ||
====Bernoulli distribution <span id="bernouilli"></span>==== | ====Bernoulli distribution <span id="bernouilli"></span>==== | ||
The [[ | The [[Bernoulli distribution|Bernoulli distribution]] is a special case of the binomial distribution, where <math> n = 1</math>. Symbolically, <math>X \sim B(1,p)</math> has the same meaning as <math>X \sim B(p) </math>. Conversely, any binomial distribution, <math>B(n,p)</math>, is the distribution of the sum of <math>n</math> [[Bernoulli trials|Bernoulli trials]], <math>B(p)</math>, each with the same probability <math>p</math>. | ||
====Normal approximation==== | ====Normal approximation==== | ||
If <math>n</math> is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to <math>B(n,p)</math> is given by the [[ | If <math>n</math> is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to <math>B(n,p)</math> is given by the [[normal distribution|normal distribution]] <math> \mathcal{N}(np,\,np(1-p))</math>, and this basic approximation can be improved in a simple way by using a suitable [[continuity correction|continuity correction]]. The basic approximation generally improves as <math>n</math> increases (at least 20) and is better when <math>p</math> is not near to 0 or 1.<ref name="bhh">{{cite book|title=Statistics for experimenters|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|page=130}}</ref> Various heuristics may be used to decide whether <math>n</math> is large enough, and <math>p</math> is far enough from the extremes of zero or one: | ||
*One rule is that both <math>np</math> and <math>n(1-p)</math> must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule until <math>n</math> is very large.
Line 100: | Line 98: | ||
<math display="block">\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].</math> | <math display="block">\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].</math> | ||
The following is an example of applying a [[ | The following is an example of applying a [[continuity correction|continuity correction]]. Suppose one wishes to calculate <math>\operatorname{P}(X \leq 8) </math> for a binomial random variable <math>X</math>. If <math>Y</math> has a distribution given by the normal approximation, then <math>\operatorname{P}(X \leq 8 )</math> is approximated by <math>\operatorname{P}(Y \leq 8.5 ) </math>. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results. | ||
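The effect of the continuity correction can be seen by comparing both approximations with the exact binomial sum. The sketch below assumes example values <math>n = 20</math> and <math>p = 0.5</math> (the text above does not fix them) and builds the normal CDF from <code>math.erf</code>; <code>normal_cdf</code> is a helper name introduced here.

```python
from math import comb, erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


# Illustrative parameters (assumed for this example).
n, p, k = 20, 0.5, 8

# Exact P(X <= 8) by summing the binomial probability mass function.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with mean np and standard deviation sqrt(np(1-p)).
mu, sigma = n * p, sqrt(n * p * (1 - p))
uncorrected = normal_cdf(k, mu, sigma)      # P(Y <= 8)
corrected = normal_cdf(k + 0.5, mu, sigma)  # P(Y <= 8.5)

print(f"exact={exact:.4f} uncorrected={uncorrected:.4f} corrected={corrected:.4f}")
```

For these values the corrected approximation lands much closer to the exact probability than the uncorrected one.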
This approximation, known as the [[de Moivre–Laplace theorem|de Moivre–Laplace theorem]], is a huge time-saver when undertaking calculations by hand (exact calculations with large <math>n</math> are very onerous); historically, it was the first use of the normal distribution, introduced in [[Abraham de Moivre|Abraham de Moivre]]'s book ''[[The Doctrine of Chances|The Doctrine of Chances]]'' in 1738. Nowadays, it can be seen as a consequence of the [[central limit theorem|central limit theorem]], since a <math>B(n,p)</math> random variable is a sum of <math>n</math> independent, identically distributed [[Bernoulli distribution|Bernoulli variables]] with parameter <math>p</math>. This fact is the basis of a [[hypothesis test|hypothesis test]], a "proportion z-test", for the value of <math>p</math> using <math>x/n</math>, the sample proportion and estimator of <math>p</math>, in a [[common test statistics|common test statistic]].<ref>[[NIST|NIST]]/[[SEMATECH|SEMATECH]], [http://www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm "7.2.4. Does the proportion of defectives meet requirements?"] ''e-Handbook of Statistical Methods.''</ref>
For example, suppose one randomly samples <math>n</math> people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of <math>n</math> people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion <math>p</math> of agreement in the population and with standard deviation <math>\sigma = \sqrt{\frac{p(1-p)}{n}}</math>
Line 111: | Line 109: | ||
<math display="block">\operatorname{P}(Y=k) = (1 - p)^k\,p\,.</math> | <math display="block">\operatorname{P}(Y=k) = (1 - p)^k\,p\,.</math> | ||
===Moments === | ===Moments === | ||
The expected value and variance of the geometrically distributed random variable <math>Y</math> are
<math display="block">
\operatorname{E}(Y) = \frac{1-p}{p}, \qquad\operatorname{Var}(Y) = \frac{1-p}{p^2}.
</math>
===Related distributions=== | ===Related distributions=== | ||
* The geometric distribution <math>Y</math> is a special case of the [[ | * The geometric distribution <math>Y</math> is a special case of the [[#Negative_Binomial|negative binomial distribution]], with <math>r = 1 </math>. More generally, if <math>Y_1,\ldots,Y_r</math> are [[guide:Af39987afc|independent]] geometrically distributed variables with parameter <math>p</math>, then the sum | ||
<math display="block">Z = \sum_{m=1}^r Y_m</math> | <math display="block">Z = \sum_{m=1}^r Y_m</math> | ||
Line 139: | Line 131: | ||
is also geometrically distributed, with parameter <math>p = 1-\prod_m(1-p_{m}).</math> | is also geometrically distributed, with parameter <math>p = 1-\prod_m(1-p_{m}).</math> | ||
* Suppose <math>0 < r < 1 </math>, and for <math> k = 1,2,3,\ldots </math> the random variable <math>X_k</math> has a [[ | * Suppose <math>0 < r < 1 </math>, and for <math> k = 1,2,3,\ldots </math> the random variable <math>X_k</math> has a [[Poisson distribution|Poisson distribution]] with expected value <math>r^k</math>. Then | ||
<math display="block">\sum_{k=1}^\infty k\,X_k</math> | <math display="block">\sum_{k=1}^\infty k\,X_k</math> | ||
Line 145: | Line 137: | ||
has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value <math>r/(1-r)</math>. | has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value <math>r/(1-r)</math>. | ||
* The [[ | * The [[guide:269af6cf67#Exponential_Distribution|exponential distribution]] is the continuous analogue of the geometric distribution. If <math>X</math> is an exponentially distributed random variable with parameter <math>\lambda</math>, then | ||
<math display="block">Y = \lfloor X \rfloor,</math> | <math display="block">Y = \lfloor X \rfloor,</math> | ||
where <math>\lfloor \quad \rfloor</math> is the [[ | where <math>\lfloor \quad \rfloor</math> is the [[Floor and ceiling functions|floor]] (or greatest integer) function, is a geometrically distributed random variable with parameter <math>p = 1- e^{-\lambda} </math> (thus <math>\lambda = - \ln(1-p) </math><ref>http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l</ref>) and taking values in the set {0, 1, 2, ...}. | ||
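The link between the exponential and geometric distributions can be checked without simulation: for an exponential variable with rate <math>\lambda</math>, the probability that its floor equals <math>k</math> is <math>e^{-\lambda k} - e^{-\lambda (k+1)}</math>, which should match <math>(1-p)^k p</math> with <math>p = 1 - e^{-\lambda}</math>. The sketch below compares the two for an assumed rate <code>lam = 0.7</code>.

```python
from math import exp

lam = 0.7            # illustrative rate parameter (assumed for this example)
p = 1 - exp(-lam)    # success probability of the matching geometric distribution

for k in range(6):
    # P(floor(X) = k) = P(k <= X < k + 1) for X ~ Exponential(lam)
    floor_prob = exp(-lam * k) - exp(-lam * (k + 1))
    # Geometric pmf on {0, 1, 2, ...}
    geom_prob = (1 - p) ** k * p
    print(k, floor_prob, geom_prob)
```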
==Poisson Distribution== | ==Poisson Distribution== | ||
The '''Poisson distribution''', named after French mathematician [[Siméon Denis Poisson|Siméon Denis Poisson]], is a [[guide:82d603b116#Discrete_probability_distribution|discrete probability distribution]] that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and [[guide:Af39987afc|independently]] of the time since the last event.<ref name=haight>{{cite book|author=Frank A. Haight|title=Handbook of the Poisson Distribution|publisher=John Wiley & Sons|location=New York|year=1967|ref=harv}}</ref> The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.
=== Definition === | === Definition === | ||
A discrete [[ | A discrete [[guide:1b8642f694|random variable]] <math>X</math> is said to have a Poisson distribution with parameter <math>\lambda > 0</math>, if, for <math>k = 0, 1, \ldots </math>, the [[guide:82d603b116#Probability_Mass_Function|probability mass function]] of <math>X</math> is given by<ref>Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.</ref> | ||
<math display="block">\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.</math> | <math display="block">\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.</math> | ||
The probability mass function satisfies the following [[ | The probability mass function satisfies the following [[recurrence relation|recurrence relation]]: | ||
<math display="block">\left\{\begin{array}{l} | <math display="block">\left\{\begin{array}{l} | ||
Line 171: | Line 163: | ||
</math> | </math> | ||
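One recurrence that follows directly from the probability mass function is <math>p_{k+1} = p_k \, \lambda/(k+1)</math> with <math>p_0 = e^{-\lambda}</math>. The sketch below uses it to tabulate probabilities without computing large factorials; the helper name <code>poisson_pmf_table</code> and the value <code>lam = 3.5</code> are assumptions made for illustration.

```python
from math import exp, factorial


def poisson_pmf_table(lam: float, kmax: int) -> list[float]:
    """Return [p_0, ..., p_kmax] for a Poisson(lam) variable via the recurrence."""
    probs = [exp(-lam)]                        # p_0 = e^{-lam}
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))  # p_{k+1} = p_k * lam / (k + 1)
    return probs


lam = 3.5  # illustrative parameter
table = poisson_pmf_table(lam, 40)

# Cross-check a few entries against the closed form lam^k e^{-lam} / k!.
for k in (0, 1, 5, 10):
    print(k, table[k], lam**k * exp(-lam) / factorial(k))
```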
=== | === Mean === | ||
*The [[guide:82d603b116|expected value]] and [[guide:E4d753a3b5|variance]] of a Poisson-distributed random variable are both equal to <math>\lambda</math>. | |||
*The [[coefficient of variation|coefficient of variation]] is <math>\textstyle \lambda^{-1/2}</math>, while the [[index of dispersion|index of dispersion]] is 1.<ref name=JKK157/> | |||
*The [[ | *The [[mean absolute deviation|mean absolute deviation]] about the mean is<ref name=JKK157/> | ||
*The [[ | |||
*The [[ | |||
<math display="block">\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .</math> | <math display="block">\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .</math> | ||
*The [[mode (statistics)|mode]] of a Poisson-distributed random variable with non-integer <math>\lambda</math> is equal to <math>\scriptstyle\lfloor \lambda \rfloor</math>, which is the largest integer less than or equal to <math>\lambda</math>. This is also written as [[floor function|floor]](<math>\lambda</math>). When <math>\lambda</math> is a positive integer, the modes are <math>\lambda</math> and <math>\lambda-1</math>.
=== Median === | |||
Bounds for the median (<math>\nu</math>) of the distribution are known and are sharp:<ref name=Choi1994>Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251</ref><math display="block"> \lambda - \ln 2 \le \nu < \lambda + \frac{1}{3}. </math>
=== Other properties === | |||
*If <math>X_i \sim \operatorname{Pois}(\lambda_i)</math> are [[guide:Af39987afc|independent]] and <math>\lambda=\sum_{i=1}^n \lambda_i</math>, then <math>Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)</math>.<ref>{{cite book|author=E. L. Lehmann|title=Testing Statistical Hypotheses|publisher=Springer Verlag|location=New York|edition=second|year=1986|isbn=0-387-94919-4|ref=harv}} page 65.</ref> A converse is [[Raikov's theorem|Raikov's theorem]], which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.<ref>Raikov, D. (1937). On the decomposition of Poisson laws. ''Comptes Rendus (Doklady) de l'Academie des Sciences de l'URSS'', 14, 9–11. (The proof is also given in {{cite book|author=von Mises, Richard|year=1964|title=Mathematical Theory of Probability and Statistics|publisher=Academic Press|location=New York|ref=harv}})</ref>
*The Poisson distributions are [[Infinite divisibility (probability)|infinitely divisible]] probability distributions.<ref>{{cite book|author1=Laha, R. G. |author2=Rohatgi, V. K. |title=Probability Theory|publisher=John Wiley & Sons|location=New York|isbn=0-471-03262-X|ref=harv|page=233}}</ref><ref name=JKK159/> | |||
* Bounds for the tail probabilities of a Poisson random variable <math> X \sim \operatorname{Pois}(\lambda)</math> can be derived using a [[ | * Bounds for the tail probabilities of a Poisson random variable <math> X \sim \operatorname{Pois}(\lambda)</math> can be derived using a [[Chernoff bound|Chernoff bound]] argument:<ref>{{cite book|author1=Michael Mitzenmacher |author2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis|publisher=Cambridge University Press|isbn=0521835402|page=97|ref=harv}}</ref> | ||
<math display="block"> | <math display="block"> | ||
Line 214: | Line 195: | ||
==Negative Binomial== | ==Negative Binomial== | ||
The '''negative binomial distribution''' is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed [[ | The '''negative binomial distribution''' is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed [[Bernoulli trial|Bernoulli trial]]s before a specified number of successes (denoted <math>r</math>) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We are observing this sequence until a predefined number <math>r</math> of successes has occurred. | ||
===Probability Mass Function === | ===Probability Mass Function === | ||
Line 232: | Line 213: | ||
===Extension to real-valued ''r''=== | ===Extension to real-valued ''r''=== | ||
It is possible to extend the definition of the negative binomial distribution to the case of a positive [[ | It is possible to extend the definition of the negative binomial distribution to the case of a positive [[real number|real]] parameter <math>r</math>. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function. | ||
For consistency with the parametrizations found in <ref name="tables">https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf</ref>, we consider the alternative parametrization defined implicitly by setting <math>p = 1/(1+\beta)</math>.
Line 239: | Line 220: | ||
f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc | f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc | ||
</math> | </math> | ||
Here <math>r</math> is a real, positive number. The binomial coefficient is then defined by the [[ | Here <math>r</math> is a real, positive number. The binomial coefficient is then defined by the [[binomial coefficient#Multiplicative formula|multiplicative formula]] and can also be rewritten using the [[gamma function|gamma function]]:<math display="block"> | ||
\binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}. | \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}. | ||
</math> | </math> | ||
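Because the gamma-function form works for any positive real <math>r</math>, the probability mass function <math>f(k; r, \beta)</math> can be evaluated numerically through the log-gamma function, which avoids overflow for large <math>k</math>. The following is a minimal sketch; the name <code>negbin_pmf</code> and the values <code>r = 2.5</code>, <code>beta = 1.5</code> are illustrative assumptions.

```python
from math import exp, lgamma, log


def negbin_pmf(k: int, r: float, beta: float) -> float:
    """f(k; r, beta) = Gamma(k + r) / (k! Gamma(r)) * beta^k / (1 + beta)^(r + k)."""
    # log of the generalized binomial coefficient; note lgamma(k + 1) = log(k!)
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(beta) - (r + k) * log(1 + beta))


# Illustrative parameters with a non-integer r (assumed for this example).
r, beta = 2.5, 1.5
probs = [negbin_pmf(k, r, beta) for k in range(400)]

total = sum(probs)                               # should be very close to 1
mean = sum(k * q for k, q in enumerate(probs))   # compare with r * beta
print(total, mean, r * beta)
```

Truncating the sum at 400 terms is a choice made for this example; the terms decay geometrically, so the omitted tail is negligible for these parameters.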
To show that the probability mass function adds up to one, we have, by the [[ | To show that the probability mass function adds up to one, we have, by the [[binomial series|binomial series]] | ||
<math display="block"> | <math display="block"> | ||
Line 250: | Line 231: | ||
</math> | </math> | ||
Finally, the following [[ | Finally, the following [[recurrence relation|recurrence relation]] holds: | ||
<math display="block">\begin{array}{l} | <math display="block">\begin{array}{l} | ||
Line 258: | Line 239: | ||
</math> | </math> | ||
== | ==Wikipedia References== | ||
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=1065569910 |title= Binomial distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | *{{cite web |url= https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=1065569910 |title= Binomial distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | ||
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Geometric_distribution&oldid=1061164164 |title= Geometric distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | *{{cite web |url= https://en.wikipedia.org/w/index.php?title=Geometric_distribution&oldid=1061164164 |title= Geometric distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | ||
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Poisson_distribution&oldid=1068368695 |title= Poisson distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | *{{cite web |url= https://en.wikipedia.org/w/index.php?title=Poisson_distribution&oldid=1068368695 |title= Poisson distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} | ||
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Negative_binomial_distribution&oldid=898136399 | title= Negative binomial distribution | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 17 February 2022 }} | *{{cite web |url = https://en.wikipedia.org/w/index.php?title=Negative_binomial_distribution&oldid=898136399 | title= Negative binomial distribution | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 17 February 2022 }} | ||
==References== | |||
{{Reflist|30em|refs= | |||
<ref name=JKK157>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p157</ref><ref name=JKK159>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p159</ref>}} |
Latest revision as of 22:56, 4 April 2024
Binomial
The binomial distribution with parameters [math]n[/math] and [math]p[/math] is the discrete probability distribution of the number of successes in a sequence of [math]n[/math] independent yes/no experiments, each of which yields success with probability [math]p[/math]. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when [math]n = 1[/math], the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of size [math]n[/math] drawn with replacement from a population of size [math]N[/math]. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for [math]N[/math] much larger than [math]n[/math], the binomial distribution is a good approximation, and widely used.
Specification
Probability mass function
In general, if the random variable [math]X[/math] follows the binomial distribution with parameters [math]n[/math] ∈ ℕ and [math]p[/math] ∈ [0,1], we write [math]X \sim B(n,p)[/math]. The probability of getting exactly [math]k[/math] successes in [math]n[/math] trials is given by the probability mass function:
[math]f(k,n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}[/math]
for [math]k = 0, \ldots, n[/math] where
[math]{n\choose k}=\frac{n!}{k!(n-k)!}[/math]
is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly [math]k[/math] successes ([math]p^k[/math]) and [math]n-k[/math] failures ([math](1-p)^{n-k}[/math]). However, the [math]k[/math] successes can occur anywhere among the [math]n[/math] trials, and there are [math]{n\choose k}[/math] different ways of distributing [math]k[/math] successes in a sequence of [math]n[/math] trials.
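As a minimal sketch, the probability mass function can be evaluated directly in Python. It assumes the standard form [math]{n\choose k}p^k(1-p)^{n-k}[/math]; the function name `binomial_pmf` and the parameter values are illustrative choices, not part of the original text.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    # comb(n, k) counts the placements of k successes among n trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative parameters: n = 10 trials, success probability p = 0.3.
probs = [binomial_pmf(k, 10, 0.3) for k in range(11)]
```

Summing `probs` over all [math]k = 0, \ldots, n[/math] should give 1 up to floating-point rounding, which is a quick sanity check on any implementation.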
In creating reference tables for binomial distribution probability, usually the table is filled in up to [math]n/2[/math] values. This is because for [math]k \gt n/2[/math], the probability can be calculated by its complement as
[math]f(k,n,p)=f(n-k,n,1-p).[/math]
The probability mass function satisfies the following recurrence relation, for every [math]n,p[/math]:
[math]p(n-k)\,f(k,n,p) = (k+1)(1-p)\,f(k+1,n,p), \qquad f(0,n,p)=(1-p)^n.[/math]
Looking at the expression [math]f(k,n,p)[/math] as a function of [math]k[/math], there is a [math]k[/math] value that maximizes it. This [math]k[/math] value can be found by calculating
[math]\frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)}[/math]
and comparing it to 1. There is always an integer [math]M[/math] that satisfies
[math](n+1)p-1 \leq M \lt (n+1)p.[/math]
[math]f(k,n,p)[/math] is monotone increasing for [math]k \lt M[/math] and monotone decreasing for [math]k \gt M[/math], with the exception of the case where [math](n+1)p[/math] is an integer. In this case, there are two values for which [math]f[/math] is maximal: [math](n+1)p[/math] and [math](n+1)p-1[/math]. [math]M[/math] is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.
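The ratio argument can be sketched in code. The snippet below assumes the ratio [math]f(k+1,n,p)/f(k,n,p) = (n-k)p/((k+1)(1-p))[/math], which follows from the probability mass function, and walks [math]k[/math] upward while that ratio exceeds 1. The function name is hypothetical, and when two modes exist it returns the smaller one.

```python
def binomial_mode_by_ratio(n: int, p: float) -> int:
    """Smallest k maximizing the binomial pmf, found via the ratio test."""
    if p == 1:
        return n  # every trial succeeds, so all mass sits at k = n
    k = 0
    # f(k+1)/f(k) > 1 is equivalent to (n-k)p > (k+1)(1-p); cross-multiplying
    # avoids dividing by zero.
    while k < n and (n - k) * p > (k + 1) * (1 - p):
        k += 1
    return k
```

For [math]n = 10[/math] and [math]p = 0.3[/math] the loop stops at 3, which agrees with [math]\lfloor (n+1)p \rfloor = \lfloor 3.3 \rfloor[/math]. For [math]n = 9[/math] and [math]p = 0.5[/math], where [math](n+1)p = 5[/math] is an integer, it returns 4, the smaller of the two modes 4 and 5.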
Cumulative distribution function
The cumulative distribution function can be expressed as:
[math]F(k,n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i},[/math]
where [math]\scriptstyle \lfloor k\rfloor\,[/math] is the "floor" under [math]k[/math], i.e. the greatest integer less than or equal to [math]k[/math].
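A short sketch of the cumulative distribution function as a finite sum of probability mass terms up to the floor of its argument. The name `binomial_cdf` and the handling of arguments outside [math][0, n][/math] are assumptions made for illustration.

```python
from math import comb, floor


def binomial_cdf(x: float, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p), summing pmf terms up to floor(x)."""
    top = min(floor(x), n)  # no mass above n
    # range(top + 1) is empty when x < 0, so the result is 0 there.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(top + 1))
```

For example, `binomial_cdf(2.7, 4, 0.5)` adds the terms for 0, 1 and 2 successes, that is (1 + 4 + 6)/16.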
Mean and Variance
If [math]X \sim B(n,p)[/math], that is, [math]X[/math] is a binomially distributed random variable, [math]n[/math] being the total number of experiments and [math]p[/math] the probability of each experiment yielding a successful result, then the expected value of [math]X[/math] is [math]np[/math] and the variance is [math]np(1-p)[/math], often written [math]npq[/math] with [math]q = 1-p[/math]. This follows directly from the fact that [math]X[/math] is equal in distribution to the sum of [math]n[/math] independent Bernoulli random variables each having success probability [math]p[/math] (see below).
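The closed forms can be checked numerically by computing the moments straight from the probability mass function. This is only a sanity check under illustrative parameters ([math]n = 12[/math], [math]p = 0.25[/math]), not a proof.

```python
from math import comb

n, p = 12, 0.25  # illustrative values
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# First moment and second central moment computed from the pmf.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
```

Here `mean` should agree with [math]np = 3[/math] and `variance` with [math]np(1-p) = 2.25[/math] up to rounding.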
Mode
Usually the mode of a binomial [math]B(n,p)[/math] distribution is equal to [math]\lfloor (n+1)p\rfloor[/math], where [math]\lfloor\cdot\rfloor[/math] is the floor function. However, when [math](n+1)p[/math] is an integer and [math]p[/math] is neither 0 nor 1, then the distribution has two modes: [math](n+1)p[/math] and [math](n+1)p -1 [/math]. When [math]p[/math] is equal to 0 or 1, the mode will be 0 and [math]n[/math], respectively. These cases can be summarized as follows:
[math]\text{mode} = \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 & \text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}[/math]
Median
In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However, several special results have been established:
- If [math]np[/math] is an integer, then the mean, median, and mode coincide and equal [math]np[/math].[1][2]
- Any median [math]m[/math] must lie within the interval ⌊[math]np[/math]⌋ ≤ [math]m[/math] ≤ ⌈[math]np[/math]⌉.[3]
- A median [math]m[/math] cannot lie too far away from the mean: [math]|m-np| \leq \textrm{min}\{\ln(2),\textrm{max}\{p,1-p\}\}[/math].[4]
- The median is unique and equal to [math]m=[/math]round([math]np[/math]) in cases when either [math]p\leq 1-\ln(2)[/math] or [math]p\geq \ln(2)[/math] or [math]|m-np| \leq \textrm{min}\{p, 1-p\}[/math] (except for the case when [math]p = 1/2 [/math] and [math]n[/math] is odd).[3][4]
- When [math]p=1/2[/math] and [math]n[/math] is odd, any number [math]m[/math] in the interval [math][(n-1)/2,(n+1)/2][/math] is a median of the binomial distribution. If [math]p = 1/2[/math] and [math]n[/math] is even, then [math]m = n/2[/math] is the unique median.
Related distributions
Sums of binomials
If [math]X \sim B(n,p)[/math] and [math]Y \sim B(m, p) [/math] are independent binomial variables with the same probability [math]p[/math], then [math]X+Y [/math] is again a binomial variable: its distribution is [math]Z=X+Y \sim B(n+m, p)[/math].
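The additivity property can be illustrated numerically: the distribution of a sum of independent variables is the convolution of their probability mass functions. The sketch below convolves [math]B(3,p)[/math] with [math]B(4,p)[/math] and compares the result with [math]B(7,p)[/math]; the helper names and the value [math]p = 0.35[/math] are illustrative.

```python
from math import comb


def binomial_pmf_list(n: int, p: float) -> list[float]:
    """All pmf values f(0), ..., f(n) of B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list[float], b: list[float]) -> list[float]:
    """pmf of the sum of two independent variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


p = 0.35
sum_pmf = convolve(binomial_pmf_list(3, p), binomial_pmf_list(4, p))
direct_pmf = binomial_pmf_list(7, p)
max_gap = max(abs(s - d) for s, d in zip(sum_pmf, direct_pmf))
```

The largest difference `max_gap` should be at the level of floating-point rounding. The same check fails if the two success probabilities differ, which is why the common [math]p[/math] is required.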
Bernoulli distribution
The Bernoulli distribution is a special case of the binomial distribution, where [math] n = 1[/math]. Symbolically, [math]X \sim B(1,p)[/math] has the same meaning as [math]X \sim B(p) [/math]. Conversely, any binomial distribution, [math]B(n,p)[/math], is the distribution of the sum of [math]n[/math] Bernoulli trials, [math]B(p)[/math], each with the same probability [math]p[/math].
Normal approximation
If [math]n[/math] is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to [math]B(n,p)[/math] is given by the normal distribution [math] \mathcal{N}(np,\,np(1-p))[/math], and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as [math]n[/math] increases (at least 20) and is better when [math]p[/math] is not near to 0 or 1.[5] Various heuristics may be used to decide whether [math]n[/math] is large enough, and [math]p[/math] is far enough from the extremes of zero or one:
- One rule is that both [math]x = np [/math] and [math]n(1-p)[/math] must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large [math]n[/math] until [math]n[/math] is very large.
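- A second rule[5] is that for [math]n \gt 5[/math] the normal approximation is adequate if
[math]\left|\frac{1}{\sqrt{n}}\left(\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}}\right)\right| \lt 0.3[/math]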
- Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is, if
[math]np \pm 3\sqrt{np(1-p)} \in (0,n).[/math]
The following is an example of applying a continuity correction. Suppose one wishes to calculate [math]\operatorname{P}(X \leq 8) [/math] for a binomial random variable [math]X[/math]. If [math]Y[/math] has a distribution given by the normal approximation, then [math]\operatorname{P}(X \leq 8 )[/math] is approximated by [math]\operatorname{P}(Y \leq 8.5 ) [/math]. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
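The effect of the continuity correction can be seen in a small numerical sketch. The parameters [math]n = 20[/math] and [math]p = 0.5[/math] are an assumption chosen for illustration (the text does not fix them), and the normal cumulative distribution function is built from `math.erf`.

```python
from math import comb, erf, sqrt


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of the normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p = 20, 0.5  # illustrative parameters
mean, sd = n * p, sqrt(n * p * (1 - p))

# Exact P(X <= 8) from the binomial pmf.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(9))
uncorrected = normal_cdf(8.0, mean, sd)  # P(Y <= 8)
corrected = normal_cdf(8.5, mean, sd)    # P(Y <= 8.5), continuity-corrected
```

With these values the exact probability is about 0.252, the corrected approximation is within a fraction of a percentage point of it, and the uncorrected value is noticeably further away.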
This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large [math]n[/math] are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since [math]B(n,p)[/math] is a sum of [math]n[/math] independent, identically distributed Bernoulli variables with parameter [math]p[/math]. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of [math]p[/math] using [math]x/n[/math], the sample proportion and estimator of [math]p[/math], in a common test statistic.[6]
For example, suppose one randomly samples [math]n[/math] people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of [math]n[/math] people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion [math]p[/math] of agreement in the population and with standard deviation [math]\sigma = \sqrt{\frac{p(1-p)}{n}}[/math].
Geometric
The geometric distribution is the probability distribution of the number [math]Y[/math] of failures before the first success, supported on the set { 0, 1, 2, 3, ... }; i.e., if [math]p[/math] denotes the probability of success on each trial, then
[math]\Pr(Y=k) = (1-p)^k\,p \quad \text{for } k = 0, 1, 2, 3, \ldots[/math]
Moments
The expected value and the variance of the geometrically distributed random variable [math]Y[/math] are
[math]\operatorname{E}(Y) = \frac{1-p}{p}, \qquad \operatorname{Var}(Y) = \frac{1-p}{p^2}.[/math]
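A numerical sketch of these moments, assuming the failures-before-first-success convention with probability mass function [math](1-p)^k p[/math] and the standard closed forms [math](1-p)/p[/math] for the mean and [math](1-p)/p^2[/math] for the variance. The infinite sums are truncated at 2000 terms, which is an arbitrary cutoff that leaves a negligible tail for [math]p = 0.2[/math].

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(Y = k): k failures followed by the first success."""
    return (1 - p) ** k * p


p = 0.2  # illustrative success probability
ks = range(2000)  # truncation point; the remaining tail mass is negligible here

total = sum(geometric_pmf(k, p) for k in ks)
mean = sum(k * geometric_pmf(k, p) for k in ks)
variance = sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks)
```

For [math]p = 0.2[/math] the closed forms give a mean of 4 and a variance of 20, which the truncated sums reproduce up to rounding.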
Related distributions
- The geometric distribution [math]Y[/math] is a special case of the negative binomial distribution, with [math]r = 1 [/math]. More generally, if [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables with parameter [math]p[/math], then the sum
[math]Z = \sum_{m=1}^r Y_m[/math]
follows a negative binomial distribution with parameters [math]r[/math] and [math]p[/math].[7]
- If [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables (with possibly different success parameters [math]p_m[/math]), then their minimum
[math]W = \min_{m \in \{1,\ldots,r\}} Y_m[/math]
is also geometrically distributed, with parameter [math]p = 1-\prod_m(1-p_{m}).[/math]
- Suppose [math]0 \lt r \lt 1 [/math], and for [math] k = 1,2,3,\ldots [/math] the random variable [math]X_k[/math] has a Poisson distribution with expected value [math]r^k/k[/math]. Then
[math]\sum_{k=1}^\infty k\,X_k[/math]
has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value [math]r/(1-r)[/math].
- The exponential distribution is the continuous analogue of the geometric distribution. If [math]X[/math] is an exponentially distributed random variable with parameter [math]\lambda[/math], then
[math]Y = \lfloor X \rfloor,[/math]
where [math]\lfloor \quad \rfloor[/math] is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter [math]p = 1- e^{-\lambda} [/math] (thus [math]\lambda = - \ln(1-p) [/math][8]) and taking values in the set {0, 1, 2, ...}.
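The last relationship in the list, between the exponential and geometric distributions, can be checked without simulation. For an exponential variable [math]X[/math] with rate [math]\lambda[/math], the probability that its floor equals [math]k[/math] is [math]e^{-\lambda k} - e^{-\lambda(k+1)}[/math], and this should coincide with the geometric probability [math](1-p)^k p[/math] for [math]p = 1 - e^{-\lambda}[/math]. The function names and the rate 0.7 are illustrative.

```python
from math import exp


def floor_exponential_pmf(k: int, lam: float) -> float:
    """P(floor(X) = k) = P(k <= X < k + 1) for X exponential with rate lam."""
    return exp(-lam * k) - exp(-lam * (k + 1))


def geometric_pmf(k: int, p: float) -> float:
    """P(Y = k) for the failures-before-first-success geometric distribution."""
    return (1 - p) ** k * p


lam = 0.7  # illustrative rate
p = 1 - exp(-lam)
max_gap = max(
    abs(floor_exponential_pmf(k, lam) - geometric_pmf(k, p)) for k in range(50)
)
```

The two probability mass functions agree up to rounding for every [math]k[/math] tested.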
Poisson Distribution
The Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[9] The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.
Definition
A discrete random variable [math]X[/math] is said to have a Poisson distribution with parameter [math]\lambda \gt 0[/math], if, for [math]k = 0, 1, \ldots [/math], the probability mass function of [math]X[/math] is given by[10]
[math]f(k; \lambda) = \Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}.[/math]
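The probability mass function [math]f(k) = \Pr(X = k)[/math] satisfies the following recurrence relation:
[math](k+1)\,f(k+1) = \lambda\,f(k), \qquad f(0) = e^{-\lambda}.[/math]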
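A recurrence of this kind is convenient for building a table of probabilities without computing factorials. The sketch below assumes the standard Poisson form [math]\lambda^k e^{-\lambda}/k![/math], from which [math]f(k+1) = \lambda f(k)/(k+1)[/math] with [math]f(0) = e^{-\lambda}[/math] follows; the function name and [math]\lambda = 3.7[/math] are illustrative.

```python
from math import exp, factorial


def poisson_pmf_table(lam: float, k_max: int) -> list[float]:
    """pmf values f(0), ..., f(k_max), built with f(k+1) = lam * f(k) / (k+1)."""
    probs = [exp(-lam)]
    for k in range(k_max):
        probs.append(probs[-1] * lam / (k + 1))
    return probs


lam = 3.7
table = poisson_pmf_table(lam, 30)
# Direct evaluation of the closed form, for comparison.
direct = [lam**k * exp(-lam) / factorial(k) for k in range(31)]
```

Both lists should match term by term up to rounding.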
Mean
- The expected value and variance of a Poisson-distributed random variable are both equal to [math]\lambda[/math].
- The coefficient of variation is [math]\textstyle \lambda^{-1/2}[/math], while the index of dispersion is 1.[11]
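- The mean absolute deviation about the mean is[11]
[math]\operatorname{E}|X-\lambda| = \frac{2\lambda^{\lfloor\lambda\rfloor + 1} e^{-\lambda}}{\lfloor\lambda\rfloor!}.[/math]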
- The mode of a Poisson-distributed random variable with non-integer [math]\lambda[/math] is equal to [math]\scriptstyle\lfloor \lambda \rfloor[/math], which is the largest integer less than or equal to [math]\lambda[/math]. This is also written as floor([math]λ[/math]). When [math]λ[/math] is a positive integer, the modes are [math]\lambda[/math] and [math]\lambda-1[/math].
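The mean, variance, coefficient of variation and mode listed above can be checked numerically from a truncated table of probabilities. The sketch assumes the standard probability mass function [math]\lambda^k e^{-\lambda}/k![/math]; the cutoff of 60 terms and the value [math]\lambda = 3.7[/math] are illustrative.

```python
from math import exp, floor, sqrt

lam = 3.7  # illustrative, non-integer rate

# Build f(0), ..., f(60) iteratively; the tail beyond 60 is negligible here.
probs = [exp(-lam)]
for k in range(60):
    probs.append(probs[-1] * lam / (k + 1))

mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
cv = sqrt(variance) / mean  # coefficient of variation
mode = max(range(len(probs)), key=probs.__getitem__)
```

For this non-integer [math]\lambda[/math] the computed mode equals [math]\lfloor \lambda \rfloor = 3[/math], and both `mean` and `variance` agree with [math]\lambda[/math] up to rounding.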
Median
Bounds for the median ([math]ν[/math]) of the distribution are known and are sharp:[12]
[math]\lambda - \ln 2 \le \nu \lt \lambda + \frac{1}{3}.[/math]
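As a rough numerical illustration, the median can be computed as the smallest [math]k[/math] whose cumulative probability reaches 1/2 and compared with the bounds [math]\lambda - \ln 2 \le \nu \lt \lambda + 1/3[/math]. That form of the bounds is an assumption taken from the cited result of Choi, and the function name and test values of [math]\lambda[/math] are hypothetical.

```python
from math import exp, log


def poisson_median(lam: float) -> int:
    """Smallest k with P(X <= k) >= 1/2 for X ~ Pois(lam)."""
    prob = exp(-lam)  # f(0)
    cdf = prob
    k = 0
    while cdf < 0.5:
        k += 1
        prob *= lam / k  # f(k) = f(k-1) * lam / k
        cdf += prob
    return k


checks = {lam: poisson_median(lam) for lam in (0.5, 1.0, 2.5, 10.0, 25.3)}
within_bounds = all(
    lam - log(2) <= nu < lam + 1 / 3 for lam, nu in checks.items()
)
```

A handful of spot checks like these do not establish sharpness, but they do show the median staying inside the stated window.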
Other properties
- If [math]X_i \sim \operatorname{Pois}(\lambda_i)[/math] are independent and [math]\lambda=\sum_{i=1}^n \lambda_i[/math], then [math]Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)[/math].[13] A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.[14]
- The Poisson distributions are infinitely divisible probability distributions.[15][16]
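- Bounds for the tail probabilities of a Poisson random variable [math] X \sim \operatorname{Pois}(\lambda)[/math] can be derived using a Chernoff bound argument:[17]
[math]P(X \geq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} \text{ for } x \gt \lambda,[/math]
[math]P(X \leq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} \text{ for } x \lt \lambda.[/math]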
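The tail bounds can be compared against exact tail probabilities. The sketch assumes the common Chernoff-type form [math]e^{-\lambda}(e\lambda)^x/x^x[/math], used for [math]P(X \geq x)[/math] when [math]x \gt \lambda[/math] and for [math]P(X \leq x)[/math] when [math]x \lt \lambda[/math]; the function names and the values [math]\lambda = 4[/math], [math]x = 8[/math] and [math]x = 1[/math] are illustrative.

```python
from math import e, exp


def poisson_cdf(x: int, lam: float) -> float:
    """Exact P(X <= x) for a non-negative integer x."""
    prob = exp(-lam)
    total = prob
    for k in range(1, x + 1):
        prob *= lam / k
        total += prob
    return total


def chernoff_bound(x: int, lam: float) -> float:
    """Assumed bound e^{-lam} (e lam)^x / x^x, valid for positive x."""
    return exp(-lam) * (e * lam) ** x / x**x


lam = 4.0
upper_exact = 1.0 - poisson_cdf(7, lam)  # P(X >= 8), with 8 > lam
upper_bound = chernoff_bound(8, lam)
lower_exact = poisson_cdf(1, lam)        # P(X <= 1), with 1 < lam
lower_bound = chernoff_bound(1, lam)
```

In both directions the exact tail probability sits below the bound. The bound is not tight, but it is easy to evaluate.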
Negative Binomial
The negative binomial distribution is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes (denoted [math]r[/math]) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is [math]p[/math] and of failure is [math]1-p[/math]. We are observing this sequence until a predefined number [math]r[/math] of successes has occurred.
Probability Mass Function
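The probability mass function of the negative binomial distribution is
[math]f(k,r,p) = \Pr(X = k) = {k+r-1 \choose k} p^r (1-p)^k \quad \text{for } k = 0, 1, 2, \ldots[/math]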
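The binomial coefficient can be written in the following manner, explaining the name “negative binomial”:
[math]{k+r-1 \choose k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} = (-1)^k{-r \choose k}.[/math]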
To understand the above definition of the probability mass function, note that the probability for every specific sequence of [math]k[/math] failures and [math]r[/math] successes is [math]p^r(1-p)^k[/math], because the outcomes of the [math]k+r[/math] trials are supposed to happen independently. Since the [math]r[/math]th success comes last, it remains to choose the [math]k[/math] trials with failures out of the remaining [math]k+r-1[/math] trials. The above binomial coefficient gives precisely the number of all these sequences of length [math]k+r-1[/math].
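The counting argument translates directly into code. The sketch assumes the probability mass function [math]{k+r-1 \choose k} p^r (1-p)^k[/math] for the number of failures [math]k[/math] before the [math]r[/math]th success, with integer [math]r[/math]; the function name and parameter values are illustrative.

```python
from math import comb


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(k failures occur before the r-th success), for integer r >= 1."""
    # comb(k + r - 1, k): placements of the k failures among the first
    # k + r - 1 trials; the final trial is always the r-th success.
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


r, p = 3, 0.4
total = sum(negative_binomial_pmf(k, r, p) for k in range(500))
```

With [math]r = 1[/math] the coefficient is 1 and the expression reduces to the geometric probability [math](1-p)^k p[/math].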
Extension to real-valued r
It is possible to extend the definition of the negative binomial distribution to the case of a positive real parameter [math]r[/math]. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.
In the spirit of being consistent with the parametrizations found in [18], we consider the alternative parametrization defined implicitly by setting [math]p = 1/(1+\beta)[/math].
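As before, we say that [math]N[/math] has a negative binomial (or Pólya) distribution if it has a probability mass function:
[math]\Pr(N = k) = {k+r-1 \choose k}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{k} \quad \text{for } k = 0, 1, 2, \ldots[/math]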
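Here [math]r[/math] is a real, positive number. The binomial coefficient is then defined by the multiplicative formula and can also be rewritten using the gamma function:
[math]{k+r-1 \choose k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}.[/math]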
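For non-integer [math]r[/math] the coefficient cannot be computed with an integer binomial routine, but the gamma-function form [math]\Gamma(k+r)/(k!\,\Gamma(r))[/math] can be. The sketch below assumes the parametrization [math]p = 1/(1+\beta)[/math], so that the probability mass function is [math]\frac{\Gamma(k+r)}{k!\,\Gamma(r)}\left(\frac{\beta}{1+\beta}\right)^k (1+\beta)^{-r}[/math], and works on the log scale with `math.lgamma` to avoid overflow. The function name and parameter values are hypothetical.

```python
from math import exp, lgamma, log


def negative_binomial_pmf_real_r(k: int, r: float, beta: float) -> float:
    """P(N = k) for real r > 0 and beta > 0, using p = 1 / (1 + beta)."""
    # log of Gamma(k + r) / (k! * Gamma(r)); lgamma(k + 1) = log(k!).
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    log_prob = log_coeff + k * log(beta / (1 + beta)) - r * log(1 + beta)
    return exp(log_prob)


# Non-integer r: the probabilities should still sum to 1 (truncated at 400 terms).
total = sum(negative_binomial_pmf_real_r(k, 2.5, 1.5) for k in range(400))

# Integer r with beta = 1 (p = 1/2): should match the combinatorial formula.
integer_case = negative_binomial_pmf_real_r(2, 3, 1.0)
```

For [math]r = 3[/math], [math]\beta = 1[/math] and [math]k = 2[/math] the combinatorial formula gives [math]{4 \choose 2}/2^5 = 0.1875[/math], which the gamma-based version reproduces up to rounding.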
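To show that the probability mass function adds up to one, we have, by the binomial series,
[math]\sum_{k=0}^\infty {k+r-1 \choose k}\left(\frac{\beta}{1+\beta}\right)^{k} = \left(1-\frac{\beta}{1+\beta}\right)^{-r} = (1+\beta)^{r},[/math]
so that multiplying by the factor [math](1+\beta)^{-r}[/math] gives a total probability of 1.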
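Finally, the following recurrence relation holds:
[math](k+1)\Pr(N=k+1) = \frac{\beta}{1+\beta}\,(k+r)\Pr(N=k), \qquad \Pr(N=0) = (1+\beta)^{-r}.[/math]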
Wikipedia References
- Wikipedia contributors. "Binomial distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Wikipedia contributors. "Geometric distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Wikipedia contributors. "Poisson distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Wikipedia contributors. "Negative binomial distribution". Wikipedia. Wikipedia. Retrieved 17 February 2022.
References
- Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung" (in German). Wissenschaftliche Zeitschrift der Technischen Universität Dresden 19: 29–33.
- Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
- "Mean, Median and Mode in Binomial Distributions" (1980). Statistica Neerlandica 34 (1): 13–18.
- "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions" (1995). Statistics & Probability Letters 23: 21–25.
- Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
- NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
- Pitman, Jim. Probability (1993 edition). Springer Publishers. pp 372.
- http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l
- Frank A. Haight (1967). Handbook of the Poisson Distribution. New York: John Wiley & Sons.
- Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.
- Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p157
- Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251
- E. L. Lehmann (1986). Testing Statistical Hypotheses (second ed.). New York: Springer Verlag. ISBN 0-387-94919-4. Page 65.
- Raikov, D. (1937). On the decomposition of Poisson laws. Comptes Rendus (Doklady) de l' Academie des Sciences de l'URSS, 14, 9–11. (The proof is also given in von Mises, Richard (1964). Mathematical Theory of Probability and Statistics. New York: Academic Press.)
- Laha, R. G.; Rohatgi, V. K. Probability Theory. New York: John Wiley & Sons. p. 233. ISBN 0-471-03262-X.
- Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p159
- Michael Mitzenmacher; Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. p. 97. ISBN 0521835402.
- https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf