
From Stochiki
==Binomial==


The '''binomial distribution''' with parameters <math>n</math> and <math>p</math> is the [[wikipedia:discrete probability distribution|discrete probability distribution]] of the number of successes in a sequence of <math>n</math> [[wikipedia:statistical independence|independent]] yes/no experiments, each of which yields success with [[wikipedia:probability|probability]] <math>p</math>.
A success/failure experiment is also called a Bernoulli experiment or [[wikipedia:Bernoulli trial|Bernoulli trial]];  when <math>n = 1</math>, the binomial distribution  is a [[wikipedia:Bernoulli distribution|Bernoulli distribution]]. The binomial distribution is the basis for the popular [[wikipedia:binomial test|binomial test]] of [[wikipedia:statistical significance|statistical significance]].


The binomial distribution is frequently used to model the number of successes in a sample of size <math>n</math> drawn [[wikipedia:Sampling (statistics)#Replacement of selected units|with replacement]] from a population of size <math>N</math>. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a [[wikipedia:hypergeometric distribution|hypergeometric distribution]], not a binomial one. However, for <math>N</math> much larger than <math>n</math>, the binomial distribution remains a good approximation and is widely used.


===Specification===
====Probability mass function====


In general, if the random variable <math>X</math> follows the binomial distribution with parameters <math>n</math> ∈ ℕ and <math>p</math> ∈ [0,1], we write <math>X \sim B(n,p)</math>. The probability of getting exactly <math>k</math> successes in <math>n</math> trials is given by the [[wikipedia:probability mass function|probability mass function]]:


<math display="block"> f(k;n,p) = \operatorname{P}(X = k) = \binom n k  p^k(1-p)^{n-k}</math>
for <math>k = 0, 1, 2, \ldots, n</math>, where
<math display="block">\binom n k =\frac{n!}{k!(n-k)!}</math>


is the [[wikipedia:binomial coefficient|binomial coefficient]], hence the name of the distribution. The formula can be understood as follows: we want exactly <math>k</math> successes (probability <math>p^k</math>) and <math>n-k</math> failures (probability <math>(1-p)^{n-k}</math>). However, the <math>k</math> successes can occur anywhere among the <math>n</math> trials, and there are <math>{n\choose k}</math> different ways of distributing <math>k</math> successes in a sequence of <math>n</math> trials.


Reference tables of binomial probabilities are usually filled in only up to <math>k = n/2</math>. This is because for <math>k \gt n/2</math> the probability can be obtained by exchanging the roles of successes and failures:
<math display="block">f(k,n,p)=f(n-k,n,1-p). </math>
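A direct transcription of the mass function makes it easy to check numerically that the probabilities sum to 1 and that the symmetry identity used for tables holds. The following Python sketch relies only on the standard-library `math.comb`; the helper name `binom_pmf` and the values <math>n = 10</math>, <math>p = 0.3</math> are illustrative choices, not part of the original text.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) ways to place the k successes,
    times p**k for the successes and (1 - p)**(n - k) for the failures."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
print(round(sum(pmf), 12))  # total probability
# symmetry used when building tables: f(k, n, p) = f(n - k, n, 1 - p)
print(all(abs(binom_pmf(k, n, p) - binom_pmf(n - k, n, 1 - p)) < 1e-12 for k in range(n + 1)))
```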


The probability mass function satisfies the following [[wikipedia:recurrence relation|recurrence relation]], for every <math>n,p</math>:<math display="block">\left\{\begin{array}{l}
p (n-k) f(k,n,p) = (k+1) (1-p) f(k+1,n,p), \\[10pt]
f(0,n,p) = (1-p)^n
\end{array}\right.</math>
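The recurrence gives a way to tabulate all of <math>f(0,n,p), \ldots, f(n,n,p)</math> without evaluating factorials: start from <math>f(0,n,p) = (1-p)^n</math> and multiply by the ratio <math>\tfrac{p(n-k)}{(k+1)(1-p)}</math> at each step. The sketch below uses a hypothetical helper `binom_pmf_recursive`, assumes <math>0 \le p < 1</math> so the division is defined, and compares the result with the closed form.

```python
from math import comb

def binom_pmf_recursive(n: int, p: float) -> list[float]:
    """f(0..n; n, p) from (k+1)(1-p) f(k+1) = p(n-k) f(k), starting at f(0) = (1-p)**n.
    Assumes 0 <= p < 1 so that the division by (1 - p) is defined."""
    f = [(1 - p) ** n]
    for k in range(n):
        f.append(f[k] * p * (n - k) / ((k + 1) * (1 - p)))
    return f

n, p = 12, 0.4
rec = binom_pmf_recursive(n, p)
direct = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(max(abs(a - b) for a, b in zip(rec, direct)))  # rounding-level difference only
```

For very large <math>n</math> the starting value <math>(1-p)^n</math> can underflow to zero in floating point, in which case every later term is also zero; working with logarithms is one common way around that.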


Viewed as a function of <math>k</math>, the expression <math>f(k,n,p)</math> has a maximizing value of <math>k</math>, which can be found by calculating<math display="block"> \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} </math>
and comparing it to 1. There is always an integer <math>M</math> that satisfies<math display="block">(n+1)p-1 \leq M < (n+1)p.</math>


<math>f(k,n,p)</math> is monotone increasing for <math>k < M</math> and monotone decreasing for <math>k \gt M</math>, except when <math>(n+1)p</math> is an integer. In that case there are two values at which <math>f</math> is maximal: <math>(n+1)p</math> and <math>(n+1)p-1</math>. <math>M</math> is the ''most probable'' (''most likely'') outcome of the Bernoulli trials and is called the [[wikipedia:Mode (statistics)|mode]]. Note that the probability of it occurring can be fairly small.


====Cumulative distribution function====


The [[wikipedia:cumulative distribution function|cumulative distribution function]] can be expressed as:<math display="block">F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}</math>


where <math>\scriptstyle \lfloor k\rfloor\,</math> is the "floor" under <math>k</math>, i.e. the [[wikipedia:greatest integer|greatest integer]] less than or equal to <math>k</math>.
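A sketch of the cumulative distribution function that follows the formula literally, including the floor for non-integer arguments. The function name `binom_cdf` is illustrative; capping the upper index at <math>n</math> is an assumption added so that arguments larger than <math>n</math> return 1.

```python
import math

def binom_cdf(k: float, n: int, p: float) -> float:
    """F(k; n, p) = P(X <= k) for X ~ B(n, p); k may be any real number,
    so the sum runs over i = 0, ..., floor(k) (capped at n)."""
    upper = min(math.floor(k), n)
    if upper < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))

print(binom_cdf(2.7, 5, 0.5))  # equals P(X <= 2) = 16/32
```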


===Mean and Variance===


If <math>X \sim B(n,p)</math>, that is, <math>X</math> is a binomially distributed random variable, <math>n</math> being the total number of experiments and <math>p</math> the probability of each experiment yielding a successful result, then the [[wikipedia:expected value|expected value]] of <math>X</math> is <math>np</math> and the variance is <math>np(1-p)</math>. This follows directly from the fact that <math>X</math> is equal in distribution to the sum of <math>n</math> independent [[wikipedia:Bernoulli distribution|Bernoulli]] random variables, each having success probability <math>p</math> and therefore mean <math>p</math> and variance <math>p(1-p)</math>; means add, and by independence so do variances (see [[#bernouilli|below]]).
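The closed forms for the mean and variance can be compared with sums taken directly over the mass function. This Python sketch (hypothetical helper `binom_moments`, illustrative parameters <math>n = 20</math>, <math>p = 0.35</math>) should reproduce <math>np = 7</math> and <math>np(1-p) = 4.55</math> up to rounding.

```python
from math import comb

def binom_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of B(n, p) computed directly from the mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))
    return mean, var

n, p = 20, 0.35
mean, var = binom_moments(n, p)
print(mean, n * p)             # both approximately 7
print(var, n * p * (1 - p))    # both approximately 4.55
```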


===Mode===


Usually the [[wikipedia:mode (statistics)|mode]] of a binomial <math>B(n,p)</math> distribution is equal to <math>\lfloor (n+1)p\rfloor</math>, where <math>\lfloor\cdot\rfloor</math> is the [[wikipedia:floor function|floor function]]. However, when <math>(n+1)p</math> is an integer and <math>p</math> is neither 0 nor 1, the distribution has two modes: <math>(n+1)p</math> and <math>(n+1)p -1 </math>. When <math>p</math> is equal to 0 or 1, the mode is 0 or <math>n</math>, respectively. These cases can be summarized as follows:


<math display="block">\text{mode} =
       \begin{cases}
         \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\
         (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 & \text{if }(n+1)p\in \{1,\dots,n\}, \\
         n & \text{if }(n+1)p = n + 1.
       \end{cases}</math>
<div class="text-right">{{#Proof:View Proof|Mode|binomial/mode}}</div>
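The case analysis for the mode can be checked against a brute-force search over the mass function. The sketch below assumes <math>p</math> is given as an exact `fractions.Fraction`, so that the test of whether <math>(n+1)p</math> is an integer is not disturbed by floating-point rounding; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, floor

def binom_modes(n: int, p: Fraction) -> list[int]:
    """Mode(s) of B(n, p) following the case analysis above (p given exactly)."""
    if p == 0:
        return [0]
    if p == 1:
        return [n]
    t = (n + 1) * p
    if t.denominator == 1:           # (n+1)p is an integer in {1, ..., n}: two modes
        return [int(t) - 1, int(t)]
    return [floor(t)]

def brute_force_modes(n: int, p: Fraction) -> list[int]:
    """All k at which the (exact, rational) mass function attains its maximum."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    best = max(pmf)
    return [k for k, f in enumerate(pmf) if f == best]

print(binom_modes(9, Fraction(3, 10)), brute_force_modes(9, Fraction(3, 10)))
```

Here <math>(n+1)p = 3</math> is an integer, so both functions report the two modes 2 and 3.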


===Median===


In general, there is no single formula to find the [[wikipedia:median|median]] for a binomial distribution, and the median may not even be unique. However, several special results have been established:
* If <math>np</math> is an integer, then the mean, median, and mode coincide and equal <math>np</math>.<ref>{{cite journal|last=Neumann|first=P.|year=1966|title=Über den Median der Binomial- und Poissonverteilung|journal=Wissenschaftliche Zeitschrift der Technischen Universität Dresden|volume=19|pages=29–33|language=German}}</ref><ref>Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", [[wikipedia:The Mathematical Gazette|The Mathematical Gazette]] 94, 331–332.</ref>
* Any median <math>m</math> must lie within the interval <math>\lfloor np\rfloor \le m \le \lceil np\rceil</math>.<ref name="KaasBuhrman">{{cite journal|first1=R.|last1=Kaas|first2=J.M.|last2=Buhrman|title=Mean, Median and Mode in Binomial Distributions|journal=Statistica Neerlandica|year=1980|volume=34|issue=1|pages=13–18|doi=10.1111/j.1467-9574.1980.tb00681.x}}</ref>
* A median <math>m</math> cannot lie too far away from the mean: <math>|m-np| \leq \min\{\ln 2,\max\{p,1-p\}\}</math>.<ref name="Hamza">{{Cite journal
| pmc =  
}}</ref>
* The median is unique and equal to <math>m=</math>[[wikipedia:Rounding|round]](<math>np</math>) in cases when either <math>p\leq 1-\ln(2)</math> or <math>p\geq \ln(2)</math> or <math>|m-np| \leq \textrm{min}\{p, 1-p\}</math> (except for the case when <math>p = 1/2 </math> and <math>n</math> is odd).<ref name="KaasBuhrman"/><ref name="Hamza"/>
* When <math>p=1/2</math> and <math>n</math> is odd, any number <math>m</math> in the interval <math>[(n-1)/2,(n+1)/2]</math> is a median of the binomial distribution. If <math>p = 1/2</math> and <math>n</math> is even, then <math>m = n/2</math> is the unique median.
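The interval bound <math>\lfloor np\rfloor \le m \le \lceil np\rceil</math> and the integer-mean case can be spot-checked numerically. The sketch below returns the smallest <math>m</math> with <math>P(X \le m) \ge 1/2</math>, which is one median among possibly several; using exact `Fraction` arithmetic is an implementation choice that avoids rounding at the boundary value 1/2, and the helper name is illustrative.

```python
from fractions import Fraction
from math import comb, floor, ceil

def binom_median(n: int, p: Fraction) -> int:
    """Smallest m with P(X <= m) >= 1/2 for X ~ B(n, p). This is one median;
    if the median is not unique (e.g. p = 1/2 with n odd), the others lie above it."""
    cdf = Fraction(0)
    for m in range(n + 1):
        cdf += comb(n, m) * p**m * (1 - p) ** (n - m)
        if cdf >= Fraction(1, 2):
            return m
    return n

for n, p in [(5, Fraction(1, 2)), (10, Fraction(3, 10)), (7, Fraction(2, 3))]:
    m = binom_median(n, p)
    print(n, p, m, floor(n * p) <= m <= ceil(n * p))
```

For <math>n = 5</math>, <math>p = 1/2</math> the function returns 2, the lower end of the median interval <math>[2, 3]</math>.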


====Bernoulli distribution <span id="bernouilli"></span>====


The [[wikipedia:Bernoulli distribution|Bernoulli distribution]] is a special case of the binomial distribution, where <math>n = 1</math>. Symbolically, <math>X \sim B(1,p)</math> has the same meaning as <math>X \sim B(p)</math>. Conversely, any binomial distribution <math>B(n,p)</math> is the distribution of the sum of <math>n</math> independent [[wikipedia:Bernoulli trials|Bernoulli trials]] <math>B(p)</math>, each with the same success probability <math>p</math>.
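The representation of <math>B(n,p)</math> as a sum of independent Bernoulli trials translates directly into a simple (if slow) sampler. This seeded Monte Carlo sketch compares empirical frequencies with the exact mass function; the sample size, seed and parameters are arbitrary assumptions, so the empirical column should only be expected to agree to about two or three decimal places.

```python
import random
from math import comb

def binomial_by_bernoulli_sum(n: int, p: float, rng: random.Random) -> int:
    """One draw from B(n, p), obtained by adding n independent Bernoulli(p) indicators."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(12345)          # fixed seed for reproducibility
n, p, draws = 6, 0.4, 100_000
counts = [0] * (n + 1)
for _ in range(draws):
    counts[binomial_by_bernoulli_sum(n, p, rng)] += 1

for k in range(n + 1):
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, round(counts[k] / draws, 4), round(exact, 4))
```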


====Normal approximation====


If <math>n</math> is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to <math>B(n,p)</math> is given by the [[wikipedia:normal distribution|normal distribution]] <math> \mathcal{N}(np,\,np(1-p))</math>, and this basic approximation can be improved in a simple way by using a suitable [[wikipedia:continuity correction|continuity correction]]. The basic approximation generally improves as <math>n</math> increases (at least 20) and is better when <math>p</math> is not near to 0 or 1.<ref name="bhh">{{cite book|title=Statistics for experimenters|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|page=130}}</ref> Various heuristics may be used to decide whether <math>n</math> is large enough, and <math>p</math> is far enough from the extremes of zero or one:


*One rule is that both <math>np</math> and <math>n(1-p)</math> must be greater than&nbsp;5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources use 10, which gives virtually the same results as the following rule for large <math>n</math> until <math>n</math> is very large.
*Another commonly used rule is that the approximation is appropriate only if everything within three standard deviations of the mean lies within the range of possible values, that is, only if
<math display="block">\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].</math>
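Both rules of thumb are easy to evaluate programmatically. The function below is a hypothetical helper; its `threshold` argument defaults to 5 as in the first rule, and setting it to 10 corresponds to the stricter variant some sources use.

```python
import math

def normal_approx_rules(n: int, p: float, threshold: float = 5.0) -> dict[str, bool]:
    """Evaluate two rules of thumb for approximating B(n, p) by N(np, np(1-p))."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return {
        # rule 1: np and n(1-p) both exceed the threshold
        "np_and_n(1-p)_above_threshold": mu > threshold and n * (1 - p) > threshold,
        # rule 2: mu +/- 3 sigma stays inside the support [0, n]
        "mu_pm_3sigma_within_0_n": mu - 3 * sigma >= 0 and mu + 3 * sigma <= n,
    }

print(normal_approx_rules(20, 0.5))   # np = 10: both rules satisfied
print(normal_approx_rules(20, 0.05))  # np = 1: both rules fail
```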


The following is an example of applying a [[wikipedia:continuity correction|continuity correction]]. Suppose one wishes to calculate <math>\operatorname{P}(X \leq 8) </math> for a binomial random variable <math>X</math>. If <math>Y</math> has a distribution given by the normal approximation, then <math>\operatorname{P}(X \leq 8 )</math> is approximated by <math>\operatorname{P}(Y \leq 8.5 ) </math>. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
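To see the effect of the continuity correction, the sketch below compares the exact value of <math>\operatorname{P}(X \leq 8)</math> with the normal approximation evaluated at 8.5 and at 8. The parameters <math>n = 20</math>, <math>p = 0.5</math> are illustrative assumptions (the text does not fix them), and the normal CDF is written with the standard-library `math.erf`.

```python
import math
from math import comb

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 20, 0.5                                  # illustrative parameters
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9))  # P(X <= 8)
corrected = normal_cdf(8.5, mu, sigma)          # with the 0.5 continuity correction
uncorrected = normal_cdf(8.0, mu, sigma)        # without it

print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```

For these parameters the corrected value lands much closer to the exact probability (both are roughly 0.25) than the uncorrected one (about 0.19).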


This approximation, known as the [[wikipedia:de Moivre–Laplace theorem|de Moivre–Laplace theorem]], is a huge time-saver when undertaking calculations by hand (exact calculations with large <math>n</math> are very onerous); historically, it was the first use of the normal distribution, introduced in [[wikipedia:Abraham de Moivre|Abraham de Moivre]]'s book ''[[wikipedia:The Doctrine of Chances|The Doctrine of Chances]]'' in 1738. Nowadays, it can be seen as a consequence of the [[wikipedia:central limit theorem|central limit theorem]], since <math>B(n,p)</math> is a sum of <math>n</math> independent, identically distributed [[wikipedia:Bernoulli distribution|Bernoulli variables]] with parameter <math>p</math>. This fact is the basis of a [[wikipedia:hypothesis test|hypothesis test]], a "proportion z-test", for the value of <math>p</math> using <math>x/n</math>, the sample proportion and estimator of <math>p</math>, in a [[wikipedia:common test statistics|common test statistic]].<ref>[[wikipedia:NIST|NIST]]/[[wikipedia:SEMATECH|SEMATECH]], [http://www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm "7.2.4. Does the proportion of defectives meet requirements?"] ''e-Handbook of Statistical Methods.''</ref>


For example, suppose one randomly samples <math>n</math> people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of <math>n</math> people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion <math>p</math> of agreement in the population and with standard deviation <math>\sigma = \sqrt{\frac{p(1-p)}{n}}</math>.
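The proportion z-test can be sketched as follows. It standardizes the sample proportion <math>\hat p = x/n</math> using <math>\sigma = \sqrt{p_0(1-p_0)/n}</math> evaluated at a hypothesized value <math>p_0</math> (one common convention; some texts plug in <math>\hat p</math> instead), and converts the result to a two-sided p-value with `math.erfc`. The survey numbers are hypothetical.

```python
import math

def proportion_z(x: int, n: int, p0: float) -> float:
    """z statistic for H0: p = p0, based on the sample proportion x / n and the
    standard deviation sqrt(p0 (1 - p0) / n) under the null hypothesis."""
    p_hat = x / n
    sigma = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sigma

def two_sided_p_value(z: float) -> float:
    """2 * (1 - Phi(|z|)) from the normal approximation, written with erfc."""
    return math.erfc(abs(z) / math.sqrt(2))

z = proportion_z(220, 400, 0.5)   # hypothetical survey: 220 of 400 agree
print(round(z, 3), round(two_sided_p_value(z), 4))
```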


<math display="block">\operatorname{P}(Y=k) = (1 - p)^k\,p\,.</math>
To stay consistent with the notation used in the Society of Actuaries exam tables,<ref>https://www.soa.org/files/edu/edu-exam-c-tables-cont-dist.pdf</ref> we set <math>\beta = (1-p)/p</math> and obtain:
<math display="block">\begin{equation}\label{geometric}\operatorname{P}(Y=k) = \frac{\beta^k}{(1+\beta)^{k+1}}.\end{equation}</math>
Going forward, we assume that a geometric distribution is characterized by \ref{geometric} and depends solely on <math>\beta</math>, which turns out to be the mean of the distribution.
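The two parameterizations can be compared numerically. The sketch assumes <math>\beta = (1-p)/p</math>, which is the substitution that makes <math>(1-p)^k p</math> and <math>\beta^k/(1+\beta)^{k+1}</math> coincide, since then <math>1+\beta = 1/p</math>; the function names and <math>p = 0.2</math> are illustrative.

```python
def geom_pmf_p(k: int, p: float) -> float:
    """P(Y = k) = (1 - p)**k * p, k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

def geom_pmf_beta(k: int, beta: float) -> float:
    """P(Y = k) = beta**k / (1 + beta)**(k + 1), k = 0, 1, 2, ..."""
    return beta**k / (1 + beta) ** (k + 1)

p = 0.2
beta = (1 - p) / p          # mathematically 4 for p = 0.2, so 1 + beta = 1 / p
print(max(abs(geom_pmf_p(k, p) - geom_pmf_beta(k, beta)) for k in range(50)))
```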


===Moments ===


The expected value of the geometrically distributed random variable <math>Y</math> is <math>\beta</math> and its variance is <math>\beta(\beta +1)</math>:


<math display="block">
\operatorname{E}(Y) = \beta, \qquad\operatorname{Var}(Y) = \beta(1 + \beta).
</math>
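A truncated-series check of the moments in the <math>\beta</math> parameterization. Truncating at `k_max` is an approximation; the neglected tail decays like <math>(\beta/(1+\beta))^{k}</math>, which is negligible for the illustrative <math>\beta = 2.5</math>, where the expected results are <math>\beta = 2.5</math> and <math>\beta(1+\beta) = 8.75</math>.

```python
def geom_moments_beta(beta: float, k_max: int = 2000) -> tuple[float, float]:
    """Mean and variance of P(Y = k) = beta**k / (1 + beta)**(k + 1),
    approximated by truncating the series at k = k_max."""
    q = beta / (1 + beta)                            # ratio of consecutive probabilities
    probs = [q**k / (1 + beta) for k in range(k_max + 1)]
    mean = sum(k * f for k, f in enumerate(probs))
    second = sum(k * k * f for k, f in enumerate(probs))
    return mean, second - mean**2

beta = 2.5
mean, var = geom_moments_beta(beta)
print(mean, var)   # close to beta and beta * (1 + beta)
```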


===Related distributions===


* The geometric distribution is a special case of the [[wikipedia:negative binomial distribution|negative binomial distribution]], with <math>r = 1 </math>. More generally, if <math>Y_1,\ldots,Y_r</math> are [[wikipedia:statistical independence|independent]] geometrically distributed variables with parameter <math>p</math>, then the sum


<math display="block">Z = \sum_{m=1}^r Y_m</math>
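has a negative binomial distribution with parameters <math>r</math> and <math>p</math>.
* If <math>Y_1,\ldots,Y_r</math> are independent geometrically distributed variables with possibly different parameters <math>p_m</math>, then their minimum

<math display="block">W = \min_m Y_m</math>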


is also geometrically distributed, with parameter <math>p = 1-\prod_m(1-p_{m}).</math>
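The parameter <math>1-\prod_m(1-p_m)</math> is the one obtained for the minimum of independent geometric variables, because <math>\operatorname{P}(\min_m Y_m \ge k) = \prod_m (1-p_m)^k</math>. A seeded simulation sketch of that statement follows; the parameters <math>p_m</math> are illustrative, and the sampler counts failures before the first success, matching <math>\operatorname{P}(Y=k) = (1-p)^k p</math>.

```python
import random

def sample_geometric(p: float, rng: random.Random) -> int:
    """Number of failures before the first success in independent Bernoulli(p) trials."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k

ps = [0.1, 0.25, 0.4]          # illustrative parameters p_m
q = 1.0
for pm in ps:
    q *= 1 - pm
p_min = 1 - q                  # claimed parameter of min_m Y_m (0.595 here)

rng = random.Random(2024)
draws = 50_000
counts: dict[int, int] = {}
for _ in range(draws):
    w = min(sample_geometric(pm, rng) for pm in ps)
    counts[w] = counts.get(w, 0) + 1

for k in range(5):
    print(k, round(counts.get(k, 0) / draws, 4), round((1 - p_min) ** k * p_min, 4))
```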
* Suppose <math>0 < r < 1 </math>, and for <math> k = 1,2,3,\ldots </math> the random variable <math>X_k</math> has a [[wikipedia:Poisson distribution|Poisson distribution]] with expected value <math>r^k/k</math>. Then


<math display="block">\sum_{k=1}^\infty k\,X_k</math>
has a geometric distribution taking values in the set {0,&nbsp;1,&nbsp;2,&nbsp;...}, with expected value <math>r/(1-r)</math>.
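The Poisson construction can be checked without simulation by convolving the distributions of <math>k X_k</math>, using means <math>r^k/k</math>. The sketch keeps only <math>k \le n_{\max}</math>, which is an approximation: terms with larger <math>k</math> can leave the sum at or below <math>n_{\max}</math> only through <math>X_k = 0</math>, whose combined probability is extremely close to 1 when <math>r^{n_{\max}}</math> is small. The result is compared with <math>(1-r)r^n</math>, the geometric law with mean <math>r/(1-r)</math>; <math>r = 0.5</math> is illustrative.

```python
import math

def weighted_poisson_sum_pmf(r: float, n_max: int) -> list[float]:
    """P(S = n) for n = 0..n_max, where S = sum_k k * X_k and the X_k are independent
    Poisson variables with means r**k / k. Terms with k > n_max are dropped, which
    ignores the factor prod_{k > n_max} P(X_k = 0) (very close to 1 for small r**n_max)."""
    pmf = [1.0] + [0.0] * n_max                 # the empty sum is 0 with probability 1
    for k in range(1, n_max + 1):
        lam = r**k / k
        # distribution of k * X_k restricted to the values 0, k, 2k, ... <= n_max
        term = [0.0] * (n_max + 1)
        for j in range(n_max // k + 1):
            term[j * k] = math.exp(-lam) * lam**j / math.factorial(j)
        # convolve the running distribution with that of k * X_k, truncated at n_max
        new = [0.0] * (n_max + 1)
        for a, pa in enumerate(pmf):
            for b in range(n_max + 1 - a):
                new[a + b] += pa * term[b]
        pmf = new
    return pmf

r, n_max = 0.5, 30
pmf = weighted_poisson_sum_pmf(r, n_max)
geometric = [(1 - r) * r**n for n in range(n_max + 1)]
print(max(abs(a - b) for a, b in zip(pmf, geometric)))   # essentially zero
```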


* The [[wikipedia:exponential distribution|exponential distribution]] is the continuous analogue of the geometric distribution.  If <math>X</math> is an exponentially distributed random variable with parameter <math>\lambda</math>, then


<math display="block">Y = \lfloor X \rfloor,</math>


where <math>\lfloor \quad \rfloor</math> is the [[wikipedia:Floor and ceiling functions|floor]] (or greatest integer) function, is a geometrically distributed random variable with parameter <math>p = 1- e^{-\lambda} </math> (thus <math>\lambda = - \ln(1-p) </math><ref>http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l</ref>) and taking values in the set&nbsp;{0,&nbsp;1,&nbsp;2,&nbsp;...}.
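The link between the exponential and geometric distributions can be verified exactly, since <math>\operatorname{P}(\lfloor X\rfloor = k) = e^{-\lambda k} - e^{-\lambda(k+1)} = (1-p)^k p</math> with <math>p = 1 - e^{-\lambda}</math>, and also by simulation. The sketch assumes <math>\lambda</math> is the rate (so that <math>X</math> has mean <math>1/\lambda</math>), which matches the convention of Python's `random.expovariate`; the value 0.7 is illustrative.

```python
import math
import random

lam = 0.7                        # illustrative rate parameter of X
p = 1 - math.exp(-lam)           # parameter of the geometric variable floor(X)

# Exact comparison: P(floor(X) = k) = P(k <= X < k + 1) = e^{-lam k} - e^{-lam (k + 1)}
for k in range(4):
    exact = math.exp(-lam * k) - math.exp(-lam * (k + 1))
    print(k, round(exact, 6), round((1 - p) ** k * p, 6))

# Seeded Monte Carlo check using the standard-library exponential sampler (rate lam)
rng = random.Random(7)
draws = 100_000
samples = [math.floor(rng.expovariate(lam)) for _ in range(draws)]
print(sum(s == 0 for s in samples) / draws, p)   # empirical vs exact P(floor(X) = 0)
print(math.isclose(-math.log(1 - p), lam))       # lambda = -ln(1 - p)
```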


==Poisson Distribution==


The '''Poisson distribution''', named after the French mathematician [[wikipedia:Siméon Denis Poisson|Siméon Denis Poisson]], is a [[wikipedia:discrete probability distribution|discrete probability distribution]] that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and [[wikipedia:Statistical independence|independently]] of the time since the last event.<ref name=haight>{{cite book|author=Frank A. Haight|title=Handbook of the Poisson Distribution|publisher=John Wiley & Sons|location=New York|year=1967|ref=harv}}</ref> The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.


=== Definition ===
A discrete [[wikipedia:random variable|random variable]]  <math>X</math>  is said to have a Poisson distribution with parameter <math>\lambda > 0</math>, if, for <math>k = 0, 1, \ldots </math>, the [[wikipedia:probability mass function|probability mass function]] of <math>X</math> is given by<ref>Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.</ref>


<math display="block">\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.</math>


The probability mass function satisfies the following [[wikipedia:recurrence relation|recurrence relation]]:


<math display="block">\left\{\begin{array}{l}
k\, p_k = \lambda\, p_{k-1}, \qquad k = 1, 2, \ldots, \\[10pt]
p_0 = e^{-\lambda}
\end{array}\right.</math>
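The recurrence is a convenient way to tabulate Poisson probabilities without computing large powers and factorials. The sketch compares it with the closed-form mass function; the helper names and <math>\lambda = 3.2</math> are illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """p_k = lam**k * exp(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_pmf_table(lam: float, k_max: int) -> list[float]:
    """p_0..p_{k_max} from the recurrence p_0 = exp(-lam), p_k = (lam / k) * p_{k-1}."""
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 3.2
table = poisson_pmf_table(lam, 40)
print(max(abs(table[k] - poisson_pmf(k, lam)) for k in range(41)))  # rounding-level difference
print(sum(table))                                                   # very close to 1
```

For very large <math>\lambda</math> (several hundred or more) the starting value <math>e^{-\lambda}</math> underflows in double precision, so a log-space variant would be needed there.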


=== Properties ===


==== Mean ====
*The [[wikipedia:expected value|expected value]] and [[wikipedia:variance|variance]] of a Poisson-distributed random variable are both equal to <math>\lambda</math>.
*The [[wikipedia:coefficient of variation|coefficient of variation]] is <math>\textstyle \lambda^{-1/2}</math>, while the [[wikipedia:index of dispersion|index of dispersion]] is 1.<ref name=JKK157/>
*The [[wikipedia:mean absolute deviation|mean absolute deviation]] about the mean is<ref name=JKK157/>
<math display="block">\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .</math>
<math display="block">\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .</math>
*The [[wikipedia:mode (statistics)|mode]] of a Poisson-distributed random variable with non-integer <math>\lambda</math> is equal to <math>\scriptstyle\lfloor \lambda \rfloor</math>, which is the largest integer less than or equal to&nbsp;<math>\lambda</math>. This is also written as [[wikipedia:floor function|floor]](<math>λ</math>). When <math>λ</math> is a positive integer, the modes are <math>\lambda</math> and <math>\lambda-1</math>.
*The [[mode (statistics)|mode]] of a Poisson-distributed random variable with non-integer <math>\lambda</math> is equal to <math>\scriptstyle\lfloor \lambda \rfloor</math>, which is the largest integer less than or equal to&nbsp;<math>\lambda</math>. This is also written as [[floor function|floor]](<math>λ</math>). When <math>λ</math> is a positive integer, the modes are <math>\lambda</math> and <math>\lambda-1</math>.


==== Median ====
=== Median ===


Bounds for the median (<math>ν</math>) of the distribution are known and are sharp:<ref name=Choi1994>Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251</ref><math display="block"> \lambda - \ln 2 \le \nu < \lambda + \frac{1}{3}. </math>
Bounds for the median (<math>ν</math>) of the distribution are known and are sharp:<ref name=Choi1994>Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251</ref><math display="block"> \lambda - \ln 2 \le \nu < \lambda + \frac{1}{3}. </math>


==== Higher moments ====
=== Other properties ===
 
* The higher [[wikipedia:moment (mathematics)|moments]] <math>m_k</math> of the Poisson distribution about the origin are [[wikipedia:Touchard polynomials|Touchard polynomials]] in <math>\lambda</math>:
 
<math display="block"> m_k = \sum_{i=1}^k \lambda^i \left\{\begin{matrix} k \\ i \end{matrix}\right\},</math>
 
where the {braces} denote [[wikipedia:Stirling numbers of the second kind|Stirling numbers of the second kind]].<ref>{{cite journal|author=Riordan, John|year=1937|journal=Annals of Mathematical Statistics|title=Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions|volume=8|pages=103–111|ref=harv|doi=10.1214/aoms/1177732430}} Also see Haight (1967), p. 6.</ref> The coefficients of the polynomials have a [[wikipedia:combinatorics|combinatorial]] meaning. In fact, when the expected value of the Poisson distribution is 1, then [[wikipedia:Dobinski's formula|Dobinski's formula]] says that the <math>n</math><sup>th</sup> moment equals the number of [[wikipedia:partition of a set|partitions of a set]] of size <math>n</math>.
 
*If <math>X_i \sim \operatorname{Pois}(\lambda_i)</math> are [[wikipedia:statistical independence|independent]] and <math>\lambda=\sum_{i=1}^n \lambda_i</math>, then <math>Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)</math>.<ref>{{cite book|author=E. L. Lehmann|title=Testing Statistical Hypotheses|publisher=Springer Verlag|location=New York|edition=second|year=1986|isbn=0-387-94919-4|ref=harv}} page 65.</ref>  A converse is [[wikipedia:Raikov's theorem|Raikov's theorem]], which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.<ref>Raikov, D. (1937). On the decomposition of Poisson laws. ''Comptes Rendus (Doklady) de l' Academie des Sciences de l'URSS, 14, 9–11. (The proof is also given in {{cite book|author=von Mises, Richard|year=1964|title=Mathematical Theory of Probability and Statistics|publisher=Academic Press|location=New York|ref=harv}})</ref>
 
==== Other properties ====


*The Poisson distributions are [[wikipedia:Infinite divisibility (probability)|infinitely divisible]] probability distributions.<ref>{{cite book|author1=Laha, R. G.  |author2=Rohatgi, V. K.  |title=Probability Theory|publisher=John Wiley & Sons|location=New York|isbn=0-471-03262-X|ref=harv|page=233}}</ref><ref name=JKK159/>
*If <math>X_i \sim \operatorname{Pois}(\lambda_i)</math> are [[guide:Af39987afc|independent]] and <math>\lambda=\sum_{i=1}^n \lambda_i</math>, then <math>Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)</math>.<ref>{{cite book|author=E. L. Lehmann|title=Testing Statistical Hypotheses|publisher=Springer Verlag|location=New York|edition=second|year=1986|isbn=0-387-94919-4|ref=harv}} page 65.</ref>  A converse is [[Raikov's theorem|Raikov's theorem]], which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.<ref>Raikov, D. (1937). On the decomposition of Poisson laws. ''Comptes Rendus (Doklady) de l' Academie des Sciences de l'URSS, 14, 9–11. (The proof is also given in {{cite book|author=von Mises, Richard|year=1964|title=Mathematical Theory of Probability and Statistics|publisher=Academic Press|location=New York|ref=harv}})</ref>
*The Poisson distributions are [[Infinite divisibility (probability)|infinitely divisible]] probability distributions.<ref>{{cite book|author1=Laha, R. G.  |author2=Rohatgi, V. K.  |title=Probability Theory|publisher=John Wiley & Sons|location=New York|isbn=0-471-03262-X|ref=harv|page=233}}</ref><ref name=JKK159/>


* Bounds for the tail probabilities of a Poisson random variable <math> X \sim \operatorname{Pois}(\lambda)</math> can be derived using a [[wikipedia:Chernoff bound|Chernoff bound]] argument:<ref>{{cite book|author1=Michael Mitzenmacher  |author2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis|publisher=Cambridge University Press|isbn=0521835402|page=97|ref=harv}}</ref>
* Bounds for the tail probabilities of a Poisson random variable <math> X \sim \operatorname{Pois}(\lambda)</math> can be derived using a [[Chernoff bound|Chernoff bound]] argument:<ref>{{cite book|author1=Michael Mitzenmacher  |author2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis|publisher=Cambridge University Press|isbn=0521835402|page=97|ref=harv}}</ref>


<math display="block">
<math display="block">
Line 214: Line 195:
==Negative Binomial==
==Negative Binomial==


The '''negative binomial distribution''' is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed [[wikipedia:Bernoulli trial|Bernoulli trial]]s before a specified number of successes (denoted <math>r</math>) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We are observing this sequence until a predefined number <math>r</math> of successes has occurred.
The '''negative binomial distribution''' is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed [[Bernoulli trial|Bernoulli trial]]s before a specified number of successes (denoted <math>r</math>) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We are observing this sequence until a predefined number <math>r</math> of successes has occurred.


===Probability Mass Function ===
===Probability Mass Function ===
Line 232: Line 213:
===Extension to real-valued ''r''===
===Extension to real-valued ''r''===


It is possible to extend the definition of the negative binomial distribution to the case of a positive [[wikipedia:real number|real]] parameter <math>r</math>. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.  
It is possible to extend the definition of the negative binomial distribution to the case of a positive [[real number|real]] parameter <math>r</math>. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.  


In the spirit of being consistent with the parametrizations found in <ref name="tables">https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf</ref>, we consider the alternative parametrization defined implicitly by setting <math>p =  1(1+\beta)</math>.   
In the spirit of being consistent with the parametrizations found in <ref name="tables">https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf</ref>, we consider the alternative parametrization defined implicitly by setting <math>p =  1(1+\beta)</math>.   
Line 239: Line 220:
     f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc
     f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc
   </math>
   </math>
Here <math>r</math> is a real, positive number. The binomial coefficient is then defined by the [[wikipedia:binomial coefficient#Multiplicative formula|multiplicative formula]] and can also be rewritten using the [[wikipedia:gamma function|gamma function]]:<math display="block">
Here <math>r</math> is a real, positive number. The binomial coefficient is then defined by the [[binomial coefficient#Multiplicative formula|multiplicative formula]] and can also be rewritten using the [[gamma function|gamma function]]:<math display="block">
   \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}.
   \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}.
   </math>
   </math>


To show that the probability mass function adds up to one, we have, by the [[wikipedia:binomial series|binomial series]]
To show that the probability mass function adds up to one, we have, by the [[binomial series|binomial series]]


<math display="block">
<math display="block">
Line 250: Line 231:
</math>
</math>


Finally, the following [[wikipedia:recurrence relation|recurrence relation]] holds:
Finally, the following [[recurrence relation|recurrence relation]] holds:


<math display="block">\begin{array}{l}
<math display="block">\begin{array}{l}
Line 258: Line 239:
</math>
</math>


==Notes==
==Wikipedia References==
{{Reflist|30em|refs=
<ref name=JKK157>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p157</ref><ref name=JKK159>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p159</ref>}}
 
==References==
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=1065569910 |title= Binomial distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=1065569910 |title= Binomial distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Geometric_distribution&oldid=1061164164 |title= Geometric distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Geometric_distribution&oldid=1061164164 |title= Geometric distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Poisson_distribution&oldid=1068368695 |title= Poisson distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Poisson_distribution&oldid=1068368695 |title= Poisson distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Negative_binomial_distribution&oldid=898136399  | title= Negative binomial distribution | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 17 February 2022 }}
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Negative_binomial_distribution&oldid=898136399  | title= Negative binomial distribution | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 17 February 2022 }}
==References==
{{Reflist|30em|refs=
<ref name=JKK157>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p157</ref><ref name=JKK159>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p159</ref>}}

Latest revision as of 23:56, 4 April 2024

Binomial

The binomial distribution with parameters [math]n[/math] and [math]p[/math] is the discrete probability distribution of the number of successes in a sequence of [math]n[/math] independent yes/no experiments, each of which yields success with probability [math]p[/math]. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when [math]n = 1[/math], the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size [math]n[/math] drawn with replacement from a population of size [math]N[/math]. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for [math]N[/math] much larger than [math]n[/math], the binomial distribution remains a good approximation and is widely used.

Specification

Probability mass function

In general, if the random variable [math]X[/math] follows the binomial distribution with parameters [math]n \in \mathbb{N}[/math] and [math]p \in [0,1][/math], we write [math]X \sim B(n,p)[/math]. The probability of getting exactly [math]k[/math] successes in [math]n[/math] trials is given by the probability mass function:

[[math]] f(k;n,p) = \operatorname{P}(X = k) = \binom n k p^k(1-p)^{n-k}[[/math]]

for [math]k = 0, \ldots, n[/math] where

[[math]]\binom n k =\frac{n!}{k!(n-k)!}[[/math]]

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly [math]k[/math] successes ([math]p^k[/math]) and [math]n-k[/math] failures ([math](1-p)^{n-k}[/math]). However, the [math]k[/math] successes can occur anywhere among the [math]n[/math] trials, and there are [math]{n\choose k}[/math] different ways of distributing [math]k[/math] successes in a sequence of [math]n[/math] trials.

Reference tables of binomial probabilities are usually filled in only up to [math]k = n/2[/math]. This is because for [math]k \gt n/2[/math], the probability can be obtained by symmetry as

[[math]]f(k,n,p)=f(n-k,n,1-p). [[/math]]

The probability mass function satisfies the following recurrence relation, for every [math]n,p[/math]:

[[math]]\left\{\begin{array}{l} p (n-k) f(k,n,p) = (k+1) (1-p) f(k+1,n,p), \\[10pt] f(0,n,p)=(1-p)^n \end{array}\right\}[[/math]]
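The recurrence can be used to tabulate the whole probability mass function starting from [math]f(0,n,p) = (1-p)^n[/math]. Below is a minimal Python sketch (standard library only; the function names are illustrative, not from any library) that builds the table both from the closed form and from the recurrence and compares them. Because the recurrence step divides by [math]1-p[/math], the sketch assumes [math]0 \lt p \lt 1[/math].

```python
from math import comb


def binom_pmf_closed(n, p):
    """Closed form f(k; n, p) = C(n, k) p^k (1-p)^(n-k) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def binom_pmf_recurrence(n, p):
    """Same PMF built from f(0) = (1-p)^n and
    p (n-k) f(k) = (k+1) (1-p) f(k+1).  Assumes 0 < p < 1."""
    f = [(1 - p) ** n]
    for k in range(n):
        f.append(f[k] * p * (n - k) / ((k + 1) * (1 - p)))
    return f


closed = binom_pmf_closed(10, 0.3)
recursive = binom_pmf_recurrence(10, 0.3)
# The two tables differ only by floating-point rounding.
print(max(abs(a - b) for a, b in zip(closed, recursive)))
```

The same closed-form table also illustrates the symmetry [math]f(k,n,p)=f(n-k,n,1-p)[/math]: reversing binom_pmf_closed(10, 0.7) reproduces binom_pmf_closed(10, 0.3).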

Looking at the expression [math]f(k,n,p)[/math] as a function of [math]k[/math], there is a [math]k[/math] value that maximizes it. This [math]k[/math] value can be found by calculating

[[math]] \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} [[/math]]

and comparing it to 1. There is always an integer [math]M[/math] that satisfies

[[math]](n+1)p-1 \leq M \lt (n+1)p.[[/math]]

The function [math]f(k,n,p)[/math] is monotone increasing for [math]k \lt M[/math] and monotone decreasing for [math]k \gt M[/math], except when [math](n+1)p[/math] is an integer. In that case there are two values of [math]k[/math] for which [math]f[/math] is maximal: [math](n+1)p[/math] and [math](n+1)p-1[/math]. [math]M[/math] is the most probable number of successes in the [math]n[/math] Bernoulli trials and is called the mode. Note that the probability of observing the mode can still be fairly small.

Cumulative distribution function

The cumulative distribution function can be expressed as:

[[math]]F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}[[/math]]

where [math]\scriptstyle \lfloor k\rfloor\,[/math] is the "floor" under [math]k[/math], i.e. the greatest integer less than or equal to [math]k[/math].

Mean and Variance

If [math]X \sim B(n,p)[/math], that is, [math]X[/math] is a binomially distributed random variable, [math]n[/math] being the total number of experiments and [math]p[/math] the probability of each experiment yielding a successful result, then the expected value of [math]X[/math] is [math]np[/math] and the variance is [math]np(1-p)[/math]. This follows directly from the fact that [math]X[/math] is equal in distribution to the sum of [math]n[/math] independent Bernoulli random variables, each having success probability [math]p[/math] (see below).

Mode

Usually the mode of a binomial [math]B(n,p)[/math] distribution is equal to [math]\lfloor (n+1)p\rfloor[/math], where [math]\lfloor\cdot\rfloor[/math] is the floor function. However, when [math](n+1)p[/math] is an integer and [math]p[/math] is neither 0 nor 1, the distribution has two modes: [math](n+1)p[/math] and [math](n+1)p -1 [/math]. When [math]p[/math] is equal to 0 or 1, the mode is 0 or [math]n[/math], respectively. These cases can be summarized as follows:

[[math]] \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}[[/math]]
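The case analysis can be checked against a brute-force search over [math]k[/math]. In the Python sketch below (illustrative function names), [math]p[/math] is passed as a fractions.Fraction so that ties between two modes are detected exactly rather than up to floating-point rounding.

```python
from fractions import Fraction
from math import comb, floor


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def modes_by_formula(n, p):
    """Mode(s) of B(n, p) from the three cases above."""
    m = (n + 1) * p
    if p == 0:
        return [0]
    if p == 1:
        return [n]
    if m.denominator == 1:  # (n+1)p is an integer in 1..n: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]


def modes_by_search(n, p):
    """All k that maximize f(k, n, p), found by exhaustive search."""
    probs = [binom_pmf(k, n, p) for k in range(n + 1)]
    best = max(probs)
    return [k for k, v in enumerate(probs) if v == best]


print(modes_by_formula(9, Fraction(1, 2)), modes_by_search(9, Fraction(1, 2)))  # [4, 5] [4, 5]
```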

Median

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However several special results have been established:

  • If [math]np[/math] is an integer, then the mean, median, and mode coincide and equal [math]np[/math].[1][2]
  • Any median [math]m[/math] must lie within the interval [math]\lfloor np \rfloor \le m \le \lceil np \rceil[/math].[3]
  • A median [math]m[/math] cannot lie too far away from the mean: [math]|m-np| \leq \textrm{min}\{\ln(2),\textrm{max}\{p,1-p\}\}[/math].[4]
  • The median is unique and equal to [math]m=\operatorname{round}(np)[/math] whenever [math]p\leq 1-\ln(2)[/math], [math]p\geq \ln(2)[/math], or [math]|m-np| \leq \textrm{min}\{p, 1-p\}[/math] (except for the case when [math]p = 1/2 [/math] and [math]n[/math] is odd).[3][4]
  • When [math]p=1/2[/math] and [math]n[/math] is odd, any number [math]m[/math] in the interval [math][(n-1)/2,(n+1)/2][/math] is a median of the binomial distribution. If [math]p = 1/2[/math] and [math]n[/math] is even, then [math]m = n/2[/math] is the unique median.
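These results can be explored numerically. The Python sketch below (exact rational arithmetic via fractions.Fraction; the function name is illustrative) lists every integer [math]m[/math] with [math]\operatorname{P}(X \le m) \ge 1/2[/math] and [math]\operatorname{P}(X \ge m) \ge 1/2[/math], that is, the integer medians, so they can be compared with the interval [math]\lfloor np \rfloor \le m \le \lceil np \rceil[/math].

```python
from fractions import Fraction
from math import ceil, comb, floor


def integer_medians(n, p):
    """All integers m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2, X ~ B(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    half = Fraction(1, 2)
    return [
        m
        for m in range(n + 1)
        if sum(pmf[: m + 1]) >= half and sum(pmf[m:]) >= half
    ]


n, p = 7, Fraction(1, 2)  # odd n with p = 1/2: two integer medians
print(integer_medians(n, p), floor(n * p), ceil(n * p))  # [3, 4] 3 4
```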

Related distributions

Sums of binomials

If [math]X \sim B(n,p)[/math] and [math]Y \sim B(m, p) [/math] are independent binomial variables with the same probability [math]p[/math], then [math]X+Y [/math] is again a binomial variable: its distribution is [math]Z=X+Y \sim B(n+m, p)[/math].

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where [math] n = 1[/math]. Symbolically, [math]X \sim B(1,p)[/math] has the same meaning as [math]X \sim B(p) [/math]. Conversely, any binomial distribution [math]B(n,p)[/math] is the distribution of the sum of [math]n[/math] independent Bernoulli random variables [math]B(p)[/math], each with the same success probability [math]p[/math].
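Both facts, that independent binomials with a common [math]p[/math] add up to a binomial and that [math]B(n,p)[/math] is a sum of [math]n[/math] Bernoulli variables, can be checked by convolving probability mass functions. This is a Python sketch with exact Fraction arithmetic; convolve is an illustrative helper, not a library function.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """List [f(0; n, p), ..., f(n; n, p)]."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(f, g):
    """PMF of X + Y for independent X with PMF f and Y with PMF g."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


p = Fraction(1, 3)
# B(4, p) + B(6, p) has the B(10, p) distribution
print(convolve(binom_pmf(4, p), binom_pmf(6, p)) == binom_pmf(10, p))  # True

# Five independent Bernoulli(p) variables, i.e. B(1, p), add up to B(5, p)
total = [Fraction(1)]
for _ in range(5):
    total = convolve(total, binom_pmf(1, p))
print(total == binom_pmf(5, p))  # True
```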

Normal approximation

If [math]n[/math] is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to [math]B(n,p)[/math] is given by the normal distribution [math] \mathcal{N}(np,\,np(1-p))[/math], and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as [math]n[/math] increases (at least 20) and is better when [math]p[/math] is not near to 0 or 1.[5] Various heuristics may be used to decide whether [math]n[/math] is large enough, and [math]p[/math] is far enough from the extremes of zero or one:

  • One rule is that both [math]np[/math] and [math]n(1-p)[/math] must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large [math]n[/math] until [math]n[/math] is very large.
  • A second rule[5] is that for [math]n\gt5[/math] the normal approximation is adequate if

[[math]]\left | \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right |\lt0.3[[/math]]

  • Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if

[[math]]\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].[[/math]]

The following is an example of applying a continuity correction. Suppose one wishes to calculate [math]\operatorname{P}(X \leq 8) [/math] for a binomial random variable [math]X[/math]. If [math]Y[/math] has a distribution given by the normal approximation, then [math]\operatorname{P}(X \leq 8 )[/math] is approximated by [math]\operatorname{P}(Y \leq 8.5 ) [/math]. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
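The effect of the continuity correction can be seen numerically. The Python sketch below assumes, purely for illustration, [math]n = 20[/math] and [math]p = 1/2[/math] (values not taken from the text), and evaluates the normal CDF through math.erf.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


n, p, x = 20, 0.5, 8
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))
corrected = normal_cdf(x + 0.5, mu, sigma)  # P(Y <= 8.5)
uncorrected = normal_cdf(x, mu, sigma)      # P(Y <= 8)

print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```

With these values the corrected approximation lands within about 0.001 of the exact probability, while the uncorrected one is off by more than 0.06.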

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large [math]n[/math] are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since [math]B(n,p)[/math] is a sum of [math]n[/math] independent, identically distributed Bernoulli variables with parameter [math]p[/math]. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of [math]p[/math] using [math]x/n[/math], the sample proportion and estimator of [math]p[/math], in a common test statistic.[6]

For example, suppose one randomly samples [math]n[/math] people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of [math]n[/math] people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion [math]p[/math] of agreement in the population and with standard deviation [math]\sigma = \sqrt{\frac{p(1-p)}{n}}[/math].

Geometric

The geometric distribution is the probability distribution of the number of failures before the first success, supported on the set { 0, 1, 2, 3, ... }; i.e., if [math]p[/math] denotes the probability of success on each trial, then

[[math]]\operatorname{P}(Y=k) = (1 - p)^k\,p\,.[[/math]]

Moments

The expected value and the variance of the geometrically distributed random variable [math]Y[/math] are

[[math]] \operatorname{E}(Y) = \frac{1-p}{p}, \qquad\operatorname{Var}(Y) = \frac{1-p}{p^2}. [[/math]]
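A quick numerical check of the moments of [math]\operatorname{P}(Y=k) = (1-p)^k p[/math], as a Python sketch: the series [math]\sum_k k\,(1-p)^k p[/math] and [math]\sum_k k^2 (1-p)^k p[/math] are truncated after a fixed number of terms (an assumption that is harmless here because the tail decays geometrically). The check gives [math]\operatorname{E}(Y) = (1-p)/p[/math] and [math]\operatorname{Var}(Y) = (1-p)/p^2[/math].

```python
def geometric_moments(p, terms=10_000):
    """Mean and variance of P(Y = k) = (1-p)^k p, k = 0, 1, 2, ...,
    approximated by truncating both series after `terms` terms."""
    probs = [(1 - p) ** k * p for k in range(terms)]
    mean = sum(k * q for k, q in enumerate(probs))
    second = sum(k * k * q for k, q in enumerate(probs))
    return mean, second - mean**2


p = 0.25
mean, var = geometric_moments(p)
print(mean, (1 - p) / p)     # both close to 3
print(var, (1 - p) / p**2)   # both close to 12
```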

Related distributions

  • The geometric distribution [math]Y[/math] is a special case of the negative binomial distribution, with [math]r = 1 [/math]. More generally, if [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables with parameter [math]p[/math], then the sum

[[math]]Z = \sum_{m=1}^r Y_m[[/math]]

follows a negative binomial distribution with parameters [math]r[/math] and [math]p[/math].[7]

  • If [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables (with possibly different success parameters [math]p_m[/math]), then their minimum

[[math]]W = \min_{m \in 1, \dots, r} Y_m\,[[/math]]

is also geometrically distributed, with parameter [math]p = 1-\prod_m(1-p_{m}).[/math]

  • Suppose [math]0 \lt r \lt 1 [/math], and for [math] k = 1,2,3,\ldots [/math] the random variable [math]X_k[/math] has a Poisson distribution with expected value [math]r^k[/math]. Then

[[math]]\sum_{k=1}^\infty k\,X_k[[/math]]

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value [math]r/(1-r)[/math].

  • The exponential distribution is the continuous analogue of the geometric distribution. If [math]X[/math] is an exponentially distributed random variable with parameter [math]\lambda[/math], then

[[math]]Y = \lfloor X \rfloor,[[/math]]

where [math]\lfloor \quad \rfloor[/math] is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter [math]p = 1- e^{-\lambda} [/math] (thus [math]\lambda = - \ln(1-p) [/math][8]) and taking values in the set {0, 1, 2, ...}.
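Three of the relations above can be checked with a short Python sketch (illustrative helper names; exact Fraction arithmetic for the discrete parts). Convolving [math]r[/math] geometric probability mass functions reproduces the negative binomial probabilities [math]\binom{k+r-1}{k}(1-p)^k p^r[/math]. The minimum of independent geometric variables has the stated parameter, computed here from [math]\operatorname{P}(Y \ge k) = (1-p)^k[/math]. Finally, a seeded simulation suggests that the floor of an exponential variable has [math]\operatorname{P}(Y=0) = 1 - e^{-\lambda}[/math].

```python
import random
from fractions import Fraction
from math import comb, exp, floor, prod

K = 15  # probabilities are compared for k = 0, ..., K-1


def geom_pmf(p, K=K):
    """[P(Y = 0), ..., P(Y = K-1)] with P(Y = k) = (1-p)^k p."""
    return [(1 - p) ** k * p for k in range(K)]


def convolve_truncated(f, g):
    """First len(f) terms of the PMF of X + Y for independent X ~ f, Y ~ g."""
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(len(f))]


# 1) A sum of r independent geometric variables is negative binomial(r, p).
p, r = Fraction(2, 5), 3
total = geom_pmf(p)
for _ in range(r - 1):
    total = convolve_truncated(total, geom_pmf(p))
negbin = [comb(k + r - 1, k) * (1 - p) ** k * p**r for k in range(K)]
print(total == negbin)  # True

# 2) The minimum of independent geometrics is geometric with p = 1 - prod(1 - p_m).
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]


def surv(k):
    """P(W >= k) = prod_m P(Y_m >= k) = prod_m (1 - p_m)^k."""
    return prod((1 - q) ** k for q in ps)


p_min = 1 - prod(1 - q for q in ps)
print([surv(k) - surv(k + 1) for k in range(K)] == geom_pmf(p_min))  # True

# 3) floor(X) for X ~ Exponential(lam), simulated with a fixed seed.
lam = 0.7
rng = random.Random(12345)
samples = [floor(rng.expovariate(lam)) for _ in range(200_000)]
p_exp = 1 - exp(-lam)
print(samples.count(0) / len(samples), p_exp)  # empirical vs. exact P(Y = 0)
```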

Poisson Distribution

The Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[9] The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.

Definition

A discrete random variable [math]X[/math] is said to have a Poisson distribution with parameter [math]\lambda \gt 0[/math], if, for [math]k = 0, 1, \ldots [/math], the probability mass function of [math]X[/math] is given by[10]

[[math]]\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.[[/math]]

The probability mass function satisfies the following recurrence relation:

[[math]]\left\{\begin{array}{l} (k+1) p_{k+1}-\lambda p_{k}=0, \\ p_{0}=e^{-\lambda} \end{array}\right\}. [[/math]]
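The recurrence is a convenient way to tabulate Poisson probabilities without evaluating large powers and factorials separately. A minimal Python sketch (the function name is illustrative), compared against the closed form:

```python
from math import exp, factorial


def poisson_pmf_recurrence(lam, kmax):
    """p_0 = e^(-lam), then p_(k+1) = lam * p_k / (k + 1), for k = 0..kmax."""
    p = [exp(-lam)]
    for k in range(kmax):
        p.append(p[k] * lam / (k + 1))
    return p


lam = 4.0
p = poisson_pmf_recurrence(lam, 60)
direct = [lam**k * exp(-lam) / factorial(k) for k in range(61)]
print(max(abs(a - b) for a, b in zip(p, direct)))  # floating-point rounding only
print(sum(p))  # very close to 1
```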

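Mean

  • The expected value and variance of a Poisson-distributed random variable are both equal to [math]\lambda[/math].
  • The coefficient of variation is [math]\textstyle \lambda^{-1/2}[/math], while the index of dispersion is 1.[11]
  • The mean absolute deviation about the mean is[11]

[[math]]\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .[[/math]]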

  • The mode of a Poisson-distributed random variable with non-integer [math]\lambda[/math] is equal to [math]\scriptstyle\lfloor \lambda \rfloor[/math], which is the largest integer less than or equal to [math]\lambda[/math]. This is also written as floor([math]λ[/math]). When [math]λ[/math] is a positive integer, the modes are [math]\lambda[/math] and [math]\lambda-1[/math].
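The mean, variance, mean absolute deviation and mode can all be checked from a truncated table of probabilities. In the Python sketch below, [math]\lambda = 3.7[/math] is an illustrative non-integer choice and the truncation at [math]k = 200[/math] is an assumption (the neglected tail is far below rounding error).

```python
from math import exp, factorial, floor


def poisson_pmf(lam, kmax):
    """p_0 .. p_kmax via the recurrence p_(k+1) = lam p_k / (k + 1)."""
    p = [exp(-lam)]
    for k in range(kmax):
        p.append(p[k] * lam / (k + 1))
    return p


lam = 3.7
p = poisson_pmf(lam, 200)
mean = sum(k * pk for k, pk in enumerate(p))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
mad = sum(abs(k - lam) * pk for k, pk in enumerate(p))
mad_formula = 2 * exp(-lam) * lam ** (floor(lam) + 1) / factorial(floor(lam))
mode = max(range(len(p)), key=p.__getitem__)  # first index of the largest p_k

print(mean, var)          # both approximately 3.7
print(mad, mad_formula)   # agree up to rounding
print(mode, floor(lam))   # 3 3
```

Since the mean and variance coincide, the index of dispersion var/mean is 1 and the coefficient of variation sqrt(var)/mean is [math]\lambda^{-1/2}[/math].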

Median

Bounds for the median ([math]ν[/math]) of the distribution are known and are sharp:[12]

[[math]] \lambda - \ln 2 \le \nu \lt \lambda + \frac{1}{3}. [[/math]]
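The median bounds can be spot-checked numerically. The Python sketch below takes the median to be the smallest integer [math]\nu[/math] with [math]\operatorname{P}(X \le \nu) \ge 1/2[/math] (an assumption about which median is meant; exact ties at 1/2 are exceptional), and tests both inequalities for a few values of [math]\lambda[/math].

```python
from math import exp, log


def poisson_median(lam):
    """Smallest integer nu with P(X <= nu) >= 1/2, accumulating the PMF
    through the recurrence p_(k+1) = lam p_k / (k + 1)."""
    pk = cdf = exp(-lam)
    k = 0
    while cdf < 0.5:
        k += 1
        pk *= lam / k
        cdf += pk
    return k


for lam in (0.5, 0.7, 2.0, 7.3):
    nu = poisson_median(lam)
    print(lam, nu, lam - log(2) <= nu < lam + 1 / 3)
```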

Other properties

  • If [math]X_i \sim \operatorname{Pois}(\lambda_i)[/math] are independent and [math]\lambda=\sum_{i=1}^n \lambda_i[/math], then [math]Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)[/math].[13] A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.[14]
  • The Poisson distributions are infinitely divisible probability distributions.[15][16]
  • Bounds for the tail probabilities of a Poisson random variable [math] X \sim \operatorname{Pois}(\lambda)[/math] can be derived using a Chernoff bound argument:[17]

[[math]] \begin{cases} \operatorname{P}(X \geq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \gt \lambda \\ \operatorname{P}(X \leq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \lt \lambda \,\, . \end{cases} [[/math]]
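To see how conservative the Chernoff bounds are, the Python sketch below compares them with exact tail probabilities for an illustrative [math]\lambda = 5[/math]. The bound is evaluated in log space to avoid overflow, and the upper tail is summed directly rather than as [math]1 - \operatorname{P}(X \lt x)[/math] so that tiny tails are not lost to cancellation.

```python
from math import exp, log


def poisson_pmf(lam, kmax):
    """p_0 .. p_kmax via the recurrence p_(k+1) = lam p_k / (k + 1)."""
    p = [exp(-lam)]
    for k in range(kmax):
        p.append(p[k] * lam / (k + 1))
    return p


def chernoff_bound(lam, x):
    """e^(-lam) (e lam)^x / x^x, computed in log space (requires x > 0)."""
    return exp(-lam + x * (1 + log(lam)) - x * log(x))


lam, kmax = 5.0, 200
p = poisson_pmf(lam, kmax)
for x in (8, 12, 20):   # upper tail, x > lam
    print(x, sum(p[x:]), chernoff_bound(lam, x))
for x in (1, 2, 3):     # lower tail, x < lam
    print(x, sum(p[: x + 1]), chernoff_bound(lam, x))
```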

Negative Binomial

The negative binomial distribution is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes (denoted [math]r[/math]) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials, each with two potential outcomes called “success” and “failure”. In each trial the probability of success is [math]p[/math] and of failure is [math]1-p[/math]. We observe this sequence until a predefined number [math]r[/math] of successes has occurred.

Probability Mass Function

The probability mass function of the negative binomial distribution is

[[math]] f(k; r, p) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} (1-p)^kp^r \quad\text{for }k = 0, 1, 2, \dotsc [[/math]]

The binomial coefficient can be written in the following manner, explaining the name “negative binomial”:

[[math]] \begin{align*} \frac{(k+r-1)\dotsm(r)}{k!} &= (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} \\ &= (-1)^k\binom{-r}{k}. \end{align*} [[/math]]

To understand the above definition of the probability mass function, note that the probability of every specific sequence of [math]k[/math] failures and [math]r[/math] successes is [math]p^r(1-p)^k[/math], because the outcomes of the [math]k+r[/math] trials are assumed to be independent. Since the [math]r[/math]th success comes last, it remains to choose the [math]k[/math] trials with failures out of the remaining [math]k+r-1[/math] trials. The above binomial coefficient gives precisely the number of all these sequences of length [math]k+r-1[/math].
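The counting argument can be confirmed by brute force for small cases. The Python sketch below (illustrative function name) enumerates every 0/1 sequence of length [math]k+r[/math] with exactly [math]r[/math] successes whose last trial is a success, and compares the count with [math]\binom{k+r-1}{k}[/math].

```python
from itertools import product
from math import comb


def count_sequences(k, r):
    """Number of 0/1 sequences with k failures (0) and r successes (1)
    whose last entry is a success, found by brute-force enumeration."""
    return sum(
        1
        for seq in product((0, 1), repeat=k + r)
        if seq[-1] == 1 and sum(seq) == r
    )


for k, r in [(0, 1), (2, 3), (4, 2)]:
    print(k, r, count_sequences(k, r), comb(k + r - 1, k))
```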

Extension to real-valued r

It is possible to extend the definition of the negative binomial distribution to the case of a positive real parameter [math]r[/math]. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.

In the spirit of being consistent with the parametrizations found in [18], we consider the alternative parametrization defined implicitly by setting [math]p = 1/(1+\beta)[/math].

As before, we say that [math]N[/math] has a negative binomial (or Pólya) distribution if it has a probability mass function:

[[math]] f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc [[/math]]

Here [math]r[/math] is a real, positive number. The binomial coefficient is then defined by the multiplicative formula and can also be rewritten using the gamma function:

[[math]] \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}. [[/math]]

To show that the probability mass function adds up to one, we use the binomial series:

[[math]] (1 + \beta)^{r} = (1 - (1-p))^{-r} =\sum_{k=0}^\infty(-1)^k\binom{-r}{k}(1-p)^k = (1 + \beta)^r \,\sum_{k=0}^\infty \operatorname{P}(N = k), [[/math]]

so that [math]\sum_{k=0}^\infty \operatorname{P}(N = k) = 1[/math].

Finally, the following recurrence relation holds:

[[math]]\begin{array}{l} (k+1) \operatorname{P} (k+1)- (1-p) \operatorname{P} (k) (k+r)=0, \\ \operatorname{P} (0) = p^r. \end{array} [[/math]]
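For a non-integer [math]r[/math] the probability mass function can be evaluated through the gamma-function form of the binomial coefficient. The Python sketch below uses math.lgamma to avoid overflow; [math]r = 2.5[/math] and [math]\beta = 1.5[/math] are illustrative values, and truncating at [math]k = 400[/math] is an assumption (the tail decays like [math](1-p)^k[/math]). It checks the normalization, [math]\operatorname{P}(0) = p^r[/math] and the recurrence above.

```python
from math import comb, exp, lgamma, log, log1p


def negbin_pmf(k, r, beta):
    """P(N = k) = Gamma(k+r) / (k! Gamma(r)) * beta^k / (1+beta)^(r+k),
    for real r > 0 and beta > 0, evaluated through log-gamma."""
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + k * log(beta) - (r + k) * log1p(beta))


r, beta = 2.5, 1.5
p = 1 / (1 + beta)
probs = [negbin_pmf(k, r, beta) for k in range(400)]

print(sum(probs))        # close to 1
print(probs[0], p**r)    # P(0) = p^r
# recurrence (k+1) P(k+1) = (1-p) (k+r) P(k)
print(max(abs((k + 1) * probs[k + 1] - (1 - p) * (k + r) * probs[k]) for k in range(399)))
```

For an integer [math]r[/math], negbin_pmf agrees with the [math](r, p)[/math] form [math]\binom{k+r-1}{k}(1-p)^k p^r[/math] given earlier.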

Wikipedia References

  • Wikipedia contributors. "Binomial distribution". Wikipedia. Retrieved 28 January 2022.
  • Wikipedia contributors. "Geometric distribution". Wikipedia. Retrieved 28 January 2022.
  • Wikipedia contributors. "Poisson distribution". Wikipedia. Retrieved 28 January 2022.
  • Wikipedia contributors. "Negative binomial distribution". Wikipedia. Retrieved 17 February 2022.

References

  1. Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung" (in German). Wissenschaftliche Zeitschrift der Technischen Universität Dresden 19: 29–33. 
  2. Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
  3. "Mean, Median and Mode in Binomial Distributions" (1980). Statistica Neerlandica 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
  4. "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions" (1995). Statistics & Probability Letters 23: 21–25. doi:10.1016/0167-7152(94)00090-U.
  5. Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
  6. NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
  7. Pitman, Jim. Probability (1993 edition). Springer Publishers. p. 372.
  8. http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l
  9. Frank A. Haight (1967). Handbook of the Poisson Distribution. New York: John Wiley & Sons.
  10. Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.
  11. Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p157
  12. Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251
  13. E. L. Lehmann (1986). Testing Statistical Hypotheses (second ed.). New York: Springer Verlag. ISBN 0-387-94919-4, p. 65.
  14. Raikov, D. (1937). On the decomposition of Poisson laws. Comptes Rendus (Doklady) de l' Academie des Sciences de l'URSS, 14, 9–11. (The proof is also given in von Mises, Richard (1964). Mathematical Theory of Probability and Statistics. New York: Academic Press.)
  15. Laha, R. G.; Rohatgi, V. K. Probability Theory. New York: John Wiley & Sons. p. 233. ISBN 0-471-03262-X.
  16. Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p159
  17. Michael Mitzenmacher; Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. p. 97. ISBN 0521835402.
  18. https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf