guide:B5ab48c211: Difference between revisions

From Stochiki
==Binomial==
 
The '''binomial distribution''' with parameters <math>n</math> and <math>p</math> is the [[wikipedia:discrete probability distribution|discrete probability distribution]] of the number of successes in a sequence of <math>n</math> [[wikipedia:statistical independence|independent]] yes/no experiments, each of which yields success with [[wikipedia:probability|probability]] <math>p</math>.
A success/failure experiment is also called a Bernoulli experiment or [[wikipedia:Bernoulli trial|Bernoulli trial]]; when <math>n = 1</math>, the binomial distribution  is a [[wikipedia:Bernoulli distribution|Bernoulli distribution]]. The binomial distribution is the basis for the popular [[wikipedia:binomial test|binomial test]] of [[wikipedia:statistical significance|statistical significance]].
 
The binomial distribution is frequently used to model the number of successes in a sample of size <math>n</math> drawn [[wikipedia:Sampling (statistics)#Replacement of selected units|with replacement]] from a population of size <math>N</math>. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a [[wikipedia:hypergeometric distribution|hypergeometric distribution]], not a binomial one.  However, for <math>N</math> much larger than <math>n</math>, the binomial distribution is a good approximation, and widely used.
 
===Specification===
 
====Probability mass function====
 
In general, if the random variable <math>X</math> follows the binomial distribution with parameters <math>n</math> ∈ ℕ and <math>p</math> ∈ [0,1], we write <math>X \sim B(n,p)</math>. The probability of getting exactly <math>k</math> successes in <math>n</math> trials is given by the [[wikipedia:probability mass function|probability mass function]]:
 
<math display="block"> f(k;n,p) = \operatorname{P}(X = k) = \binom n k  p^k(1-p)^{n-k}</math>
 
for <math>k = 0, \ldots, n</math> where
 
<math display="block">\binom n k =\frac{n!}{k!(n-k)!}</math>
 
is the [[wikipedia:binomial coefficient|binomial coefficient]], hence the name of the distribution. The formula can be understood as follows: we want exactly <math>k</math> successes (probability <math>p^k</math>) and <math>n-k</math> failures (probability <math>(1-p)^{n-k}</math>). However, the <math>k</math> successes can occur anywhere among the <math>n</math> trials, and there are <math>{n\choose k}</math> different ways of distributing <math>k</math> successes in a sequence of <math>n</math> trials.
 
In creating reference tables for binomial distribution probability, usually the table is filled in up to <math>n/2</math> values. This is because for <math>k \gt n/2</math>, the probability can be calculated by its complement as
 
<math display="block">f(k,n,p)=f(n-k,n,1-p). </math>
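
As a minimal sketch of the probability mass function above, the following Python snippet evaluates <math>f(k;n,p)</math> with the standard-library <code>math.comb</code> and checks the complement identity numerically. The function name <code>binom_pmf</code> and the values <math>n = 10</math>, <math>p = 0.3</math> are illustrative assumptions, not part of the original text.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p); zero outside the support 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values

# The probabilities over k = 0..n should add up to one (up to rounding).
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(total)

# Complement identity: f(k, n, p) = f(n - k, n, 1 - p), here with k = 7.
left = binom_pmf(7, n, p)
right = binom_pmf(n - 7, n, 1 - p)
print(left, right)
```

The two printed values in the last line should agree up to floating-point rounding, which is why a table only needs entries for <math>k \le n/2</math>.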
 
The probability mass function satisfies the following [[wikipedia:recurrence relation|recurrence relation]], for every <math>n,p</math>:<math display="block">\left\{\begin{array}{l}
p (n-k) f(k,n,p) = (k+1) (1-p)
  f(k+1,n,p), \\[10pt]
f(0,n,p)=(1-p)^n
\end{array}\right\}</math>
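
The recurrence gives a way to fill in all the probabilities without computing any binomial coefficients. The sketch below assumes <math>0 \le p < 1</math> (so that dividing by <math>1-p</math> is allowed) and compares the result with the closed-form expression; the function name and the parameter values are hypothetical.

```python
from math import comb


def binom_pmf_by_recurrence(n, p):
    """Return [f(0), ..., f(n)] for B(n, p); assumes 0 <= p < 1."""
    f = [(1 - p) ** n]  # starting value f(0, n, p) = (1 - p)^n
    for k in range(n):
        # Rearranged recurrence: f(k+1) = f(k) * p (n - k) / ((k + 1)(1 - p))
        f.append(f[k] * p * (n - k) / ((k + 1) * (1 - p)))
    return f


n, p = 12, 0.35  # illustrative values
table = binom_pmf_by_recurrence(n, p)
direct = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Largest discrepancy between the two computations (rounding error only).
max_gap = max(abs(a - b) for a, b in zip(table, direct))
print(max_gap)
```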
 
Looking at the expression <math>f(k,n,p)</math> as a function of <math>k</math>, there is a <math>k</math> value that maximizes it. This <math>k</math> value can be found by calculating<math display="block"> \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} </math>
and comparing it to 1. There is always an integer ''M'' that satisfies<math display="block">(n+1)p-1 \leq M < (n+1)p.</math>
 
<math>f(k,n,p)</math> is monotone increasing for <math>k < M</math> and monotone decreasing for <math>k \gt M</math>, with the exception of the case where <math>(n+1)p</math> is an integer. In this case, there are two values for which <math>f</math> is maximal: <math>(n+1)p</math> and <math>(n+1)p-1</math>. <math>M</math> is the ''most probable'' (''most likely'') outcome of the Bernoulli trials and is called the [[wikipedia:Mode (statistics)|mode]]. Note that the probability of it occurring can be fairly small.
 
====Cumulative distribution function====
 
The [[wikipedia:cumulative distribution function|cumulative distribution function]] can be expressed as:<math display="block">F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}</math>
 
where <math>\scriptstyle \lfloor k\rfloor\,</math> is the "floor" under <math>k</math>, i.e. the [[wikipedia:greatest integer|greatest integer]] less than or equal to <math>k</math>.
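
A direct, unoptimized sketch of this sum in Python follows. It accepts a real argument and applies the floor as in the formula; the name <code>binom_cdf</code> and the handling of arguments outside <math>[0, n]</math> are choices made here for illustration.

```python
from math import comb, floor


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), where x may be any real number."""
    k = floor(x)
    if k < 0:
        return 0.0  # nothing lies below the support
    k = min(k, n)   # everything lies at or below n
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# For n = 5, p = 1/2: P(X <= 2.7) = P(X <= 2) = (1 + 5 + 10) / 32
value = binom_cdf(2.7, 5, 0.5)
print(value)
```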
 
===Mean and Variance===
 
If <math>X \sim B(n,p)</math>, that is, <math>X</math> is a binomially distributed random variable, <math>n</math> being the total number of experiments and <math>p</math> the probability of each experiment yielding a successful result, then the [[wikipedia:expected value|expected value]] of <math>X</math> is <math>np</math> and the variance is <math>np(1-p)</math>. This follows directly from the fact that <math>X</math> is equal in distribution to the sum of <math>n</math> independent [[wikipedia:Bernoulli distribution|Bernoulli]] random variables, each having success probability <math>p</math> (see [[#bernouilli|below]]).
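
The computation can be sketched as follows, writing <math>X_1,\ldots,X_n</math> for the independent Bernoulli variables (a notation introduced here for illustration). Each <math>X_i</math> takes the value 1 with probability <math>p</math> and 0 otherwise, and <math>X_i^2 = X_i</math>, so

<math display="block">\operatorname{E}(X_i) = p, \qquad \operatorname{Var}(X_i) = \operatorname{E}(X_i^2) - \operatorname{E}(X_i)^2 = p - p^2 = p(1-p).</math>

By linearity of expectation, and by additivity of the variance for independent variables,

<math display="block">\operatorname{E}(X) = \sum_{i=1}^n \operatorname{E}(X_i) = np, \qquad \operatorname{Var}(X) = \sum_{i=1}^n \operatorname{Var}(X_i) = np(1-p).</math>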
 
===Mode===
 
Usually the [[wikipedia:mode (statistics)|mode]] of a binomial <math>B(n,p)</math> distribution is equal to <math>\lfloor (n+1)p\rfloor</math>, where <math>\lfloor\cdot\rfloor</math> is the [[wikipedia:floor function|floor function]]. However, when <math>(n+1)p</math> is an integer and <math>p</math> is neither 0 nor 1, the distribution has two modes: <math>(n+1)p</math> and <math>(n+1)p - 1</math>. When <math>p</math> is equal to 0 or 1, the mode is 0 or <math>n</math>, respectively. These cases can be summarized as follows:
 
<math display="block">
      \begin{cases}
        \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\
        (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\
        n & \text{if }(n+1)p = n + 1.
      \end{cases}</math>
 
<div class="text-right">{{#Proof:View Proof|Mode|binomial/mode}}</div>
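
The case analysis for the mode can be checked by brute force. The sketch below uses exact rational arithmetic (<code>fractions.Fraction</code>) so that the test "<math>(n+1)p</math> is an integer" is not affected by floating-point rounding; the function names and the sample parameters are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, floor


def binom_modes(n, p):
    """Modes of B(n, p) from the case formula; p must be a Fraction in [0, 1]."""
    t = (n + 1) * p
    if t == n + 1:                       # p = 1: all mass sits on n
        return [n]
    if t.denominator == 1 and t >= 1:    # (n+1)p is an integer in 1..n
        return [int(t) - 1, int(t)]
    return [floor(t)]                    # (n+1)p is 0 or a noninteger


def brute_force_modes(n, p):
    """All k in 0..n that maximize the exact probability mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    top = max(pmf)
    return [k for k, value in enumerate(pmf) if value == top]


print(binom_modes(9, Fraction(3, 10)))   # (n+1)p = 3 is an integer: two modes
print(binom_modes(10, Fraction(3, 10)))  # (n+1)p = 33/10: a single mode
```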
 
===Median===
 
In general, there is no single formula for the [[wikipedia:median|median]] of a binomial distribution, and the median may even be non-unique. However, several special results have been established:
* If <math>np</math> is an integer, then the mean, median, and mode coincide and equal <math>np</math>.<ref>{{cite journal|last=Neumann|first=P.|year=1966|title=Über den Median der Binomial- and Poissonverteilung|journal=Wissenschaftliche Zeitschrift der Technischen Universität Dresden|volume=19|pages=29–33|language=German}}</ref><ref>Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", [[wikipedia:The Mathematical Gazette|The Mathematical Gazette]] 94, 331-332.</ref>
* Any median <math>m</math> must lie within the interval ⌊<math>np</math>⌋&nbsp;≤&nbsp;<math>m</math>&nbsp;≤&nbsp;⌈<math>np</math>⌉.<ref name="KaasBuhrman">{{cite journal|first1=R.|last1=Kaas|first2=J.M.|last2=Buhrman|title=Mean, Median and Mode in Binomial Distributions|journal=Statistica Neerlandica|year=1980|volume=34|issue=1|pages=13–18|doi=10.1111/j.1467-9574.1980.tb00681.x}}</ref>
* A median <math>m</math> cannot lie too far away from the mean: <math>|m-np| \leq \textrm{min}\{\ln(2),\textrm{max}\{p,1-p\}\}</math>.<ref name="Hamza">{{Cite journal
| last1 = Hamza | first1 = K.
| doi = 10.1016/0167-7152(94)00090-U
| title = The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions
| journal = Statistics & Probability Letters
| volume = 23
| pages = 21–25
| year = 1995
| pmid = 
| pmc =
}}</ref>
* The median is unique and equal to <math>m=</math>[[wikipedia:Rounding|round]](<math>np</math>) in cases when either <math>p\leq 1-\ln(2)</math> or <math>p\geq \ln(2)</math> or <math>|m-np| \leq \textrm{min}\{p, 1-p\}</math> (except for the case when <math>p = 1/2 </math> and <math>n</math> is odd).<ref name="KaasBuhrman"/><ref name="Hamza"/>
* When <math>p=1/2</math> and <math>n</math> is odd, any number <math>m</math> in the interval <math>[(n-1)/2,(n+1)/2]</math> is a median of the binomial distribution. If <math>p = 1/2</math> and <math>n</math> is even, then <math>m = n/2</math> is the unique median.
 
===Related distributions===
 
====Sums of binomials====
 
If <math>X \sim B(n,p)</math> and <math>Y \sim B(m, p) </math> are independent binomial variables with the same probability <math>p</math>, then <math>X+Y </math> is again a binomial variable: its distribution is <math>Z=X+Y \sim B(n+m, p)</math>.
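
This closure property can be verified numerically: the distribution of a sum of independent variables is the convolution of their probability mass functions. The sketch below uses illustrative parameters (<math>n = 4</math>, <math>m = 6</math>, <math>p = 0.3</math>) and hypothetical helper names.

```python
from math import comb


def binom_pmf_list(n, p):
    """[P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent variables with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


n, m, p = 4, 6, 0.3  # illustrative values
sum_pmf = convolve(binom_pmf_list(n, p), binom_pmf_list(m, p))
target = binom_pmf_list(n + m, p)

# Largest discrepancy between the convolution and B(n + m, p).
max_gap = max(abs(a - b) for a, b in zip(sum_pmf, target))
print(max_gap)
```

The agreement depends on both variables sharing the same <math>p</math>; with different success probabilities the convolution is generally not binomial.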
 
====Bernoulli distribution <span id="bernouilli"></span>====
 
The [[wikipedia:Bernoulli distribution|Bernoulli distribution]] is a special case of the binomial distribution, where <math> n = 1</math>. Symbolically, <math>X \sim B(1,p)</math> has the same meaning as <math>X \sim B(p) </math>. Conversely, any binomial distribution, <math>B(n,p)</math>, is the distribution of the sum of <math>n</math> [[wikipedia:Bernoulli trials|Bernoulli trials]], <math>B(p)</math>, each with the same probability <math>p</math>.
 
====Normal approximation====
 
If <math>n</math> is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to <math>B(n,p)</math> is given by the [[wikipedia:normal distribution|normal distribution]] <math> \mathcal{N}(np,\,np(1-p))</math>, and this basic approximation can be improved in a simple way by using a suitable [[wikipedia:continuity correction|continuity correction]]. The basic approximation generally improves as <math>n</math> increases (at least 20) and is better when <math>p</math> is not near to 0 or 1.<ref name="bhh">{{cite book|title=Statistics for experimenters|author=Box, Hunter and Hunter|publisher=Wiley|year=1978|page=130}}</ref> Various heuristics may be used to decide whether <math>n</math> is large enough, and <math>p</math> is far enough from the extremes of zero or one:
 
*One rule is that both <math>np</math> and <math>n(1-p)</math> must be greater than&nbsp;5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule until <math>n</math> is very large.
*A second rule<ref name="bhh"/> is that for <math>n>5</math> the normal approximation is adequate if
 
<math display="block">\left | \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right |<0.3</math>
 
*Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if
 
<math display="block">\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].</math>
 
The following is an example of applying a [[wikipedia:continuity correction|continuity correction]]. Suppose one wishes to calculate <math>\operatorname{P}(X \leq 8) </math> for a binomial random variable <math>X</math>. If <math>Y</math> has a distribution given by the normal approximation, then <math>\operatorname{P}(X \leq 8 )</math> is approximated by <math>\operatorname{P}(Y \leq 8.5 ) </math>. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
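
The effect of the correction can be illustrated numerically. The sketch below assumes <math>X \sim B(20, 0.4)</math> (parameters chosen here only for illustration) and compares the exact value of <math>\operatorname{P}(X \leq 8)</math> with the normal approximation evaluated at 8 and at 8.5; the normal distribution function is computed from the standard-library error function <code>math.erf</code>.

```python
from math import comb, erf, sqrt


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p) and integer k in 0..n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


n, p = 20, 0.4  # illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(8, n, p)
uncorrected = normal_cdf(8.0, mu, sigma)   # no continuity correction
corrected = normal_cdf(8.5, mu, sigma)     # with continuity correction

print(f"exact={exact:.4f} uncorrected={uncorrected:.4f} corrected={corrected:.4f}")
```

For these parameters the corrected value lies noticeably closer to the exact probability than the uncorrected one.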
 
This approximation, known as the [[wikipedia:de Moivre–Laplace theorem|de Moivre–Laplace theorem]], is a huge time-saver when undertaking calculations by hand (exact calculations with large <math>n</math> are very onerous); historically, it was the first use of the normal distribution, introduced in [[wikipedia:Abraham de Moivre|Abraham de Moivre]]'s book ''[[wikipedia:The Doctrine of Chances|The Doctrine of Chances]]'' in 1738. Nowadays, it can be seen as a consequence of the [[wikipedia:central limit theorem|central limit theorem]], since a <math>B(n,p)</math> variable is a sum of <math>n</math> independent, identically distributed [[wikipedia:Bernoulli distribution|Bernoulli variables]] with parameter <math>p</math>. This fact is the basis of a [[wikipedia:hypothesis test|hypothesis test]], a "proportion z-test", for the value of <math>p</math> using <math>x/n</math>, the sample proportion and estimator of <math>p</math>, in a [[wikipedia:common test statistics|common test statistic]].<ref>[[wikipedia:NIST|NIST]]/[[wikipedia:SEMATECH|SEMATECH]], [http://www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm "7.2.4. Does the proportion of defectives meet requirements?"] ''e-Handbook of Statistical Methods.''</ref>
 
For example, suppose one randomly samples <math>n</math> people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of <math>n</math> people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion <math>p</math> of agreement in the population and with standard deviation <math>\sigma = \sqrt{\frac{p(1-p)}{n}}</math>.
 
==Geometric==
 
The geometric distribution is the probability distribution of the number of failures before the first success, supported on the set&nbsp;{&nbsp;0,&nbsp;1,&nbsp;2,&nbsp;3,&nbsp;...&nbsp;}; i.e., if <math>p</math> denotes the probability of success on each trial, then
 
<math display="block">\operatorname{P}(Y=k) = (1 - p)^k\,p\,.</math>
 
To retain consistency with the notation found in the SOA exam tables<ref>https://www.soa.org/files/edu/edu-exam-c-tables-cont-dist.pdf</ref>, we set <math>\beta = (1-p)/p</math>, so that <math>p = 1/(1+\beta)</math>, and obtain:
 
<math display="block">\begin{equation}\label{geometric}\operatorname{P}(Y=k) = \frac{\beta^k}{(1+\beta)^{k+1}}.\end{equation}</math>
 
Going forward, we assume that a geometric distribution is characterized by \ref{geometric} and depends solely on <math>\beta</math>, which turns out to be the mean of the distribution.
 
===Moments ===
 
The expected value of the geometrically distributed random variable <math>Y</math> is <math>\beta</math> and its variance is <math>\beta(\beta +1)</math>:
 
<math display="block">
\operatorname{E}(Y) = \beta, \qquad\operatorname{Var}(Y) = \beta(1 + \beta).
</math>
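
These two moments can be confirmed numerically by summing the series directly. The sketch below truncates the sums at 2000 terms, which is an assumption that is harmless for the illustrative value <math>\beta = 2.5</math> because the terms decay geometrically; the ratio <math>\beta/(1+\beta)</math> is raised to the power <math>k</math> rather than <math>\beta</math> itself to avoid floating-point overflow.

```python
def geom_pmf(k, beta):
    """P(Y = k) = beta^k / (1 + beta)^(k + 1), written to avoid overflow."""
    return (beta / (1 + beta)) ** k / (1 + beta)


beta = 2.5          # illustrative value
terms = range(2000)  # truncation point; the neglected tail is negligible here

total = sum(geom_pmf(k, beta) for k in terms)
mean = sum(k * geom_pmf(k, beta) for k in terms)
second_moment = sum(k * k * geom_pmf(k, beta) for k in terms)
variance = second_moment - mean**2

print(total, mean, variance)  # compare with 1, beta and beta * (1 + beta)
```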
 
===Related distributions===
 
* The geometric distribution <math>Y</math> is a special case of the [[wikipedia:negative binomial distribution|negative binomial distribution]], with <math>r = 1 </math>. More generally, if <math>Y_1,\ldots,Y_r</math> are [[wikipedia:statistical independence|independent]] geometrically distributed variables with parameter <math>p</math>, then the sum
 
<math display="block">Z = \sum_{m=1}^r Y_m</math>
 
follows a negative binomial distribution with parameters <math>r</math> and <math>p</math>.<ref>Pitman, Jim. Probability (1993 edition). Springer Publishers. pp 372.</ref>
 
* If <math>Y_1,\ldots,Y_r</math> are independent geometrically distributed variables (with possibly different success parameters <math>p_m</math>), then their minimum
 
<math display="block">W = \min_{m \in \{1, \dots, r\}} Y_m\,</math>
 
is also geometrically distributed, with parameter <math>p = 1-\prod_m(1-p_{m}).</math>
* Suppose <math>0 < r < 1 </math>, and for <math> k = 1,2,3,\ldots </math> the random variable <math>X_k</math> has a [[wikipedia:Poisson distribution|Poisson distribution]] with expected value <math>r^k/k</math>.  Then
 
<math display="block">\sum_{k=1}^\infty k\,X_k</math>
 
has a geometric distribution taking values in the set {0,&nbsp;1,&nbsp;2,&nbsp;...}, with expected value <math>r/(1-r)</math>.
 
* The [[wikipedia:exponential distribution|exponential distribution]] is the continuous analogue of the geometric distribution.  If <math>X</math> is an exponentially distributed random variable with parameter <math>\lambda</math>, then
 
<math display="block">Y = \lfloor X \rfloor,</math>
 
where <math>\lfloor \quad \rfloor</math> is the [[wikipedia:Floor and ceiling functions|floor]] (or greatest integer) function, is a geometrically distributed random variable with parameter <math>p = 1- e^{-\lambda} </math> (thus <math>\lambda = - \ln(1-p) </math><ref>http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l</ref>) and taking values in the set&nbsp;{0,&nbsp;1,&nbsp;2,&nbsp;...}.
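
A short derivation of this fact, using only the exponential tail probability <math>\operatorname{P}(X \geq x) = e^{-\lambda x}</math> for <math>x \geq 0</math>: the event <math>Y = k</math> occurs exactly when <math>k \leq X < k+1</math>, so for <math>k = 0, 1, 2, \ldots</math>

<math display="block">\operatorname{P}(Y = k) = \operatorname{P}(k \leq X < k+1) = e^{-\lambda k} - e^{-\lambda (k+1)} = \left(e^{-\lambda}\right)^k \left(1 - e^{-\lambda}\right) = (1-p)^k\,p,</math>

with <math>p = 1 - e^{-\lambda}</math>, which is the geometric probability mass function given at the start of this section.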
 
==Poisson Distribution==
 
The '''Poisson distribution''', named after French mathematician [[wikipedia:Siméon Denis Poisson|Siméon Denis Poisson]], is a [[wikipedia:discrete probability distribution|discrete probability distribution]] that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and [[wikipedia:Statistical independence|independently]] of the time since the last event.<ref name=haight>{{cite book|author=Frank A. Haight|title=Handbook of the Poisson Distribution|publisher=John Wiley & Sons|location=New York|year=1967|ref=harv}}</ref> The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.
 
=== Definition ===
A discrete [[wikipedia:random variable|random variable]]  <math>X</math>  is said to have a Poisson distribution with parameter <math>\lambda > 0</math>, if, for <math>k = 0, 1, \ldots </math>, the [[wikipedia:probability mass function|probability mass function]] of <math>X</math> is given by<ref>Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.</ref>
 
<math display="block">\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.</math>
 
The probability mass function satisfies the following [[wikipedia:recurrence relation|recurrence relation]]:
 
<math display="block">\left\{\begin{array}{l}
(k+1) p_{k+1}-\lambda  p_{k}=0, \\
p_{0}=e^{-\lambda}
\end{array}\right\}.</math>
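
The recurrence is convenient for tabulating the probabilities, since each value follows from the previous one by a single multiplication and division. A minimal sketch, with a hypothetical function name and the illustrative value <math>\lambda = 3.5</math>, compared against the closed-form expression:

```python
from math import exp, factorial


def poisson_pmf_by_recurrence(lam, kmax):
    """Return [p_0, ..., p_kmax] for a Poisson distribution with mean lam."""
    p = [exp(-lam)]  # starting value p_0 = e^{-lambda}
    for k in range(kmax):
        # Rearranged recurrence: p_{k+1} = lambda * p_k / (k + 1)
        p.append(lam * p[k] / (k + 1))
    return p


lam, kmax = 3.5, 20  # illustrative values
table = poisson_pmf_by_recurrence(lam, kmax)
direct = [lam**k * exp(-lam) / factorial(k) for k in range(kmax + 1)]

# Largest discrepancy between the two computations (rounding error only).
max_gap = max(abs(a - b) for a, b in zip(table, direct))
print(max_gap)
```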
 
=== Properties ===
 
==== Mean ====
 
*The [[wikipedia:expected value|expected value]] and [[wikipedia:variance|variance]] of a Poisson-distributed random variable are both equal to <math>\lambda</math>.
*The [[wikipedia:coefficient of variation|coefficient of variation]] is <math>\textstyle \lambda^{-1/2}</math>, while the [[wikipedia:index of dispersion|index of dispersion]] is 1.<ref name=JKK157/>
*The [[wikipedia:mean absolute deviation|mean absolute deviation]] about the mean is<ref name=JKK157/>
<math display="block">\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .</math>
*The [[wikipedia:mode (statistics)|mode]] of a Poisson-distributed random variable with non-integer <math>\lambda</math> is equal to <math>\scriptstyle\lfloor \lambda \rfloor</math>, which is the largest integer less than or equal to&nbsp;<math>\lambda</math>. This is also written as [[wikipedia:floor function|floor]](<math>\lambda</math>). When <math>\lambda</math> is a positive integer, the modes are <math>\lambda</math> and <math>\lambda-1</math>.
 
==== Median ====
 
Bounds for the median (<math>\nu</math>) of the distribution are known and are sharp:<ref name=Choi1994>Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251</ref><math display="block"> \lambda - \ln 2 \le \nu < \lambda + \frac{1}{3}. </math>
 
==== Higher moments ====
 
* The higher [[wikipedia:moment (mathematics)|moments]] <math>m_k</math> of the Poisson distribution about the origin are [[wikipedia:Touchard polynomials|Touchard polynomials]] in <math>\lambda</math>:
 
<math display="block"> m_k = \sum_{i=1}^k \lambda^i \left\{\begin{matrix} k \\ i \end{matrix}\right\},</math>
 
where the {braces} denote [[wikipedia:Stirling numbers of the second kind|Stirling numbers of the second kind]].<ref>{{cite journal|author=Riordan, John|year=1937|journal=Annals of Mathematical Statistics|title=Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions|volume=8|pages=103–111|ref=harv|doi=10.1214/aoms/1177732430}} Also see Haight (1967), p. 6.</ref> The coefficients of the polynomials have a [[wikipedia:combinatorics|combinatorial]] meaning. In fact, when the expected value of the Poisson distribution is 1, then [[wikipedia:Dobinski's formula|Dobinski's formula]] says that the <math>n</math><sup>th</sup> moment equals the number of [[wikipedia:partition of a set|partitions of a set]] of size <math>n</math>.
 
*If <math>X_i \sim \operatorname{Pois}(\lambda_i)</math> are [[wikipedia:statistical independence|independent]] and <math>\lambda=\sum_{i=1}^n \lambda_i</math>, then <math>Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)</math>.<ref>{{cite book|author=E. L. Lehmann|title=Testing Statistical Hypotheses|publisher=Springer Verlag|location=New York|edition=second|year=1986|isbn=0-387-94919-4|ref=harv}} page 65.</ref>  A converse is [[wikipedia:Raikov's theorem|Raikov's theorem]], which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.<ref>Raikov, D. (1937). On the decomposition of Poisson laws. ''Comptes Rendus (Doklady) de l'Academie des Sciences de l'URSS'', 14, 9–11. (The proof is also given in {{cite book|author=von Mises, Richard|year=1964|title=Mathematical Theory of Probability and Statistics|publisher=Academic Press|location=New York|ref=harv}})</ref>
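
The formula for the moments about the origin given earlier in this subsection can be evaluated with a few lines of Python. The sketch below builds the Stirling numbers of the second kind from their standard recurrence <math>S(n,k) = k\,S(n-1,k) + S(n-1,k-1)</math>; the function names are hypothetical. With <math>\lambda = 1</math> the moments reproduce the numbers of set partitions, as stated by Dobinski's formula.

```python
def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by dynamic programming."""
    # table[i][j] holds S(i, j); S(0, 0) = 1 and S(i, 0) = 0 for i > 0.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def poisson_raw_moment(k, lam):
    """k-th moment about the origin: sum over i = 1..k of lam^i * S(k, i)."""
    return sum(lam**i * stirling2(k, i) for i in range(1, k + 1))


# First two moments for lambda = 3: lambda and lambda + lambda^2.
print(poisson_raw_moment(1, 3), poisson_raw_moment(2, 3))

# With lambda = 1 the moments count the partitions of a set of size n.
partitions = [poisson_raw_moment(n, 1) for n in range(1, 6)]
print(partitions)
```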
 
==== Other properties ====
 
*The Poisson distributions are [[wikipedia:Infinite divisibility (probability)|infinitely divisible]] probability distributions.<ref>{{cite book|author1=Laha, R. G.  |author2=Rohatgi, V. K.  |title=Probability Theory|publisher=John Wiley & Sons|location=New York|isbn=0-471-03262-X|ref=harv|page=233}}</ref><ref name=JKK159/>
 
* Bounds for the tail probabilities of a Poisson random variable <math> X \sim \operatorname{Pois}(\lambda)</math> can be derived using a [[wikipedia:Chernoff bound|Chernoff bound]] argument:<ref>{{cite book|author1=Michael Mitzenmacher  |author2=Eli Upfal |title=Probability and Computing: Randomized Algorithms and Probabilistic Analysis|publisher=Cambridge University Press|isbn=0521835402|page=97|ref=harv}}</ref>
 
<math display="block">
\begin{cases}
\operatorname{P}(X \geq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x > \lambda, \\
\operatorname{P}(X \leq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x < \lambda.
\end{cases}
</math>
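
To get a feel for how tight these bounds are, the sketch below compares them with exact tail probabilities for <math>\lambda = 4</math>, using <math>x = 8</math> for the upper tail and <math>x = 1</math> for the lower tail. All parameter values and function names are illustrative assumptions.

```python
from math import e, exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def chernoff_bound(x, lam):
    """The bound e^{-lam} (e lam)^x / x^x, for x > 0."""
    return exp(-lam) * (e * lam) ** x / x**x


lam = 4.0  # illustrative value

# Upper tail: P(X >= 8) = 1 - P(X <= 7), with x = 8 > lam.
upper_tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(8))
upper_bound = chernoff_bound(8, lam)

# Lower tail: P(X <= 1), with x = 1 < lam.
lower_tail = sum(poisson_pmf(k, lam) for k in range(2))
lower_bound = chernoff_bound(1, lam)

print(upper_tail, upper_bound)
print(lower_tail, lower_bound)
```

In both cases the exact probability sits below the bound, as it must; the bounds are not sharp but need no summation.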
 
==Negative Binomial==
 
The '''negative binomial distribution''' is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed [[wikipedia:Bernoulli trial|Bernoulli trial]]s before a specified number of successes (denoted <math>r</math>) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is <math>p</math> and of failure is <math>1-p</math>. We are observing this sequence until a predefined number <math>r</math> of successes has occurred.
 
===Probability Mass Function ===
 
The probability mass function of the negative binomial distribution is<math display="block">
    f(k; r, p) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} (1-p)^kp^r \quad\text{for }k = 0, 1, 2, \dotsc
  </math>
 
The binomial coefficient can be written in the following manner, explaining the name “negative binomial”:<math display="block">
  \begin{align*}
    \frac{(k+r-1)\dotsm(r)}{k!} &= (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} \\ &= (-1)^k\binom{-r}{k}.
  \end{align*}
  </math>
 
To understand the above definition of the probability mass function, note that the probability of every specific sequence of <math>k</math> failures and <math>r</math> successes is <math>p^r(1-p)^k</math>, because the outcomes of the <math>k+r</math> trials are supposed to happen independently. Since the <math>r</math><sup>th</sup> success comes last, it remains to choose the <math>k</math> trials with failures out of the remaining <math>k+r-1</math> trials. The above binomial coefficient gives precisely the number of ways of making this choice, that is, the number of such sequences.
 
===Extension to real-valued ''r''===
 
It is possible to extend the definition of the negative binomial distribution to the case of a positive [[wikipedia:real number|real]] parameter <math>r</math>. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.
 
To stay consistent with the parametrizations found in the SOA exam tables<ref name="tables">https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf</ref>, we consider the alternative parametrization defined implicitly by setting <math>p = 1/(1+\beta)</math>.
 
As before, we say that <math>N</math> has a '''negative binomial''' (or '''Pólya''') distribution if it has a probability mass function:<math display="block">
    f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc
  </math>
Here <math>r</math> is a real, positive number. The binomial coefficient is then defined by the [[wikipedia:binomial coefficient#Multiplicative formula|multiplicative formula]] and can also be rewritten using the [[wikipedia:gamma function|gamma function]]:<math display="block">
  \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}.
  </math>
 
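To show that the probability mass function adds up to one, recall that <math>p = 1/(1+\beta)</math> and apply the [[wikipedia:binomial series|binomial series]]:

<math display="block">
(1 + \beta)^{r} = (1 - (1-p))^{-r}  =\sum_{k=0}^\infty(-1)^k\binom{-r}{k}(1-p)^k
= (1 + \beta)^r \,\sum_{k=0}^\infty \operatorname{P}(N = k).
</math>

Dividing both sides by <math>(1+\beta)^r</math> gives <math>\sum_{k=0}^\infty \operatorname{P}(N = k) = 1</math>.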
 
Finally, the following [[wikipedia:recurrence relation|recurrence relation]] holds:
 
<math display="block">\begin{array}{l}
(k+1) \operatorname{P} (k+1)- (1-p) \operatorname{P} (k) (k+r)=0, \\
\operatorname{P} (0) = p^r.
\end{array}
</math>
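
The following sketch evaluates the probability mass function for a non-integer <math>r</math> in two ways: directly, using the gamma-function form of the binomial coefficient (through the standard-library <code>math.lgamma</code>), and through the recurrence above with <math>p = 1/(1+\beta)</math>. The function names and the values <math>r = 2.5</math>, <math>\beta = 1.5</math> are assumptions chosen for illustration.

```python
from math import exp, lgamma, log


def negbin_pmf(k, r, beta):
    """P(N = k) in the (r, beta) parametrization, for real r > 0, beta > 0."""
    # log of Gamma(k + r) / (k! Gamma(r)), computed on the log scale
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + k * log(beta) - (r + k) * log(1 + beta))


def negbin_pmf_by_recurrence(kmax, r, beta):
    """Return [P(0), ..., P(kmax)] using (k+1) P(k+1) = (1-p)(k+r) P(k)."""
    p = 1 / (1 + beta)
    probs = [p**r]  # starting value P(0) = p^r
    for k in range(kmax):
        probs.append(probs[k] * (1 - p) * (k + r) / (k + 1))
    return probs


r, beta = 2.5, 1.5  # illustrative values
table = negbin_pmf_by_recurrence(500, r, beta)
direct = [negbin_pmf(k, r, beta) for k in range(501)]

max_gap = max(abs(a - b) for a, b in zip(table, direct))
total = sum(table)  # truncated sum; the neglected tail is negligible here
print(max_gap, total)
```

Working on the log scale in <code>negbin_pmf</code> is a design choice that keeps the gamma-function ratio from overflowing for large <math>k</math>.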
 
==Notes==
{{Reflist|30em|refs=
<ref name=JKK157>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p157</ref><ref name=JKK159>Johnson, N.L., Kotz, S., Kemp, A.W. (1993) ''Univariate Discrete distributions'' (2nd edition). Wiley. ISBN 0-471-54897-9, p159</ref>}}
 
==References==
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=1065569910 |title= Binomial distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Geometric_distribution&oldid=1061164164 |title= Geometric distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Poisson_distribution&oldid=1068368695 |title= Poisson distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }}
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Negative_binomial_distribution&oldid=898136399  | title= Negative binomial distribution | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 17 February 2022 }}

Revision as of 23:13, 30 May 2022

Binomial

The binomial distribution with parameters [math]n[/math] and [math]p[/math] is the discrete probability distribution of the number of successes in a sequence of [math]n[/math] independent yes/no experiments, each of which yields success with probability [math]p[/math]. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when [math]n = 1[/math], the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size [math]n[/math] drawn with replacement from a population of size [math]N[/math]. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for [math]N[/math] much larger than [math]n[/math], the binomial distribution is a good approximation, and widely used.

Specification

Probability mass function

In general, if the random variable [math]X[/math] follows the binomial distribution with parameters [math]n[/math] ∈ ℕ and [math]p[/math] ∈ [0,1], we write [math]X \sim B(n,p)[/math]. The probability of getting exactly [math]k[/math] successes in [math]n[/math] trials is given by the probability mass function:

[[math]] f(k;n,p) = \operatorname{P}(X = k) = \binom n k p^k(1-p)^{n-k}[[/math]]

for [math]k = 0, \ldots, n[/math] where

[[math]]\binom n k =\frac{n!}{k!(n-k)!}[[/math]]

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly [math]k[/math] successes ([math]p^k[/math]) and [math]n-k[/math] failures ([math](1-p)^{-(n-k)}[/math]). However, the [math]k[/math] successes can occur anywhere among the [math]n[/math] trials, and there are [math]{n\choose k}[/math] different ways of distributing [math]k[/math] successes in a sequence of [math]n[/math] trials.

In creating reference tables for binomial distribution probability, usually the table is filled in up to [math]n/2[/math] values. This is because for [math]k \gt n/2[/math], the probability can be calculated by its complement as

[[math]]f(k,n,p)=f(n-k,n,1-p). [[/math]]

The probability mass function satisfies the following recurrence relation, for every [math]n,p[/math]:

[[math]]\left\{\begin{array}{l} p (n-k) f(k,n,p) = (k+1) (1-p) f(k+1,n,p), \\[10pt] f(0,n,p)=(1-p)^n \end{array}\right\}[[/math]]

Looking at the expression [math]f(k,n,p)[/math] as a function of [math]k[/math], there is a [math]k[/math] value that maximizes it. This [math]k[/math] value can be found by calculating

[[math]] \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} [[/math]]

and comparing it to 1. There is always an integer M that satisfies

[[math]](n+1)p-1 \leq M \lt (n+1)p.[[/math]]

[math]f(k,n,p)[/math] is monotone increasing for [math]k \lt M[/math] and monotone decreasing for [math]k \gt M[/math], with the exception of the case where [math](n+1)p[/math] is an integer. In this case, there are two values for which [math]f[/math] is maximal: [math](n+1)p[/math] and [math](n+1)p-1[/math]. [math]M[/math] is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.

Cumulative distribution function

The cumulative distribution function can be expressed as:

[[math]]F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}[[/math]]

where [math]\scriptstyle \lfloor k\rfloor\,[/math] is the "floor" under [math]k[/math], i.e. the greatest integer less than or equal to [math]k[/math].

Mean and Variance

If [math]X \sim B(n,p)[/math], that is, [math]X[/math] is a binomially distributed random variable, [math]n[/math] being the total number of experiments and [math]p[/math] the probability of each experiment yielding a successful result, then the expected value of [math]X[/math] is [math]np[/math] and the variance is [math]npq[/math]. This follows directly from the fact that [math]X[/math] is equal in distribution to the sum of [math]n[/math] independent Bernouilli random variables each having success probability [math]p[/math] (see below).

Mode

Usually the mode of a binomial [math]B(n,p)[/math] distribution is equal to [math]\lfloor (n+1)p\rfloor[/math], where [math]\lfloor\cdot\rfloor[/math] is the floor function. However, when[math](n+1)p[/math] is an integer and [math]p[/math] is neither 0 nor 1, then the distribution has two modes: [math](n+1)p[/math](n + 1)p and [math](n+1)p -1 [/math]. When [math]p[/math] is equal to 0 or 1, the mode will be 0 and [math]n[/math] correspondingly. These cases can be summarized as follows:

[[math]] \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}[[/math]]
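The case distinction can be checked against a brute-force search over the probability mass function. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that the test "is [math](n+1)p[/math] an integer" is not disturbed by floating-point rounding; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, floor


def binom_modes(n: int, p: Fraction) -> list:
    """Modes of B(n, p) according to the three cases above."""
    t = (n + 1) * p
    if p == 1:                      # (n+1)p = n+1
        return [n]
    if t == 0 or t != floor(t):     # zero or a noninteger
        return [floor(t)]
    return [int(t) - 1, int(t)]     # (n+1)p in {1, ..., n}: two modes


def brute_force_modes(n: int, p: Fraction) -> list:
    """All k that maximise the exact PMF."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    top = max(pmf)
    return [k for k, value in enumerate(pmf) if value == top]


if __name__ == "__main__":
    print(binom_modes(7, Fraction(1, 4)))   # (n+1)p = 2 is an integer
    print(binom_modes(10, Fraction(1, 2)))  # (n+1)p = 5.5 is not
```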

{{#Proof:View Proof|Mode|binomial/mode}}

Median

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However several special results have been established:

  • If [math]np[/math] is an integer, then the mean, median, and mode coincide and equal [math]np[/math].[1][2]
  • Any median [math]m[/math] must lie within the interval ⌊[math]np[/math]⌋ ≤ [math]m[/math] ≤ ⌈[math]np[/math]⌉.[3]
  • A median [math]m[/math] cannot lie too far away from the mean: [math]|m-np| \leq \textrm{min}\{\ln(2),\textrm{max}\{p,1-p\}\}[/math].[4]
  • The median is unique and equal to [math]m=[/math]round([math]np[/math]) in cases when either [math]p\leq 1-\ln(2)[/math] or [math]p\geq \ln(2)[/math] or [math]|m-np| \leq \textrm{min}\{p, 1-p\}[/math] (except for the case when [math]p = 1/2 [/math] and [math]n[/math] is odd).[3][4]
  • When [math]p=1/2[/math] and [math]n[/math] is odd, any number [math]m[/math] in the interval [math][(n-1)/2,(n+1)/2][/math] is a median of the binomial distribution. If [math]p = 1/2[/math] and [math]n[/math] is even, then [math]m = n/2[/math] is the unique median.

Related distributions

Sums of binomials

If [math]X \sim B(n,p)[/math] and [math]Y \sim B(m, p) [/math] are independent binomial variables with the same probability [math]p[/math], then [math]X+Y [/math] is again a binomial variable: its distribution is [math]Z=X+Y \sim B(n+m, p)[/math].
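Since the distribution of a sum of independent variables is the convolution of their distributions, the statement can be verified numerically. The following sketch convolves two binomial probability mass functions and compares the result with [math]B(n+m,p)[/math]; the helper names and the parameter values are illustrative.

```python
from math import comb


def binom_pmf_list(n: int, p: float) -> list:
    """PMF of B(n, p) as a list indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list, b: list) -> list:
    """Discrete convolution: distribution of the sum of two independent variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    n, m, p = 3, 4, 0.3  # hypothetical parameters
    lhs = convolve(binom_pmf_list(n, p), binom_pmf_list(m, p))
    rhs = binom_pmf_list(n + m, p)
    print(max(abs(x - y) for x, y in zip(lhs, rhs)))
```

The agreement relies on both variables sharing the same [math]p[/math]; with different success probabilities the convolution is in general not binomial.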

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where [math] n = 1[/math]. Symbolically, [math]X \sim B(1,p)[/math] has the same meaning as [math]X \sim B(p) [/math]. Conversely, any binomial distribution, [math]B(n,p)[/math], is the distribution of the sum of [math]n[/math] Bernoulli trials, [math]B(p)[/math], each with the same probability [math]p[/math].

Normal approximation

If [math]n[/math] is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to [math]B(n,p)[/math] is given by the normal distribution [math] \mathcal{N}(np,\,np(1-p))[/math], and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as [math]n[/math] increases (at least 20) and is better when [math]p[/math] is not near to 0 or 1.[5] Various heuristics may be used to decide whether [math]n[/math] is large enough, and [math]p[/math] is far enough from the extremes of zero or one:

  • One rule is that both [math]np[/math] and [math]n(1-p)[/math] must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large [math]n[/math], until [math]n[/math] is very large.
  • A second rule[5] is that for [math]n\gt5[/math] the normal approximation is adequate if

[[math]]\left | \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right |\lt0.3[[/math]]

  • Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if

[[math]]\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].[[/math]]
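The three heuristics can be collected in a small helper. This is a sketch only: the function name, the returned dictionary keys and the default threshold of 5 are assumptions made for illustration, the first rule is taken to compare [math]np[/math] and [math]n(1-p)[/math] with the threshold, and [math]0 \lt p \lt 1[/math] is required.

```python
from math import sqrt


def normal_approx_checks(n: int, p: float, threshold: float = 5.0) -> dict:
    """Evaluate the three rules of thumb for 0 < p < 1."""
    # Rule 1: expected successes and expected failures both exceed the threshold.
    rule1 = n * p > threshold and n * (1 - p) > threshold
    # Rule 2: for n > 5, the skewness-type quantity is below 0.3 in absolute value.
    skew_term = (sqrt((1 - p) / p) - sqrt(p / (1 - p))) / sqrt(n)
    rule2 = n > 5 and abs(skew_term) < 0.3
    # Rule 3: mean +/- 3 standard deviations stays inside [0, n].
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    rule3 = mu - 3 * sigma >= 0 and mu + 3 * sigma <= n
    return {"rule1": rule1, "rule2": rule2, "rule3": rule3}


if __name__ == "__main__":
    print(normal_approx_checks(100, 0.5))
    print(normal_approx_checks(10, 0.1))
```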

The following is an example of applying a continuity correction. Suppose one wishes to calculate [math]\operatorname{P}(X \leq 8) [/math] for a binomial random variable [math]X[/math]. If [math]Y[/math] has a distribution given by the normal approximation, then [math]\operatorname{P}(X \leq 8 )[/math] is approximated by [math]\operatorname{P}(Y \leq 8.5 ) [/math]. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
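To see the effect of the correction numerically, the sketch below compares the exact value of [math]\operatorname{P}(X \leq 8)[/math] with the uncorrected and corrected normal approximations. The text does not fix [math]n[/math] and [math]p[/math] for this example, so the values [math]n = 20[/math] and [math]p = 0.4[/math] used here are hypothetical, as are the function names.

```python
from math import comb, erf, sqrt


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p) and integer k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of N(mean, sd^2), written with the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p, k = 20, 0.4, 8                      # hypothetical parameters
mean, sd = n * p, sqrt(n * p * (1 - p))   # parameters of the approximating normal

exact = binom_cdf(k, n, p)
uncorrected = normal_cdf(k, mean, sd)        # P(Y <= 8)
corrected = normal_cdf(k + 0.5, mean, sd)    # P(Y <= 8.5)

if __name__ == "__main__":
    print(f"exact={exact:.4f} uncorrected={uncorrected:.4f} corrected={corrected:.4f}")
```

For these parameters the corrected value lies closer to the exact probability than the uncorrected one.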

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large [math]n[/math] are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since [math]B(n,p)[/math] is a sum of [math]n[/math] independent, identically distributed Bernoulli variables with parameter [math]p[/math]. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of [math]p[/math] using [math]x/n[/math], the sample proportion and estimator of [math]p[/math], in a common test statistic.[6]

For example, suppose one randomly samples [math]n[/math] people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of [math]n[/math] people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion [math]p[/math] of agreement in the population and with standard deviation [math]\sigma = \sqrt{\frac{p(1-p)}{n}}[/math].

Geometric

The geometric distribution is the probability distribution of the number of failures before the first success supported on the set { 0, 1, 2, 3, ... }, i.e, if [math]p[/math] denotes the probability of success on each trial then

[[math]]\operatorname{P}(Y=k) = (1 - p)^k\,p\,.[[/math]]

To retain consistency with the notation found in [7], we set [math]\beta = (1-p)/p[/math], so that [math]p = 1/(1+\beta)[/math], and obtain:

[[math]]\begin{equation}\label{geometric}\operatorname{P}(Y=k) = \frac{\beta^k}{(1+\beta)^{k+1}}.\end{equation}[[/math]]

Going forward, we assume that a geometric distribution is characterized by \ref{geometric} and depends solely on [math]\beta[/math] which turns out to be the mean of the distribution.

Moments

The expected value of the geometrically distributed random variable [math]Y[/math] is [math]\beta[/math] and its variance is [math]\beta(\beta +1)[/math]:

[[math]] \operatorname{E}(Y) = \beta, \qquad\operatorname{Var}(Y) = \beta(1 + \beta). [[/math]]
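These two moments can be confirmed numerically from the probability mass function in the [math]\beta[/math] parametrization, using [math]p = 1/(1+\beta)[/math]. The sketch below truncates the infinite sums; the function names and the truncation point are illustrative assumptions.

```python
def geom_pmf(k: int, beta: float) -> float:
    """P(Y = k) = beta^k / (1 + beta)^(k + 1), computed via the ratio to avoid overflow."""
    ratio = beta / (1.0 + beta)          # equals 1 - p
    return ratio**k / (1.0 + beta)       # 1 / (1 + beta) equals p


def geom_moments(beta: float, terms: int = 2000) -> tuple:
    """Mean and variance from a truncated sum over k = 0..terms-1."""
    probs = [geom_pmf(k, beta) for k in range(terms)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    beta = 1.5  # hypothetical value
    print(geom_moments(beta), (beta, beta * (1 + beta)))
```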

Related distributions

  • The geometric distribution [math]Y[/math] is a special case of the negative binomial distribution, with [math]r = 1 [/math]. More generally, if [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables with parameter [math]p[/math], then the sum

[[math]]Z = \sum_{m=1}^r Y_m[[/math]]

follows a negative binomial distribution with parameters [math]r[/math] and [math]p[/math].[8]

  • If [math]Y_1,\ldots,Y_r[/math] are independent geometrically distributed variables (with possibly different success parameters [math]p_m[/math]), then their minimum

[[math]]W = \min_{m \in \{1, \dots, r\}} Y_m\,[[/math]]

is also geometrically distributed, with parameter [math]p = 1-\prod_m(1-p_{m}).[/math]

  • Suppose [math]0 \lt r \lt 1 [/math], and for [math] k = 1,2,3,\ldots [/math] the random variable [math]X_k[/math] has a Poisson distribution with expected value [math]r^k/k[/math]. Then

[[math]]\sum_{k=1}^\infty k\,X_k[[/math]]

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value [math]r/(1-r)[/math].

  • The exponential distribution is the continuous analogue of the geometric distribution. If [math]X[/math] is an exponentially distributed random variable with parameter [math]\lambda[/math], then

[[math]]Y = \lfloor X \rfloor,[[/math]]

where [math]\lfloor \quad \rfloor[/math] is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter [math]p = 1- e^{-\lambda} [/math] (thus [math]\lambda = - \ln(1-p) [/math][9]) and taking values in the set {0, 1, 2, ...}.
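The last relationship can be checked without simulation: [math]\lfloor X \rfloor = k[/math] exactly when [math]k \leq X \lt k+1[/math], which for an exponential variable has probability [math]e^{-\lambda k} - e^{-\lambda (k+1)}[/math]. The sketch below compares this with the geometric probability mass function for [math]p = 1 - e^{-\lambda}[/math]; the function names and the value of [math]\lambda[/math] are illustrative.

```python
from math import exp


def floor_exp_pmf(k: int, lam: float) -> float:
    """P(floor(X) = k) = P(k <= X < k + 1) for X exponential with rate lam."""
    return exp(-lam * k) - exp(-lam * (k + 1))


def geom_pmf(k: int, p: float) -> float:
    """P(Y = k) = (1 - p)^k p, the number of failures before the first success."""
    return (1 - p) ** k * p


if __name__ == "__main__":
    lam = 0.7                 # hypothetical rate
    p = 1 - exp(-lam)
    for k in range(5):
        print(k, floor_exp_pmf(k, lam), geom_pmf(k, p))
```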

Poisson Distribution

The Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[10] The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.

Definition

A discrete random variable [math]X[/math] is said to have a Poisson distribution with parameter [math]\lambda \gt 0[/math], if, for [math]k = 0, 1, \ldots [/math], the probability mass function of [math]X[/math] is given by[11]

[[math]]\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.[[/math]]

The probability mass function satisfies the following recurrence relation:

[[math]]\left\{\begin{array}{l} (k+1) p_{k+1}-\lambda p_{k}=0, \\ p_{0}=e^{-\lambda} \end{array}\right\}. [[/math]]
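The recurrence gives a convenient way to tabulate the probabilities without computing factorials or large powers. A minimal sketch, with an illustrative function name, is:

```python
from math import exp, factorial


def poisson_pmf_recursive(lam: float, kmax: int) -> list:
    """Return [p_0, ..., p_kmax] using p_0 = exp(-lam) and p_{k+1} = lam * p_k / (k + 1)."""
    probs = [exp(-lam)]
    for k in range(kmax):
        probs.append(lam * probs[k] / (k + 1))
    return probs


if __name__ == "__main__":
    lam = 3.2  # hypothetical rate
    recursive = poisson_pmf_recursive(lam, 10)
    direct = [lam**k * exp(-lam) / factorial(k) for k in range(11)]
    print(max(abs(a - b) for a, b in zip(recursive, direct)))
```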

Properties

Mean

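The expected value of a Poisson-distributed random variable is [math]\lambda[/math]. The mean absolute deviation about the mean is

[[math]]\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .[[/math]]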

  • The mode of a Poisson-distributed random variable with non-integer [math]\lambda[/math] is equal to [math]\scriptstyle\lfloor \lambda \rfloor[/math], which is the largest integer less than or equal to [math]\lambda[/math]. This is also written as floor([math]λ[/math]). When [math]λ[/math] is a positive integer, the modes are [math]\lambda[/math] and [math]\lambda-1[/math].

Median

Bounds for the median ([math]ν[/math]) of the distribution are known and are sharp:[13]

[[math]] \lambda - \ln 2 \le \nu \lt \lambda + \frac{1}{3}. [[/math]]
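As an illustration, the median can be located by accumulating the probability mass function until the cumulative probability reaches 1/2, and then compared with the two bounds. The sketch takes the median to be the smallest integer whose cumulative probability is at least 1/2, which is an assumption about the convention; the function names are illustrative.

```python
from math import exp, log


def poisson_median(lam: float) -> int:
    """Smallest integer nu with P(X <= nu) >= 1/2."""
    k = 0
    prob = exp(-lam)   # p_0
    cdf = prob
    while cdf < 0.5:
        k += 1
        prob *= lam / k   # p_k = lam * p_{k-1} / k
        cdf += prob
    return k


def median_within_bounds(lam: float) -> bool:
    """Check lam - ln 2 <= nu < lam + 1/3."""
    nu = poisson_median(lam)
    return lam - log(2) <= nu < lam + 1 / 3


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.5, 7.3, 20.0):  # hypothetical rates
        print(lam, poisson_median(lam), median_within_bounds(lam))
```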

Higher moments

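The higher moments [math]m_k[/math] of the Poisson distribution about the origin are polynomials in [math]\lambda[/math]:

[[math]] m_k = \sum_{i=1}^k \lambda^i \left\{\begin{matrix} k \\ i \end{matrix}\right\},[[/math]]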

where the {braces} denote Stirling numbers of the second kind.[14] The coefficients of the polynomials have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, then Dobinski's formula says that the [math]n[/math]th moment equals the number of partitions of a set of size [math]n[/math].
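The moment formula is easy to evaluate once the Stirling numbers of the second kind are available. The sketch below computes them with the standard recurrence [math]S(n,i) = i\,S(n-1,i) + S(n-1,i-1)[/math] and then evaluates the [math]k[/math]th moment for integer [math]\lambda[/math]; the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, i: int) -> int:
    """Stirling number of the second kind: partitions of an n-set into i blocks."""
    if n == i:
        return 1
    if i == 0 or i > n:
        return 0
    return i * stirling2(n - 1, i) + stirling2(n - 1, i - 1)


def poisson_moment(k: int, lam: int) -> int:
    """k-th moment about the origin: sum over i = 1..k of lam^i * S(k, i)."""
    return sum(lam**i * stirling2(k, i) for i in range(1, k + 1))


if __name__ == "__main__":
    # With lam = 1 the moments count the partitions of a set of size k.
    print([poisson_moment(k, 1) for k in range(1, 6)])
    # Second moment equals lam + lam^2.
    print(poisson_moment(2, 3))
```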

  • If [math]X_i \sim \operatorname{Pois}(\lambda_i)[/math] are independent and [math]\lambda=\sum_{i=1}^n \lambda_i[/math], then [math]Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)[/math].[15] A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.[16]

Other properties

  • Bounds for the tail probabilities of a Poisson random variable [math] X \sim \operatorname{Pois}(\lambda)[/math] can be derived using a Chernoff bound argument:[19]

[[math]] \begin{cases} \operatorname{P}(X \geq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \gt \lambda \\ \operatorname{P}(X \leq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \lt \lambda \,\, . \end{cases} [[/math]]

Negative Binomial

The negative binomial distribution is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes (denoted [math]r[/math]) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is [math]p[/math] and of failure is [math]1-p[/math]. We are observing this sequence until a predefined number [math]r[/math] of successes has occurred.

Probability Mass Function

The probability mass function of the negative binomial distribution is

[[math]] f(k; r, p) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} (1-p)^kp^r \quad\text{for }k = 0, 1, 2, \dotsc [[/math]]

The binomial coefficient can be written in the following manner, explaining the name “negative binomial”:

[[math]] \begin{align*} \frac{(k+r-1)\dotsm(r)}{k!} &= (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} \\ &= (-1)^k\binom{-r}{k}. \end{align*} [[/math]]

To understand the above definition of the probability mass function, note that the probability for every specific sequence of [math]k[/math] failures and [math]r[/math] successes is [math]p^r(1-p)^k[/math], because the outcomes of the [math]k+r[/math] trials are supposed to happen independently. Since the [math]r[/math]th success comes last, it remains to choose the [math]k[/math] trials with failures out of the remaining [math]k+r-1[/math] trials. The above binomial coefficient gives precisely the number of all these sequences of length [math]k+r-1[/math].

Extension to real-valued r

It is possible to extend the definition of the negative binomial distribution to the case of a positive real parameter [math]r[/math]. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.

In the spirit of being consistent with the parametrizations found in [20], we consider the alternative parametrization defined implicitly by setting [math]p = 1/(1+\beta)[/math].

As before, we say that [math]N[/math] has a negative binomial (or Pólya) distribution if it has a probability mass function:

[[math]] f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc [[/math]]

Here [math]r[/math] is a real, positive number. The binomial coefficient is then defined by the multiplicative formula and can also be rewritten using the gamma function:

[[math]] \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}. [[/math]]

To show that the probability mass function adds up to one, we have, by the binomial series

[[math]] (1 + \beta)^{r} = (1 - (1-p))^{-r} =\sum_{k=0}^\infty(-1)^k\binom{-r}{k}(1-p)^k = (1 + \beta)^r \,\sum_{k=0}^\infty \operatorname{P}(N = k). [[/math]]

Finally, the following recurrence relation holds:

[[math]]\begin{array}{l} (k+1) \operatorname{P} (k+1)- (1-p) \operatorname{P} (k) (k+r)=0, \\ \operatorname{P} (0) = p^r. \end{array} [[/math]]
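The gamma-function form of the probability mass function and the recurrence can be compared numerically, including for non-integer [math]r[/math]. The sketch below assumes the parametrization [math]p = 1/(1+\beta)[/math]; the function names and parameter values are illustrative, and `math.lgamma` is used to avoid overflow in the gamma function.

```python
from math import exp, lgamma, log


def negbin_pmf(k: int, r: float, beta: float) -> float:
    """P(N = k) = Gamma(k + r) / (k! Gamma(r)) * beta^k / (1 + beta)^(r + k)."""
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(beta) - (r + k) * log(1 + beta))


def negbin_pmf_recursive(r: float, beta: float, kmax: int) -> list:
    """Return [P(0), ..., P(kmax)] from P(0) = p^r and
    (k + 1) P(k + 1) = (1 - p) (k + r) P(k)."""
    p = 1.0 / (1.0 + beta)
    probs = [p**r]
    for k in range(kmax):
        probs.append((1 - p) * (k + r) * probs[k] / (k + 1))
    return probs


if __name__ == "__main__":
    r, beta = 2.5, 1.5  # hypothetical parameters, r not an integer
    recursive = negbin_pmf_recursive(r, beta, 500)
    print(sum(recursive))                       # should be close to 1
    print(recursive[3], negbin_pmf(3, r, beta))
```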

Notes

  1. Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung" (in German). Wissenschaftliche Zeitschrift der Technischen Universität Dresden 19: 29–33.
  2. Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331-332.
  3. "Mean, Median and Mode in Binomial Distributions" (1980). Statistica Neerlandica 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
  4. "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions" (1995). Statistics & Probability Letters 23: 21–25. doi:10.1016/0167-7152(94)00090-U.
  5. Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
  6. NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
  7. https://www.soa.org/files/edu/edu-exam-c-tables-cont-dist.pdf
  8. Pitman, Jim. Probability (1993 edition). Springer Publishers. pp 372.
  9. http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l
  10. Frank A. Haight (1967). Handbook of the Poisson Distribution. New York: John Wiley & Sons.
  11. Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers, Roy D. Yates, David Goodman, page 60.
  12. Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p157
  13. Choi KP (1994) On the medians of Gamma distributions and an equation of Ramanujan. Proc Amer Math Soc 121 (1) 245–251
  14. Riordan, John (1937). "Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions". Annals of Mathematical Statistics 8: 103–111. doi:10.1214/aoms/1177732430.  Also see Haight (1967), p. 6.
  15. E. L. Lehmann (1986). Testing Statistical Hypotheses (second ed.). New York: Springer Verlag. ISBN 0-387-94919-4. p. 65.
  16. Raikov, D. (1937). On the decomposition of Poisson laws. Comptes Rendus (Doklady) de l' Academie des Sciences de l'URSS, 14, 9–11. (The proof is also given in von Mises, Richard (1964). Mathematical Theory of Probability and Statistics. New York: Academic Press.)
  17. Laha, R. G.; Rohatgi, V. K. Probability Theory. New York: John Wiley & Sons. p. 233. ISBN 0-471-03262-X.
  18. Johnson, N.L., Kotz, S., Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9, p159
  19. Michael Mitzenmacher; Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. p. 97. ISBN 0521835402.
  20. https://www.soa.org/globalassets/assets/Files/Edu/2019/2019-02-exam-stam-tables.pdf

References