\newcommand{\NA}{{\rm NA}}
\newcommand{\mathds}{\mathbb}</math></div>
 
Having mastered permutations, we now consider combinations.  Let <math>U</math> be a set with <math>n</math>
elements; we want to count the number of distinct subsets of the set
<math>U</math> that have exactly <math>j</math> elements.  The empty set and the set <math>U</math> are considered to
be subsets of <math>U</math>.  The empty set is usually denoted by <math>\phi</math>.
<span id="exam 3.7"/>
'''Example'''
Let <math>U = \{a,b,c\}</math>.  The subsets of <math>U</math> are


<math display="block">
\phi,\ \{a\},\ \{b\},\ \{c\},\ \{a,b\},\ \{a,c\},\ \{b,c\},\ \{a,b,c\}\ .
</math>


===Binomial Coefficients===
elements.  Thus, <math>{3 \choose 0} = 1</math>, <math>{3 \choose 1} = 3</math>, <math>{3 \choose 2} = 3</math>,
and <math>{3 \choose 3} = 1</math>.  Note that there are <math>2^3 = 8</math> subsets in all.  (We have
already seen that a set with <math>n</math> elements has <math>2^n</math> subsets; see [[exercise:233bb6143e|Exercise]].)  It follows that


<math display="block">
\begin{equation}
{n \choose j} = {{n-1} \choose j} + {{n-1} \choose {j-1}}
\label{eq 3.3}
\end{equation}
</math>|We wish to choose a subset of <math>j</math> elements.  Choose an element <math>u</math> of <math>U</math>.
Assume first that we do not want <math>u</math> in the subset.  Then we must choose the
<math>j</math> elements from a set of <math>n - 1</math> elements; this can be done in <math>{{n-1} \choose j}</math>
ways.  On the other hand, if we do want <math>u</math> in the subset, then we must choose the other <math>j - 1</math> elements from the remaining <math>n - 1</math> elements; this can be done in <math>{{n-1} \choose {j-1}}</math> ways.  Since <math>u</math> is either in the subset or not,
the number of ways that we can choose a subset of <math>j</math> elements is the sum of the
number of subsets of <math>j</math> elements which have
<math>u</math> as a member and the number which do not---this is what Equation \ref{eq 3.3} states.}}
The binomial coefficient <math>n \choose j</math> is defined to be 0 if <math>j < 0</math> or if <math>j > n</math>.  With
this definition, the restrictions on <math>j</math> in [[#thm 3.6 |Theorem]] are unnecessary.


===Pascal's Triangle===
The [[#eq 3.3 |relation]], together with the knowledge that


<math display="block">
{n \choose 0} = {n \choose n} = 1
</math>
determines completely the numbers <math>n \choose j</math>. We can use these relations to
determine the famous ''triangle of Pascal,'' which exhibits all these
numbers in matrix form (see [[#fig 3.6|Figure]]).
<div id="fig 3.6" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-6.png | 600px | thumb |Pascal's triangle.]]
</div>
The <math>n</math>th row of this triangle has the entries <math>n \choose 0</math>, <math>n \choose 1</math>, ..., <math>n \choose n</math>.  We know that the first and last of these numbers are 1.  The remaining
numbers are determined by the recurrence relation Equation \ref{eq 3.3}; that is, the
entry <math>{n \choose j}</math> for <math>0 < j < n</math> in the <math>n</math>th row of Pascal's
triangle is the ''sum'' of the entry immediately above and the one immediately to
its left in the <math>(n - 1)</math>st row.  For example, <math>{5 \choose 2} = 6 + 4 = 10</math>.
This algorithm for constructing Pascal's triangle can be used to write a computer
program to compute the binomial coefficients.  You are asked to do this in [[exercise:2f7db43fe0 |Exercise]].
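For instance, such a program might build the triangle row by row directly from the recurrence (a sketch, not the listing the exercise asks for):

```python
# Build rows 0 through nmax of Pascal's triangle using the recurrence
# C(n, j) = C(n-1, j-1) + C(n-1, j), with C(n, 0) = C(n, n) = 1.
def pascal_triangle(nmax):
    rows = [[1]]
    for n in range(1, nmax + 1):
        prev = rows[-1]
        row = [1]                              # C(n, 0) = 1
        for j in range(1, n):
            row.append(prev[j - 1] + prev[j])  # the recurrence
        row.append(1)                          # C(n, n) = 1
        rows.append(row)
    return rows

print(pascal_triangle(5)[5])  # [1, 5, 10, 10, 5, 1]
```

The entry <math>{5 \choose 2} = 10</math> indeed arises as the sum <math>6 + 4</math> of two entries of the previous row.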
 


While Pascal's triangle provides a way to construct recursively the binomial
coefficients, it is also possible to give a formula for the binomial coefficients:

<math display="block">
\begin{equation}
{n \choose j} = \frac{(n)_j}{j!} = \frac{n(n - 1) \cdots (n - j + 1)}{j!}
\label{eq 3.4}
\end{equation}
</math>|Each subset of size <math>j</math> of a set of size <math>n</math> can be ordered in <math>j!</math> ways. Each
of these orderings is a <math>j</math>-permutation of the set of size <math>n</math>.  The number of
<math>j</math>-permutations is <math>(n)_j</math>, so the number of subsets of size <math>j</math> is <math>(n)_j/j!</math>.}}

Note that

<math display="block">
{n \choose j} = {n \choose {n-j}}\ .
</math>
When using Equation \ref{eq 3.4} in the calculation of <math>{n \choose j}</math>, if one alternates
the multiplications and divisions, then all of the intermediate values in the
calculation are integers.  Furthermore, none of these intermediate values exceed the
final value. (See [[exercise:C2ad1f21c9 |Exercise]].)
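In code, the alternating scheme might look like this (an illustrative sketch; the intermediate value after step <math>i</math> is itself the binomial coefficient <math>{{n-j+i} \choose i}</math>, hence an integer):

```python
# Compute C(n, j) from Equation \ref{eq 3.4}, alternating multiplications
# and divisions so that every intermediate value is an integer and none
# exceeds the final answer.
def binomial(n, j):
    if j < 0 or j > n:
        return 0          # the convention adopted in the text
    result = 1
    for i in range(1, j + 1):
        result = result * (n - j + i) // i  # exact: result is C(n-j+i, i)
    return result

print(binomial(5, 2))  # 10
```

Computing the full product <math>n(n-1)\cdots(n-j+1)</math> before any division would give needlessly large intermediates; alternating avoids that.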




Another point that should be made concerning Equation \ref{eq 3.4} is that if it is
used to ''define'' the binomial coefficients, then it is no longer necessary to require
<math>n</math> to be a positive integer.  The variable <math>j</math> must still be a non-negative integer under
this definition.
<math>3744/2598960 = .0014</math>.  Thus, while both types of hands are unlikely, you are six
times more likely to obtain a full house than four of a kind.


===Bernoulli Trials===
Our principal use of the binomial coefficients will occur in the study of one of the
important chance processes called ''Bernoulli trials.''
{{defncard|label=|id=def 3.4|A ''Bernoulli trials process'' is a sequence of
<math>n</math> chance experiments such that
<ul><li> Each experiment has two possible outcomes, which we may call  ''success''
and ''failure.''
</li>
<li> The probability <math>p</math> of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes.
</li>
</ul>}}
<div id="fig 3.7" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-7.png | 600px | thumb | Tree diagram of three Bernoulli trials. ]]
</div>
<span id="exam 3.9"/>
assign a probability distribution to the paths in this tree.  Suppose, for example, that we
have three Bernoulli trials.  The possible outcomes are indicated in the tree diagram
shown in [[#fig 3.7|Figure]].  We define <math>X</math> to be the random variable which
represents the outcome of the process, i.e., an ordered triple of S's and F's. The
probabilities assigned to the branches of the tree represent the probability for each
b(n,p,j) = {n \choose j} p^j q^{n - j}
</math>
where <math>q = 1 - p</math>.|We construct a tree measure as described above.  We want to find the sum of the
probabilities for all paths which have exactly <math>j</math> successes and <math>n - j</math> failures.
Each such path is assigned a probability <math>p^j q^{n - j}</math>.  How many such paths are


To compute binomial probabilities using the computer, multiply the function
choose<math>(n,k)</math> by <math>p^kq^{n - k}</math>.  The program '''BinomialProbabilities''' prints out the binomial probabilities
<math>b(n, p, k)</math> for <math>k</math> between <math>kmin</math> and <math>kmax</math>, and the sum of these probabilities.  We have run this program for <math>n = 100</math>, <math>p = 1/2</math>, <math>kmin = 45</math>, and <math>kmax = 55</math>; the output is shown in
[[#table 3.27 |Table]].  Note that the individual probabilities are quite small.  The probability of
exactly 50 heads in 100 tosses of a coin is about .08.  Our intuition tells us that this is the most
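A sketch of the computation that a program like '''BinomialProbabilities''' performs (an illustration, not the book's actual listing):

```python
from math import comb

def b(n, p, k):
    """b(n, p, k) = C(n, k) p^k q^(n-k), with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Print b(n, p, k) for k from kmin to kmax, and the sum of these values.
def binomial_probabilities(n, p, kmin, kmax):
    total = 0.0
    for k in range(kmin, kmax + 1):
        print(k, round(b(n, p, k), 4))
        total += b(n, p, k)
    print("sum:", round(total, 4))

binomial_probabilities(100, 0.5, 45, 55)  # b(100, .5, 50) is about .08
```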


===Binomial Distributions===
{{defncard|label=|id=def 3.5|Let <math>n</math> be a positive integer, and let <math>p</math> be a real
number between 0 and 1.  Let <math>B</math> be the random variable which counts the number of
successes in a Bernoulli trials process with parameters <math>n</math> and <math>p</math>.  Then the
distribution <math>b(n, p, k)</math> of <math>B</math> is called the ''binomial distribution''.}}
We can get a better idea about the binomial distribution by graphing this distribution
for different values of <math>n</math> and <math>p</math> (see [[#fig 3.8|Figure]]).  The plots in this figure
were generated using the program '''BinomialPlot'''.
<div id="fig 3.8" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-8.png | 600px | thumb | Binomial distributions. ]]
</div>


[[guide:Ee45340c30#chp 9 |Chapter]].  We also note that the highest probability occurs around the
value <math>np</math>, but that these highest probabilities get smaller as <math>n</math> increases.  We
shall see in Chapter [[guide:E4fd10ce73|Expected Value and Variance]] that <math>np</math> is the ''mean'' or ''expected''
value of the binomial distribution <math>b(n,p,k)</math>.




We have run the program for the case of 20 rows of pins and 10,000 shots being
dropped.  We show the result of this simulation in [[#fig 2.22|Figure]].




<div id="fig 2.22" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig2-22.png | 600px | thumb | Simulation of the Galton board. ]]
</div>
Note that if we write 0 every time the shot is deflected to the left, and 1




The distribution shown in [[#fig 2.22|Figure]] is an example of an empirical
distribution, in the sense that it comes about by means of a sequence of
experiments.  As expected, this empirical distribution resembles the corresponding
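The Galton board run described above can be sketched in a few lines (an illustrative simulation, not the book's program): each shot makes one independent left-or-right choice per row of pins, so its final slot is the number of rightward deflections.

```python
import random

def galton(rows=20, shots=10000, seed=1):
    """Count how many shots land in each of the rows+1 slots."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    counts = [0] * (rows + 1)
    for _ in range(shots):
        # One left/right choice per row of pins; the slot is the
        # number of rightward deflections.
        slot = sum(rng.randint(0, 1) for _ in range(rows))
        counts[slot] += 1
    return counts

counts = galton()
print(counts)  # piles up near the middle slot, as b(20, 1/2, k) predicts
```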
<span id="exam 3.12"/>
'''Example'''
Suppose that ordinary aspirin has been found effective against headaches 60 percent of the time, and that a drug company claims
that its new aspirin with a special headache additive is more effective.  We can test this claim as follows: we call their claim the ''alternate hypothesis,'' and its negation, that the additive has no appreciable effect, the ''null hypothesis.''  Thus
the null hypothesis is that <math>p = .6</math>, and the alternate hypothesis is that <math>p > .6</math>,
where <math>p</math> is the probability that the new aspirin is effective.


We give the aspirin to <math>n</math> people to take when they have a headache.  We want to
find a number <math>m</math>, called the ''critical value'' for our experiment, and decide to reject the null hypothesis if at least <math>m</math> people are cured, and otherwise decide to accept
it.  How should we determine this critical value?


 
First note that we can make two kinds of errors.  The first, often called a ''type 1 error'' in statistics, is to reject the null
hypothesis when in fact it is true.  The second, called a ''type 2 error,'' is
to accept the null hypothesis when it is false.  To determine the probability of both these types of
selected values of
<math>m</math>, the function <math>\alpha(p)</math>, for <math>p</math> ranging from .4 to 1.  The result is shown in
[[#fig 3.9|Figure]].  We include in our graph a box (in dotted lines) from .6 to .8,
with bottom and top at heights .05 and .95.  Then a value for <math>m</math> satisfies our
requirements if and only if the graph of <math>\alpha</math> enters the box from the bottom, and
on avoiding a type 1 error we favor 73, and similarly we favor 69 if we regard a
type 2 error as worse.  Of course, the drug company may not be happy with having as
much as a 5 percent chance of an error.  They might insist on having a 1 percent chance of an error.  For this we would have to increase the number <math>n</math> of trials (see [[exercise:Ad462f2fca |Exercise]]).
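The curve <math>\alpha(p)</math> itself is just a tail sum of binomial probabilities, so it is easy to compute directly.  The sketch below assumes, purely for illustration, <math>n = 100</math> trials:

```python
from math import comb

# alpha(p): probability of at least m successes in n trials when the true
# success probability is p -- the chance of rejecting the null hypothesis.
def alpha(n, m, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

n = 100                 # illustrative number of trials
for m in (69, 73):      # the two critical values discussed in the text
    print(m, round(alpha(n, m, 0.6), 4), round(alpha(n, m, 0.8), 4))
```

Raising <math>m</math> lowers <math>\alpha(.6)</math> (fewer type 1 errors) but also lowers <math>\alpha(.8)</math> (more type 2 errors), which is the tradeoff described above.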


<div id="fig 3.9" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-9.png | 600px | thumb | The power curve. ]]
</div>


<math display="block">
(a + b)^n = \sum_{j = 0}^n {n \choose j} a^j b^{n - j}\ .
</math>|To see that this expansion is correct, write


<math display="block">
(a + b)^n = (a + b)(a + b) \cdots (a + b)\ .
</math>
When we multiply this out, each term is obtained by choosing an <math>a</math> from <math>j</math> of the factors and a <math>b</math> from the remaining <math>n - j</math>; since the <math>j</math> factors can be chosen in <math>{n \choose j}</math> ways, the term <math>a^j b^{n - j}</math> occurs exactly <math>{n \choose j}</math> times.}}
{{proofcard|Corollary|cor_3.1|The sum of the elements in the <math>n</math>th row of
Pascal's triangle is <math>2^n</math>.  If the elements in the <math>n</math>th row of Pascal's triangle
are added with alternating signs, the sum is 0.|The first statement in the corollary follows from the fact that


<math display="block">
2^n = (1 + 1)^n = \sum_{j = 0}^n {n \choose j}\ ,
</math>
and the second statement from the fact that
<math display="block">
0 = (1 - 1)^n = \sum_{j = 0}^n (-1)^j {n \choose j}\ .
</math>}}
The first statement of the corollary tells us that the number of subsets of a set of
<math>n</math> elements is <math>2^n</math>.  We shall use the second statement in our next application of
the binomial theorem.
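Both statements in the corollary are easy to check numerically; for example (a quick sketch using a library binomial function):

```python
from math import comb

for n in range(1, 12):
    row = [comb(n, j) for j in range(n + 1)]  # nth row of Pascal's triangle
    assert sum(row) == 2**n                   # plain sum is 2^n
    assert sum((-1)**j * c for j, c in enumerate(row)) == 0  # alternating sum is 0
print("checked rows 1 through 11")
```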
We have seen that, when <math>A</math> and <math>B</math> are any two events (cf. [[guide:C9e774ade5|Section]]),


<math display="block">
P(A \cup B) = P(A) + P(B) - P(A \cap B)\ .
</math>
More generally, for <math>n</math> events <math>A_1, A_2, \dots, A_n</math>,
<math display="block">
\begin{equation}
P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i = 1}^n P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots\ .
\label{eq 3.5}
\end{equation}
</math>
That is, to find the probability that at least one of <math>n</math> events <math>A_i</math> occurs, first add the
probability of each event, then subtract the probabilities of all possible two-way
intersections, add the probability of all three-way intersections, and so forth.|If the outcome <math>\omega</math> occurs in at least one of the events <math>A_i</math>, its probability is added
exactly once by the left side of Equation \ref{eq 3.5}.  We must show that it is added
exactly once by the right side of Equation \ref{eq 3.5}.  Assume that <math>\omega</math> is in
exactly <math>k</math> of the sets.  Then its probability is added <math>k</math> times in the first term,
subtracted <math>k \choose 2</math> times in the second, added <math>k \choose 3</math> times in the third
<span id="exam 3.13"/>
'''Example'''
We return to the hat check problem discussed in [[guide:1cf65e65b3|Section]], that is, the problem of finding the probability that a random
permutation contains at least one fixed point.  Recall that a permutation is a
one-to-one map of a set
1)</math> objects.  Since there are <math>(n - 1)!</math> such permutations, <math>P(A_i) = (n - 1)!/n! =
1/n</math>.  Since there are <math>n</math> choices for
<math>a_i</math>, the first term of Equation \ref{eq 3.5} is 1.  In
the same way, to have a particular pair <math>(a_i,a_j)</math> fixed, we can choose any
permutation of the remaining <math>n - 2</math> elements; there are <math>(n - 2)!</math> such choices and
P(A_i \cap A_j) = \frac{(n - 2)!}{n!} = \frac 1{n(n - 1)}\ .
</math>
The number of terms of this form in the right side of Equation \ref{eq 3.5} is
<math display="block">
{n \choose 2} = \frac{n(n - 1)}{2!}\ .
</math>
Hence, the second term of Equation \ref{eq 3.5} is


<math display="block">
-{n \choose 2} \frac 1{n(n - 1)} = -\frac 1{2!}\ .
</math>
Similarly, for each particular triple we have <math>P(A_i \cap A_j \cap A_k) = (n - 3)!/n! = 1/(n(n - 1)(n - 2))</math>, and the number of such triples is
<math display="block">
{n \choose 3} = \frac{n(n - 1)(n - 2)}{3!}\ ,
</math>
making the third term of Equation \ref{eq 3.5} equal to
1/3!.  Continuing in this way, we obtain


our simulations reported in the previous section.
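Continuing the pattern gives the alternating series <math>1 - 1/2! + 1/3! - \cdots</math>, which converges rapidly to <math>1 - e^{-1} \approx .632</math>.  A quick numerical check (an illustrative sketch):

```python
from math import factorial, e

# P(at least one fixed point) for a random permutation of n objects:
# the alternating series 1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!.
def prob_fixed_point(n):
    return sum((-1)**(k - 1) / factorial(k) for k in range(1, n + 1))

for n in (3, 5, 10, 20):
    print(n, round(prob_fixed_point(n), 6))
# Already at n = 10 the value agrees with 1 - 1/e = 0.632121 to six places.
```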


===Choosing a Sample Space===
We now have some of the tools needed to accurately describe sample spaces and to assign
probability functions to those sample spaces.  Nevertheless, in some cases, the
types of elementary particles can be modelled using ''“Fermi-Dirac statistics.”''  Feller says:
<blockquote> We have here an instructive example of the impossibility of selecting or justifying probability models by ''a priori'' arguments.  In fact, no pure reasoning could tell that photons and protons would not obey the same probability laws.
</blockquote>
We now give some examples of this description and assignment process.
<span id="exam 3.14"/>
'''Example'''
<span id="exam 3.15"/>
'''Example'''
Suppose two pennies are flipped once each. There are several “reasonable” ways to describe the sample space.  One way is to count the
number of heads in the outcome; in this case, the sample space can be written <math>\{0, 1,
2\}</math>.  Another description of the sample space is the set of all ordered pairs of
this is the way reality works.


The reader is also referred to [[exercise:3603a3b6c0 |Exercise]] for another example of this process.


===Historical Remarks===
The binomial coefficients have a long and colorful history leading up to Pascal's ''Treatise on the Arithmetical Triangle,''<ref group="Notes" >B. Pascal, ''Traité du Triangle Arithmétique'' (Paris: Desprez, 1665).</ref> where Pascal developed many important properties of these numbers.  This history is set forth in the book ''Pascal's Arithmetical Triangle'' by A. W. F. Edwards.<ref group="Notes" >A. W. F. Edwards, ''Pascal's Arithmetical Triangle'' (London: Griffin, 1987).</ref>  Pascal wrote his triangle in the form shown in [[#table 3.8 |Table]].


<span id="table 3.8"/>
{| class="table table-borderless text-center"
|+ Pascal's triangle.
|-
|1 || 1 || 1 || 1 || 1 || 1 || 1 || 1 || 1 || 1
|-
| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9  
|-
| 1 || 3 || 6 || 10 || 15 || 21 || 28 || 36
|-
| 1 || 4 || 10 || 20 || 35 || 56 || 84  
|-
| 1 || 5 || 15 || 35 || 70 || 126
|-
| 1 || 6 || 21 || 56 || 126
|-
| 1 || 7 || 28 || 84
|-
| 1 || 8 || 36
|-
| 1 || 9
|-
| 1
|}


 
Edwards traces three different ways that the binomial coefficients arose.  He refers to these as the ''figurate numbers,'' the ''combinatorial numbers,'' and the ''binomial numbers.''  They are all names for the same thing (which we have called binomial coefficients) but that they are all the same was not appreciated until the sixteenth century. The ''figurate numbers'' date back to the Pythagorean interest in number
patterns around 540 BC.  The Pythagoreans considered, for example, triangular patterns shown in [[#fig 3.10|Figure]].  The sequence of numbers


<math display="block">
1, 3, 6, 10, \dots
</math>
obtained as the number of points in each triangle are called ''triangular numbers.''  From the triangles it is clear that the <math>n</math>th triangular number is simply the sum of the first <math>n</math> integers.  The ''tetrahedral numbers'' are the sums of the triangular numbers and were obtained by the Greek mathematicians Theon and Nicomachus at the beginning of the second century BC.  The tetrahedral number 10, for example, has the geometric representation shown in [[#fig 3.11|Figure]].  The first three types of figurate numbers can be represented in tabular form as shown in [[#table 3.9 |Table]].
<div id="fig 3.10" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-10.png | 600px | thumb | Pythagorean triangular patterns. ]]
</div>
<div id="fig 3.11" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-11.png | 600px | thumb | Geometric representation of the tetrahedral number 10. ]]
</div>


<span id="table 3.9"/>
{| class="table"
|+ Figurate numbers.
|-
| Natural numbers || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9
|-
| Triangular numbers || 1 || 3 || 6 || 10 || 15 || 21 || 28 || 36 || 45
|-
| Tetrahedral numbers || 1 || 4 || 10 || 20 || 35 || 56 || 84 || 120 || 165
|}
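Each row of the figurate table is the sequence of running sums of the row above it: the triangular numbers accumulate the natural numbers, and the tetrahedral numbers accumulate the triangular numbers. A minimal sketch (Python standard library only) that regenerates the table:

```python
from itertools import accumulate

# Each figurate row is the sequence of partial sums of the previous row:
# natural numbers -> triangular numbers -> tetrahedral numbers.
natural = list(range(1, 10))
triangular = list(accumulate(natural))      # n-th entry = 1 + 2 + ... + n
tetrahedral = list(accumulate(triangular))  # partial sums of the triangular numbers

print(natural)      # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(triangular)   # [1, 3, 6, 10, 15, 21, 28, 36, 45]
print(tetrahedral)  # [1, 4, 10, 20, 35, 56, 84, 120, 165]
```

These sequences are diagonals of Pascal's triangle: the <math>n</math>th triangular number is <math>{n+1 \choose 2}</math> and the <math>n</math>th tetrahedral number is <math>{n+2 \choose 3}</math>.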


<math display="block">
{n \choose r} = \frac{(n)_r}{r!}\ .
</math>
<div id="fig 3.12" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig3-33.png | 400px | thumb | Chu Shih-chieh's triangle. [From J. Needham, ''Science and Civilization in China,'' vol. 3 (New York: Cambridge University Press, 1959), p. 135. Reprinted with permission.] ]]
</div>


tabulation system for unlocking binomial coefficients.”  The triangle to provide the coefficients up to the eighth power is given by Chu Shih-chieh in a book written around 1303 (see [[#fig 3.12|Figure]]).<ref group="Notes" >J. Needham, ''Science and Civilization in China,'' vol. 3 (New York: Cambridge University Press, 1959), p. 135.</ref> The original manuscript of Chu's book has been lost, but copies have survived.
Misure'' (Vinegia, 1556).</ref>  For one die the answer is clearly 6.  For two dice the possibilities may be displayed as shown in [[#table 3.10 |Table]].
<span id="table 3.10"/>
{| class="table table-borderless text-center"
|+ Outcomes for the roll of two dice.
|-
| 11
|-
| 12 || 22
|-
| 13 || 23 || 33
|-
| 14 || 24 || 34 || 44
|-
| 15 || 25 || 35 || 45 || 55
|-
| 16 || 26 || 36 || 46 || 56 || 66
|}
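The count of 21 unordered outcomes, and the extension to <math>n</math> dice by way of the figurate table, can be checked directly; a sketch using Python's standard library:

```python
from itertools import combinations_with_replacement
from math import comb

# Unordered outcomes for two dice, as in the table: 11, 12, 22, 13, ...
two_dice = list(combinations_with_replacement(range(1, 7), 2))
print(len(two_dice))  # 21, the sixth triangular number

# For n dice the count continues the figurate table: it is the number of
# multisets of size n drawn from the 6 faces, i.e. comb(n + 5, 5).
for n in range(1, 5):
    assert len(list(combinations_with_replacement(range(1, 7), n))) == comb(n + 5, 5)
```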
Displaying them this way suggests the sixth triangular number <math>1 + 2 + 3 + 4 + 5 + 6 = 21</math> for the throw of 2 dice.  Tartaglia “on the first day of Lent, 1523, in Verona, having thought about the problem all night,”<ref group="Notes" >Quoted in Edwards, op. cit., p. 37.</ref> realized that the extension of the figurate table gave the answers for <math>n</math> dice.  The problem had suggested itself to Tartaglia from watching people casting their own horoscopes by means of a ''Book of
letters with Fermat concerning a problem known as the problem of points.  This problem, and the correspondence between Pascal and Fermat, were discussed in Chapter [[guide:4f3a4e96c3|Discrete Probability Distributions]].  The reader will recall that this problem can be described as follows: Two players A and B are playing a sequence of games and the first player to win <math>n</math> games wins the match.  It is desired to find the probability that A wins the match at a time when A has won <math>a</math> games and B has won <math>b</math> games.  (See [[exercise:92bbd1fbc0|Exercise]] and [[exercise:1427ecdd1c|Exercise]].)
 


Pascal solved the problem by backward induction, much the way we would do today in writing a
His proof of this seems to use Fermat's formula and the fact that the above ratio of products of odd to products of even numbers is equal to the probability of exactly <math>n</math> heads in <math>2n</math> tosses of a coin.  (See [[exercise:802078ad82 |Exercise]].)
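This identity can be verified numerically: the ratio of the product of the first <math>n</math> odd numbers to the product of the first <math>n</math> even numbers equals <math>{2n \choose n}/2^{2n}</math>. A sketch with exact rational arithmetic (the function name is ours):

```python
from fractions import Fraction
from math import comb

def odd_even_ratio(n):
    """Product of the first n odd numbers over the product of the first n even numbers."""
    r = Fraction(1)
    for k in range(1, n + 1):
        r *= Fraction(2 * k - 1, 2 * k)
    return r

# The ratio equals the probability of exactly n heads in 2n tosses of a fair coin.
for n in range(1, 10):
    assert odd_even_ratio(n) == Fraction(comb(2 * n, n), 4 ** n)
```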




Pascal presented Fermat with the table shown in [[#table 3.11 |Table]].
<span id="table 3.11"/>
{| class="table table-borderless"
|+ Pascal's solution for the problem of points.
|-
|
|colspan="6" class="text-center" |If each one staked 256 in
|-
|From my opponent's 256 positions I get, for the
|class="border-bottom" |6 games
|class="border-bottom" |5 games
|class="border-bottom" |4 games
|class="border-bottom" |3 games
|class="border-bottom" |2 games
|class="border-bottom" |1 game
|-
|1st game || 63 || 70 || 80 || 96 || 128 || 256
|-
|2nd game || 63 || 70 || 80 || 96 || 128
|-
|3rd game || 56 || 60 || 64 || 64
|-
|4th game || 42 || 40 || 32
|-
|5th game || 24 || 16
|-
|6th game || 8
|}
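Pascal's backward induction can be sketched in a few lines: writing <math>P(a,b)</math> for the probability that A wins when A still needs <math>a</math> games and B still needs <math>b</math>, the recursion <math>P(a,b) = \frac12 P(a-1,b) + \frac12 P(a,b-1)</math> reproduces the first row of the table. (The function name and the assumption of a fair 50-50 game are ours.)

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p_wins(a, b):
    """Probability that A wins the match when A still needs a games and B needs b,
    each game being a fair 50-50 trial (backward induction, as Pascal argued)."""
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    return Fraction(1, 2) * (p_wins(a - 1, b) + p_wins(a, b - 1))

# First row of Pascal's table: with each player staking 256 in a match to N games,
# A's gain after winning the 1st game is 512 * P(A wins | score 1-0) - 256.
row = [int(512 * p_wins(N - 1, N) - 256) for N in range(6, 0, -1)]
print(row)  # [63, 70, 80, 96, 128, 256]
```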
He states:
<blockquote>
You will see as always, that the value of the first game is equal to that of the second which is easily shown by combinations.  You will see, in the same way, that the numbers in the first line are always increasing; so also are those in the second; and those in the third.  But those in the fourth line are decreasing, and those in the fifth, etc.  This seems odd.<ref group="Notes" >F. N. David, op. cit., p. 235.</ref>
</blockquote>


brought together all the different aspects of the numbers in the Pascal triangle as known in 1654, and, as Edwards states, “That the Arithmetical Triangle should bear Pascal's name cannot be disputed.”<ref group="Notes" >A. W. F. Edwards, op. cit., p. ix.</ref>
 
 


The first serious study of the binomial distribution was undertaken by James Bernoulli in his ''Ars Conjectandi'' published in 1713.<ref group="Notes" >J. Bernoulli, ''Ars Conjectandi'' (Basil: Thurnisiorum, 1713).</ref>  We shall return to this work in the historical remarks in Chapter [[guide:1cf65e65b3|Law of Large Numbers]].


==General references==

Latest revision as of 02:42, 12 June 2024

[math] \newcommand{\NA}{{\rm NA}} \newcommand{\mat}[1]{{\bf#1}} \newcommand{\exref}[1]{\ref{##1}} \newcommand{\secstoprocess}{\all} \newcommand{\NA}{{\rm NA}} \newcommand{\mathds}{\mathbb}[/math]

Having mastered permutations, we now consider combinations. Let [math]U[/math] be a set with [math]n[/math] elements; we want to count the number of distinct subsets of the set [math]U[/math] that have exactly [math]j[/math] elements. The empty set and the set [math]U[/math] are considered to be subsets of [math]U[/math]. The empty set is usually denoted by [math]\phi[/math].

Example Let [math]U = \{a,b,c\}[/math]. The subsets of [math]U[/math] are

[[math]] \phi,\ \{a\},\ \{b\},\ \{c\},\ \{a,b\},\ \{a,c\},\ \{b,c\},\ \{a,b,c\}\ . [[/math]]

===Binomial Coefficients===

The number of distinct subsets with [math]j[/math] elements that can be chosen from a set with [math]n[/math] elements is denoted by [math]{n \choose j}[/math], and is pronounced “[math]n[/math] choose [math]j[/math].” The number [math]n \choose j[/math] is called a binomial coefficient. This terminology comes from an application to algebra which will be discussed later in this section. In the above example, there is one subset with no elements, three subsets with exactly 1 element, three subsets with exactly 2 elements, and one subset with exactly 3 elements. Thus, [math]{3 \choose 0} = 1[/math], [math]{3 \choose 1} = 3[/math], [math]{3 \choose 2} = 3[/math], and [math]{3 \choose 3} = 1[/math]. Note that there are [math]2^3 = 8[/math] subsets in all. (We have already seen that a set with [math]n[/math] elements has [math]2^n[/math] subsets; see Exercise.) It follows that

[[math]] {3 \choose 0} + {3 \choose 1} + {3 \choose 2} + {3 \choose 3} = 2^3 = 8\ , [[/math]]

[[math]] {n \choose 0} = {n \choose n} = 1\ . [[/math]]

Assume that [math]n \gt 0[/math]. Then, since there is only one way to choose a set with no elements and only one way to choose a set with [math]n[/math] elements, the remaining values of [math]n \choose j[/math] are determined by the following recurrence relation:

Theorem

For integers [math]n[/math] and [math]j[/math], with [math]0 \lt j \lt n[/math], the binomial coefficients satisfy:

[[math]] \begin{equation} {n \choose j} = {{n-1} \choose j} + {{n-1} \choose {j - 1}}\ . \label{eq 3.3} \end{equation} [[/math]]

Proof

We wish to choose a subset of [math]j[/math] elements. Choose an element [math]u[/math] of [math]U[/math]. Assume first that we do not want [math]u[/math] in the subset. Then we must choose the [math]j[/math] elements from a set of [math]n - 1[/math] elements; this can be done in [math]{{n-1} \choose j}[/math] ways. On the other hand, assume that we do want [math]u[/math] in the subset. Then we must choose the other [math]j - 1[/math] elements from the remaining [math]n - 1[/math] elements of [math]U[/math]; this can be done in [math]{{n-1} \choose {j - 1}}[/math] ways. Since [math]u[/math] is either in our subset or not, the number of ways that we can choose a subset of [math]j[/math] elements is the sum of the number of subsets of [math]j[/math] elements which have [math]u[/math] as a member and the number which do not---this is what Equation \ref{eq 3.3} states.

The binomial coefficient [math]n \choose j[/math] is defined to be 0, if [math]j \lt 0[/math] or if [math]j \gt n[/math]. With this definition, the restrictions on [math]j[/math] in Theorem are unnecessary.

===Pascal's Triangle===

This relation, together with the knowledge that

[[math]] {n \choose 0} = {n \choose n }= 1\ , [[/math]]

determines completely the numbers [math]n \choose j[/math]. We can use these relations to determine the famous triangle of Pascal, which exhibits all these numbers in matrix form (see Figure).

Pascal's triangle.

The [math]n[/math]th row of this triangle has the entries [math]n \choose 0[/math], [math]n \choose 1[/math],..., [math]n \choose n[/math]. We know that the first and last of these numbers are 1. The remaining numbers are determined by the recurrence relation Equation \ref{eq 3.3}; that is, the entry [math]{n \choose j}[/math] for [math]0 \lt j \lt n[/math] in the [math]n[/math]th row of Pascal's triangle is the sum of the entry immediately above and the one immediately to its left in the [math](n - 1)[/math]st row. For example, [math]{5 \choose 2} = 6 + 4 = 10[/math]. This algorithm for constructing Pascal's triangle can be used to write a computer program to compute the binomial coefficients. You are asked to do this in Exercise.
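The construction just described can be sketched directly (the exercise asks for a full program; this is one minimal version, building each row from the boundary 1's and the recurrence):

```python
def pascal_rows(n_max):
    """Build rows 0..n_max of Pascal's triangle from the recurrence
    C(n, j) = C(n-1, j) + C(n-1, j-1), with C(n, 0) = C(n, n) = 1."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = [1]                              # C(n, 0) = 1
        for j in range(1, n):
            row.append(prev[j] + prev[j - 1])  # sum of the two entries above
        row.append(1)                          # C(n, n) = 1
        rows.append(row)
    return rows

rows = pascal_rows(5)
print(rows[5])     # [1, 5, 10, 10, 5, 1]
print(rows[5][2])  # 10, i.e. "5 choose 2" = 6 + 4
```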

While Pascal's triangle provides a way to construct recursively the binomial coefficients, it is also possible to give a formula for [math]n \choose j[/math].

Theorem

The binomial coefficients are given by the formula

[[math]] \begin{equation} {n \choose j }= \frac{(n)_j}{j!}\ . \label{eq 3.4} \end{equation} [[/math]]

Proof

Each subset of size [math]j[/math] of a set of size [math]n[/math] can be ordered in [math]j![/math] ways. Each of these orderings is a [math]j[/math]-permutation of the set of size [math]n[/math]. The number of [math]j[/math]-permutations is [math](n)_j[/math], so the number of subsets of size [math]j[/math] is

[[math]] \frac{(n)_j}{j!}\ . [[/math]]
This completes the proof.

The above formula can be rewritten in the form

[[math]] {n \choose j} = \frac{n!}{j!(n-j)!}\ . [[/math]]

This immediately shows that

[[math]] {n \choose j} = {n \choose {n-j}}\ . [[/math]]

When using Equation \ref{eq 3.4} in the calculation of [math]{n \choose j}[/math], if one alternates the multiplications and divisions, then all of the intermediate values in the calculation are integers. Furthermore, none of these intermediate values exceed the final value. (See Exercise.)
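This remark can be turned into code: if the multiplications and divisions alternate, then after the [math]i[/math]th step the running value equals [math]{n-j+i \choose i}[/math], an integer no larger than the final answer. A sketch (the function name is ours):

```python
from math import comb

def binomial(n, j):
    """Compute C(n, j) by alternating multiply and divide; every intermediate
    value is the integer C(n - j + i, i), so none exceeds the final result."""
    if j < 0 or j > n:
        return 0
    c = 1
    for i in range(1, j + 1):
        c = c * (n - j + i) // i          # exact: i divides c * (n - j + i)
        assert c == comb(n - j + i, i)    # intermediate values are integers
    return c

print(binomial(5, 2))   # 10
print(binomial(52, 5))  # 2598960
```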


Another point that should be made concerning Equation \ref{eq 3.4} is that if it is used to define the binomial coefficients, then it is no longer necessary to require [math]n[/math] to be a positive integer. The variable [math]j[/math] must still be a non-negative integer under this definition. This idea is useful when extending the Binomial Theorem to general exponents. (The Binomial Theorem for non-negative integer exponents is given below as Theorem.)

===Poker Hands===

Example In the game of poker, a four of a kind beats a full house. A poker hand is a random subset of 5 elements from a deck of 52 cards. A hand has four of a kind if it has four cards with the same value---for example, four sixes or four kings. It is a full house if it has three of one value and two of a second---for example, three twos and two queens. Let us see which hand is more likely. How many hands have four of a kind? There are 13 ways that we can specify the value for the four cards. For each of these, there are 48 possibilities for the fifth card. Thus, the number of four-of-a-kind hands is [math]13 \cdot 48 = 624[/math]. Since the total number of possible hands is [math]{52 \choose 5} = 2598960[/math], the probability of a hand with four of a kind is [math]624/2598960 = .00024[/math].


Now consider the case of a full house; how many such hands are there? There are 13 choices for the value which occurs three times; for each of these there are [math]{4 \choose 3} = 4[/math] choices for the particular three cards of this value that are in the hand. Having picked these three cards, there are 12 possibilities for the value which occurs twice; for each of these there are [math]{4 \choose 2} = 6[/math] possibilities for the particular pair of this value. Thus, the number of full houses is [math]13 \cdot 4 \cdot 12 \cdot 6 = 3744[/math], and the probability of obtaining a hand with a full house is [math]3744/2598960 = .0014[/math]. Thus, while both types of hands are unlikely, you are six times more likely to obtain a full house than four of a kind.
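The arithmetic in this comparison is easy to check; a sketch:

```python
from math import comb

hands = comb(52, 5)                        # 2598960 possible poker hands
four_kind = 13 * 48                        # pick the quad's value, then any 5th card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)

print(four_kind, full_house)               # 624 3744
print(round(four_kind / hands, 5))         # 0.00024
print(round(full_house / hands, 4))        # 0.0014
print(full_house / four_kind)              # 6.0: a full house is six times as likely
```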

===Bernoulli Trials===

Our principal use of the binomial coefficients will occur in the study of one of the important chance processes called Bernoulli trials.

Definition

A Bernoulli trials process is a sequence of [math]n[/math] chance experiments such that

  • Each experiment has two possible outcomes, which we may call success and failure.
  • The probability [math]p[/math] of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability [math]q[/math] of failure is given by [math]q = 1 - p[/math].

Tree diagram of three Bernoulli trials.

Example

  • A coin is tossed ten times. The two possible outcomes are heads and tails. The probability of heads on any one toss is 1/2.
  • An opinion poll is carried out by asking 1000 people, randomly chosen from the population, if they favor the Equal Rights Amendment---the two outcomes being yes and no. The probability [math]p[/math] of a yes answer (i.e., a success) indicates the proportion of people in the entire population that favor this amendment.
  • A gambler makes a sequence of 1-dollar bets, betting each time on black at roulette at Las Vegas. Here a success is winning 1 dollar and a failure is losing 1 dollar. Since in American roulette the gambler wins if the ball stops on one of 18 out of 38 positions and loses otherwise, the probability of winning is [math]p = 18/38 = .474[/math].

To analyze a Bernoulli trials process, we choose as our sample space a binary tree and assign a probability distribution to the paths in this tree. Suppose, for example, that we have three Bernoulli trials. The possible outcomes are indicated in the tree diagram shown in Figure. We define [math]X[/math] to be the random variable which represents the outcome of the process, i.e., an ordered triple of S's and F's. The probabilities assigned to the branches of the tree represent the probability for each individual trial. Let the outcome of the [math]i[/math]th trial be denoted by the random variable [math]X_i[/math], with distribution function [math]m_i[/math]. Since we have assumed that outcomes on any one trial do not affect those on another, we assign the same probabilities at each level of the tree. An outcome [math]\omega[/math] for the entire experiment will be a path through the tree. For example, [math]\omega_3[/math] represents the outcomes SFS. Our frequency interpretation of probability would lead us to expect a fraction [math]p[/math] of successes on the first experiment; of these, a fraction [math]q[/math] of failures on the second; and, of these, a fraction [math]p[/math] of successes on the third experiment. This suggests assigning probability [math]pqp[/math] to the outcome [math]\omega_3[/math]. More generally, we assign a distribution function [math]m(\omega)[/math] for paths [math]\omega[/math] by defining [math]m(\omega)[/math] to be the product of the branch probabilities along the path [math]\omega[/math]. Thus, the probability that the three events S on the first trial, F on the second trial, and S on the third trial occur is the product of the probabilities for the individual events. We shall see in the next chapter that this means that the events involved are independent in the sense that the knowledge of one event does not affect our prediction for the occurrences of the other events.

===Binomial Probabilities===

We shall be particularly interested in the probability that in [math]n[/math] Bernoulli trials there are exactly [math]j[/math] successes. We denote this probability by [math]b(n,p,j)[/math]. Let us calculate the particular value [math]b(3,p,2)[/math] from our tree measure. We see that there are three paths which have exactly two successes and one failure, namely [math]\omega_2[/math], [math]\omega_3[/math], and [math]\omega_5[/math]. Each of these paths has the same probability [math]p^2q[/math]. Thus [math]b(3,p,2) = 3p^2q[/math]. Considering all possible numbers of successes we have

[[math]] \begin{eqnarray*} b(3,p,0) &=& q^3\ ,\\ b(3,p,1) &=& 3pq^2\ ,\\ b(3,p,2) &=& 3p^2q\ ,\\ b(3,p,3) &=& p^3\ . \end{eqnarray*} [[/math]]
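These four values can be recovered by enumerating all [math]2^3[/math] paths of the tree and multiplying the branch probabilities along each path, which is exactly the tree measure described above. A sketch for a sample value of [math]p[/math] (the function name is ours):

```python
from itertools import product

def path_probabilities(n, p):
    """Assign each outcome sequence the product of its branch probabilities
    and accumulate them by number of successes (the tree measure)."""
    q = 1 - p
    by_successes = [0.0] * (n + 1)
    for path in product("SF", repeat=n):       # all 2^n paths, e.g. ('S', 'F', 'S')
        prob = 1.0
        for step in path:
            prob *= p if step == "S" else q
        by_successes[path.count("S")] += prob
    return by_successes

p, q = 0.3, 0.7
b3 = path_probabilities(3, p)
# Matches q^3, 3pq^2, 3p^2q, p^3 from the display above.
assert all(abs(x - y) < 1e-12 for x, y in zip(b3, [q**3, 3*p*q**2, 3*p**2*q, p**3]))
```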

We can, in the same manner, carry out a tree measure for [math]n[/math] experiments and determine [math]b(n,p,j)[/math] for the general case of [math]n[/math] Bernoulli trials.

Theorem

Given [math]n[/math] Bernoulli trials with probability [math]p[/math] of success on each experiment, the probability of exactly [math]j[/math] successes is

[[math]] b(n,p,j) = {n \choose j} p^j q^{n - j} [[/math]]
where [math]q = 1 - p[/math].

Proof

We construct a tree measure as described above. We want to find the sum of the probabilities for all paths which have exactly [math]j[/math] successes and [math]n - j[/math] failures. Each such path is assigned a probability [math]p^j q^{n - j}[/math]. How many such paths are there? To specify a path, we have to pick, from the [math]n[/math] possible trials, a subset of [math]j[/math] to be successes, with the remaining [math]n-j[/math] outcomes being failures. We can do this in [math]n \choose j[/math] ways. Thus the sum of the probabilities is

[[math]] b(n,p,j) = {n \choose j} p^j q^{n - j}\ . [[/math]]

Example A fair coin is tossed six times. What is the probability that exactly three heads turn up? The answer is

[[math]] b(6,.5,3) = {6 \choose 3} \left(\frac12\right)^3 \left(\frac12\right)^3 = 20 \cdot \frac1{64} = .3125\ . [[/math]]

Example A die is rolled four times. What is the probability that we obtain exactly one 6? We treat this as Bernoulli trials with success = “rolling a 6” and failure = “rolling some number other than a 6.” Then [math]p = 1/6[/math], and the probability of exactly one success in four trials is

[[math]] b(4,1/6,1) = {4 \choose 1 }\left(\frac16\right)^1 \left(\frac56\right)^3 = .386\ . [[/math]]
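Both examples follow from the formula [math]b(n,p,j) = {n \choose j} p^j q^{n-j}[/math]; a sketch (the function name is ours):

```python
from math import comb

def b(n, p, j):
    """Binomial probability of exactly j successes in n Bernoulli trials."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

print(b(6, 0.5, 3))            # 0.3125: exactly three heads in six tosses
print(round(b(4, 1/6, 1), 3))  # 0.386: exactly one 6 in four rolls
```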

To compute binomial probabilities using the computer, multiply the function choose[math](n,k)[/math] by [math]p^kq^{n - k}[/math]. The program BinomialProbabilities prints out the binomial probabilities [math]b(n, p, k)[/math] for [math]k[/math] between [math]kmin[/math] and [math]kmax[/math], and the sum of these probabilities. We have run this program for [math]n = 100[/math], [math]p = 1/2[/math], [math]kmin = 45[/math], and [math]kmax = 55[/math]; the output is shown in Table. Note that the individual probabilities are quite small. The probability of exactly 50 heads in 100 tosses of a coin is about .08. Our intuition tells us that this is the most likely outcome, which is correct; but, all the same, it is not a very likely outcome.

{| class="table"
|+ Binomial probabilities for [math]n = 100,\ p = 1/2[/math].
|-
! [math]k[/math] !! [math]b(n,p,k)[/math]
|-
| 45 || .0485
|-
| 46 || .0580
|-
| 47 || .0666
|-
| 48 || .0735
|-
| 49 || .0780
|-
| 50 || .0796
|-
| 51 || .0780
|-
| 52 || .0735
|-
| 53 || .0666
|-
| 54 || .0580
|-
| 55 || .0485
|}
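The table can be reproduced in a few lines (this is not the book's BinomialProbabilities program, just a sketch of the same computation):

```python
from math import comb

def b(n, p, k):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduce the table: b(100, 1/2, k) for k = 45, ..., 55, and their sum.
probs = {k: b(100, 0.5, k) for k in range(45, 56)}
for k, v in probs.items():
    print(k, round(v, 4))                 # e.g. 50 -> 0.0796
print(round(sum(probs.values()), 3))      # about 0.729: the mass near the center
```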


===Binomial Distributions===

Definition

Let [math]n[/math] be a positive integer, and let [math]p[/math] be a real number between 0 and 1. Let [math]B[/math] be the random variable which counts the number of successes in a Bernoulli trials process with parameters [math]n[/math] and [math]p[/math]. Then the distribution [math]b(n, p, k)[/math] of [math]B[/math] is called the binomial distribution.

We can get a better idea about the binomial distribution by graphing this distribution for different values of [math]n[/math] and [math]p[/math] (see Figure). The plots in this figure were generated using the program BinomialPlot.

Binomial distributions.


We have run this program for [math]p = .5[/math] and [math]p = .3[/math]. Note that even for [math]p = .3[/math] the graphs are quite symmetric. We shall have an explanation for this in Chapter. We also note that the highest probability occurs around the value [math]np[/math], but that these highest probabilities get smaller as [math]n[/math] increases. We shall see in Chapter Expected Value and Variance that [math]np[/math] is the mean or expected value of the binomial distribution [math]b(n,p,k)[/math].


The following example gives a nice way to see the binomial distribution, when [math]p = 1/2[/math].

Example A Galton board is a board in which a large number of BB-shots are dropped from a chute at the top of the board and deflected off a number of pins on their way down to the bottom of the board. The final position of each shot is the result of a number of random deflections either to the left or the right. We have written a program GaltonBoard to simulate this experiment.


We have run the program for the case of 20 rows of pins and 10,000 shots being dropped. We show the result of this simulation in Figure.


Simulation of the Galton board.

Note that if we write 0 every time the shot is deflected to the left, and 1 every time it is deflected to the right, then the path of the shot can be described by a sequence of 0's and 1's of length [math]n[/math], just as for the [math]n[/math]-fold coin toss.


The distribution shown in Figure is an example of an empirical distribution, in the sense that it comes about by means of a sequence of experiments. As expected, this empirical distribution resembles the corresponding binomial distribution with parameters [math]n = 20[/math] and [math]p = 1/2[/math].
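A simulation of this kind can be sketched as follows (this is not the book's GaltonBoard program; the 20 rows and 10,000 shots follow the run described above, and the seed is ours):

```python
import random
from math import comb

def galton(rows=20, shots=10000, seed=1):
    """Simulate a Galton board: each shot is deflected left (0) or right (1)
    at each of `rows` pins; its final slot is the number of right deflections."""
    rng = random.Random(seed)
    slots = [0] * (rows + 1)
    for _ in range(shots):
        slots[sum(rng.randint(0, 1) for _ in range(rows))] += 1
    return slots

slots = galton()
empirical = [c / 10000 for c in slots]
theoretical = [comb(20, k) / 2**20 for k in range(21)]
# The empirical distribution should resemble b(20, 1/2, k); near k = 10
# the theoretical value is about 0.176.
print(round(empirical[10], 3), round(theoretical[10], 3))
```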


===Hypothesis Testing===

Example Suppose that ordinary aspirin has been found effective against headaches 60 percent of the time, and that a drug company claims that its new aspirin with a special headache additive is more effective. We can test this claim as follows: we call their claim the alternate hypothesis, and its negation, that the additive has no appreciable effect, the null hypothesis. Thus the null hypothesis is that [math]p = .6[/math], and the alternate hypothesis is that [math]p \gt .6[/math], where [math]p[/math] is the probability that the new aspirin is effective.

We give the aspirin to [math]n[/math] people to take when they have a headache. We want to find a number [math]m[/math], called the critical value for our experiment, such that we reject the null hypothesis if at least [math]m[/math] people are cured, and otherwise we accept it. How should we determine this critical value?

First note that we can make two kinds of errors. The first, often called a type 1 error in statistics, is to reject the null hypothesis when in fact it is true. The second, called a type 2 error, is to accept the null hypothesis when it is false. To determine the probability of both these types of errors we introduce a function [math]\alpha(p)[/math], defined to be the probability that we reject the null hypothesis, where this probability is calculated under the assumption that the null hypothesis is true. In the present case, we have

[[math]] \alpha(p) = \sum_{m \leq k \leq n} b(n,p,k)\ . [[/math]]


Note that [math]\alpha(.6)[/math] is the probability of a type 1 error, since this is the probability of a high number of successes for an ineffective additive. So for a given [math]n[/math] we want to choose [math]m[/math] so as to make [math]\alpha(.6)[/math] quite small, to reduce the likelihood of a type 1 error. But as [math]m[/math] increases above the most probable value [math]np = .6n[/math], [math]\alpha(.6)[/math], being the upper tail of a binomial distribution, approaches 0. Thus increasing [math]m[/math] makes a type 1 error less likely.


Now suppose that the additive really is effective, so that [math]p[/math] is appreciably greater than .6; say [math]p = .8[/math]. (This alternative value of [math]p[/math] is chosen arbitrarily; the following calculations depend on this choice.) Then choosing [math]m[/math] well below [math]np = .8n[/math] will increase [math]\alpha(.8)[/math], since now [math]\alpha(.8)[/math] is all but the lower tail of a binomial distribution. Indeed, if we put [math]\beta(.8) = 1 - \alpha(.8)[/math], then [math]\beta(.8)[/math] gives us the probability of a type 2 error, and so decreasing [math]m[/math] makes a type 2 error less likely.


The manufacturer would like to guard against a type 2 error, since if such an error is made, then the test does not show that the new drug is better, when in fact it is. If the alternative value of [math]p[/math] is chosen closer to the value of [math]p[/math] given in the null hypothesis (in this case [math]p =.6[/math]), then for a given test population, the value of [math]\beta[/math] will increase. So, if the manufacturer's statistician chooses an alternative value for [math]p[/math] which is close to the value in the null hypothesis, then it will be an expensive proposition (i.e., the test population will have to be large) to reject the null hypothesis with a small value of [math]\beta[/math].


What we hope to do then, for a given test population [math]n[/math], is to choose a value of [math]m[/math], if possible, which makes both these probabilities small. If we make a type 1 error we end up buying a lot of essentially ordinary aspirin at an inflated price; a type 2 error means we miss a bargain on a superior medication. Let us say that we want our critical number [math]m[/math] to make each of these undesirable cases less than 5 percent probable.


We write a program PowerCurve to plot, for [math]n = 100[/math] and selected values of [math]m[/math], the function [math]\alpha(p)[/math], for [math]p[/math] ranging from .4 to 1. The result is shown in Figure. We include in our graph a box (in dotted lines) from .6 to .8, with bottom and top at heights .05 and .95. Then a value for [math]m[/math] satisfies our requirements if and only if the graph of [math]\alpha[/math] enters the box from the bottom, and leaves from the top (why?---which is the type 1 and which is the type 2 criterion?). As [math]m[/math] increases, the graph of [math]\alpha[/math] moves to the right. A few experiments have shown us that [math]m = 69[/math] is the smallest value for [math]m[/math] that thwarts a type 1 error, while [math]m = 73[/math] is the largest that thwarts a type 2 error. So we may choose our critical value between 69 and 73. If we're more intent on avoiding a type 1 error we favor 73, and similarly we favor 69 if we regard a type 2 error as worse. Of course, the drug company may not be happy with having as much as a 5 percent chance of an error. They might insist on having a 1 percent chance of an error. For this we would have to increase the number [math]n[/math] of trials (see Exercise).

The power curve.
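The search that PowerCurve supports can also be done directly. The following is a minimal Python sketch (the function name <code>alpha</code> and the brute-force scan over [math]m[/math] are ours, not the book's program); it recovers the critical values 69 and 73 quoted above.

```python
from math import comb

def alpha(n, m, p):
    """P(reject the null) = P(at least m successes in n Bernoulli(p) trials)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

n = 100
# smallest m keeping the type 1 error alpha(.6) at most 5 percent
m_low = min(m for m in range(n + 1) if alpha(n, m, 0.6) <= 0.05)
# largest m keeping the type 2 error beta(.8) = 1 - alpha(.8) at most 5 percent
m_high = max(m for m in range(n + 1) if 1 - alpha(n, m, 0.8) <= 0.05)
print(m_low, m_high)  # 69 73
```

Any integer in the interval [math][69, 73][/math] therefore meets both 5 percent requirements for [math]n = 100[/math].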

===Binomial Expansion===

We next remind the reader of an application of the binomial coefficients to algebra. This is the binomial expansion, from which we get the term binomial coefficient.

'''Theorem'''

(Binomial Theorem) The quantity [math](a + b)^n[/math] can be expressed in the form

[[math]] (a + b)^n = \sum_{j = 0}^n {n \choose j} a^j b^{n - j}\ . [[/math]]

Show Proof

To see that this expansion is correct, write

[[math]] (a + b)^n = (a + b)(a + b) \cdots (a + b)\ . [[/math]]
When we multiply this out we will have a sum of terms each of which results from a choice of an [math]a[/math] or [math]b[/math] for each of [math]n[/math] factors. When we choose [math]j[/math] [math]a[/math]'s and [math](n - j)[/math] [math]b[/math]'s, we obtain a term of the form [math]a^j b^{n - j}[/math]. To determine such a term, we have to specify [math]j[/math] of the [math]n[/math] terms in the product from which we choose the [math]a[/math]. This can be done in [math]{n \choose j}[/math] ways. Thus, collecting these terms in the sum contributes a term [math]{n \choose j} a^j b^{n - j}[/math].
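The counting argument in this proof is easy to check by brute force. A small Python sketch (our own illustration, not part of the text): enumerate every choice of [math]a[/math] or [math]b[/math] from each factor and tally how many products equal [math]a^j b^{n-j}[/math].

```python
from itertools import product
from math import comb

n = 4
counts = [0] * (n + 1)                  # counts[j] = number of terms a^j b^(n-j)
for choice in product("ab", repeat=n):  # pick a or b from each of the n factors
    counts[choice.count("a")] += 1
assert counts == [comb(n, j) for j in range(n + 1)]
print(counts)  # [1, 4, 6, 4, 1]
```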


For example, we have

[[math]] \begin{eqnarray*} (a + b)^0 & = & 1 \\ (a + b)^1 & = & a + b \\ (a + b)^2 & = & a^2 + 2ab + b^2 \\ (a + b)^3 & = & a^3 + 3a^2b + 3ab^2 + b^3\ . \end{eqnarray*} [[/math]]

We see here that the coefficients of successive powers do indeed yield Pascal's triangle.

'''Corollary'''

The sum of the elements in the [math]n[/math]th row of Pascal's triangle is [math]2^n[/math]. If the elements in the [math]n[/math]th row of Pascal's triangle are added with alternating signs, the sum is 0.

Show Proof

The first statement in the corollary follows from the fact that

[[math]] 2^n = (1 + 1)^n = {n \choose 0} + {n \choose 1} + {n \choose 2} + \cdots + {n \choose n}\ , [[/math]]
and the second from the fact that

[[math]] 0 = (1 - 1)^n = {n \choose 0} - {n \choose 1} + {n \choose 2}- \cdots + {(-1)^n}{n \choose n}\ . [[/math]]
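Both identities are easy to verify numerically; here is a minimal Python check (our sketch, assuming nothing beyond the standard library):

```python
from math import comb

for n in range(1, 10):
    row = [comb(n, j) for j in range(n + 1)]            # nth row of Pascal's triangle
    assert sum(row) == 2**n                             # plain sum
    assert sum((-1)**j * c for j, c in enumerate(row)) == 0   # alternating sum
```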

The first statement of the corollary tells us that the number of subsets of a set of [math]n[/math] elements is [math]2^n[/math]. We shall use the second statement in our next application of the binomial theorem. We have seen that, when [math]A[/math] and [math]B[/math] are any two events (cf. Section ),

[[math]] P(A \cup B) = P(A) + P(B) - P(A \cap B). [[/math]]

We now extend this theorem to a more general version, which will enable us to find the probability that at least one of a number of events occurs.

===Inclusion-Exclusion Principle===

'''Theorem'''

Let [math]P[/math] be a probability distribution on a sample space [math]\Omega[/math], and let [math]\{A_1,\ A_2,\ \dots,\ A_n\}[/math] be a finite set of events. Then

[[math]] \begin{equation} P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i = 1}^n P(A_i) - \sum_{1 \leq i \lt j \leq n} P(A_i \cap A_j) + \sum_{1 \leq i \lt j \lt k \leq n} P(A_i \cap A_j \cap A_k) - \cdots\ .\label{eq 3.5} \end{equation} [[/math]]
That is, to find the probability that at least one of [math]n[/math] events [math]A_i[/math] occurs, first add the probability of each event, then subtract the probabilities of all possible two-way intersections, add the probability of all three-way intersections, and so forth.

Show Proof

If the outcome [math]\omega[/math] occurs in at least one of the events [math]A_i[/math], its probability is added exactly once by the left side of Equation \ref{eq 3.5}. We must show that it is added exactly once by the right side of Equation \ref{eq 3.5}. Assume that [math]\omega[/math] is in exactly [math]k[/math] of the sets. Then its probability is added [math]k[/math] times in the first term, subtracted [math]{k \choose 2}[/math] times in the second, added [math]{k \choose 3}[/math] times in the third term, and so forth. Thus, the total number of times that it is added is

[[math]] {k \choose 1} - {k \choose 2} + {k \choose 3} - \cdots + {(-1)^{k-1}} {k \choose k}\ . [[/math]]
But

[[math]] 0 = (1 - 1)^k = \sum_{j = 0}^k {k \choose j} (-1)^j = {k \choose 0} - \sum_{j = 1}^k {k \choose j} {(-1)^{j - 1}}\ . [[/math]]
Hence,

[[math]] 1 = {k \choose 0} = \sum_{j = 1}^k {k \choose j} {(-1)^{j - 1}}\ . [[/math]]
If the outcome [math]\omega[/math] is not in any of the events [math]A_i[/math], then it is not counted on either side of the equation.
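The theorem can also be confirmed by direct computation on a small example. The following Python sketch (the sample space and events are our own arbitrary choices) compares the directly computed [math]P(A_1 \cup \cdots \cup A_n)[/math] with the alternating sum over all intersections:

```python
from itertools import combinations
from fractions import Fraction

p = Fraction(1, 10)    # uniform distribution on the 10 outcomes 0, 1, ..., 9
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {0, 7, 8}]

lhs = len(set().union(*events)) * p           # P(A1 ∪ ... ∪ An) computed directly

rhs = Fraction(0)
for r in range(1, len(events) + 1):           # alternating sum over r-way intersections
    for combo in combinations(events, r):
        rhs += (-1) ** (r - 1) * len(set.intersection(*combo)) * p
assert lhs == rhs
```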

===Hat Check Problem===

'''Example''' We return to the hat check problem discussed in Section, that is, the problem of finding the probability that a random permutation contains at least one fixed point. Recall that a permutation is a one-to-one map of a set [math]A = \{a_1,a_2,\dots,a_n\}[/math] onto itself. Let [math]A_i[/math] be the event that the [math]i[/math]th element [math]a_i[/math] remains fixed under this map. If we require that [math]a_i[/math] is fixed, then the map of the remaining [math]n - 1[/math] elements provides an arbitrary permutation of [math](n - 1)[/math] objects. Since there are [math](n - 1)![/math] such permutations, [math]P(A_i) = (n - 1)!/n! = 1/n[/math]. Since there are [math]n[/math] choices for [math]a_i[/math], the first term of Equation \ref{eq 3.5} is 1. In the same way, to have a particular pair [math](a_i,a_j)[/math] fixed, we can choose any permutation of the remaining [math]n - 2[/math] elements; there are [math](n - 2)![/math] such choices and thus

[[math]] P(A_i \cap A_j) = \frac{(n - 2)!}{n!} = \frac 1{n(n - 1)}\ . [[/math]]

The number of terms of this form in the right side of Equation \ref{eq 3.5} is

[[math]] {n \choose 2} = \frac{n(n - 1)}{2!}\ . [[/math]]

Hence, the second term of Equation \ref{eq 3.5} is

[[math]] -\frac{n(n - 1)}{2!} \cdot \frac 1{n(n - 1)} = -\frac 1{2!}\ . [[/math]]

Similarly, for any specific three events [math]A_i[/math], [math]A_j[/math], [math]A_k[/math],

[[math]] P(A_i \cap A_j \cap A_k) = \frac{(n - 3)!}{n!} = \frac 1{n(n - 1)(n - 2)}\ , [[/math]]

and the number of such terms is

[[math]] {n \choose 3} = \frac{n(n - 1)(n - 2)}{3!}\ , [[/math]]

making the third term of Equation \ref{eq 3.5} equal to 1/3!. Continuing in this way, we obtain

[[math]] P(\mbox {at\ least\ one\ fixed\ point}) = 1 - \frac 1{2!} + \frac 1{3!} - \cdots + (-1)^{n-1} \frac 1{n!} [[/math]]

and

[[math]] P(\mbox {no\ fixed\ point}) = \frac 1{2!} - \frac 1{3!} + \cdots + (-1)^n \frac 1{n!}\ . [[/math]]

From calculus we learn that

[[math]] e^x = 1 + x + \frac 1{2!}x^2 + \frac 1{3!}x^3 + \cdots + \frac 1{n!}x^n + \cdots\ . [[/math]]

Thus, if [math]x = -1[/math], we have

[[math]] \begin{eqnarray*} e^{-1} & = &\frac 1{2!} - \frac 1{3!} + \cdots + \frac{(-1)^n}{n!} + \cdots \\ & = & .3678794\ . \end{eqnarray*} [[/math]]

Therefore, the probability that there is no fixed point, i.e., that none of the [math]n[/math] people gets his own hat back, is equal to the partial sum of the series for [math]e^{-1}[/math] through the [math](-1)^n/n![/math] term. This series converges very fast. Calculating the partial sums for [math]n = 3[/math] to 10 gives the data in Table.

Hat check problem.

  n     Probability that no one gets his own hat back
  3     .333333
  4     .375
  5     .366667
  6     .368056
  7     .367857
  8     .367882
  9     .367879
 10     .367879


After [math]n = 9[/math] the probabilities are essentially the same to six significant figures. Interestingly, the probability of no fixed point alternately increases and decreases as [math]n[/math] increases. Finally, we note that our exact results are in good agreement with our simulations reported in the previous section.
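The table's entries can be reproduced exactly, and cross-checked against a direct count of derangements (permutations with no fixed point). This Python sketch is ours; it uses the standard recursion [math]D(n) = (n-1)(D(n-1) + D(n-2))[/math] for the number of derangements, which is not derived in the text.

```python
from fractions import Fraction
from math import factorial

def p_no_fixed_point(n):
    """Partial sum 1/2! - 1/3! + ... + (-1)^n/n! of the series for e^{-1}."""
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(2, n + 1))

# cross-check against a direct derangement count D(n) = (n-1)(D(n-1) + D(n-2))
D = [1, 0]
for n in range(2, 11):
    D.append((n - 1) * (D[n - 1] + D[n - 2]))

for n in range(3, 11):
    assert p_no_fixed_point(n) == Fraction(D[n], factorial(n))
    print(n, float(p_no_fixed_point(n)))   # reproduces the table's values
```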

===Choosing a Sample Space===

We now have some of the tools needed to accurately describe sample spaces and to assign probability functions to those sample spaces. Nevertheless, in some cases, the description and assignment process is somewhat arbitrary. Of course, it is to be hoped that the description of the sample space and the subsequent assignment of a probability function will yield a model which accurately predicts what would happen if the experiment were actually carried out. As the following examples show, there are situations in which “reasonable” descriptions of the sample space do not produce a model which fits the data. In Feller's book,[Notes 1] a pair of models is given which describe arrangements of certain kinds of elementary particles, such as photons and protons. It turns out that experiments have shown that certain types of elementary particles exhibit behavior which is accurately described by one model, called “Bose-Einstein statistics,” while other types of elementary particles can be modelled using “Fermi-Dirac statistics.” Feller says:

We have here an instructive example of the impossibility of selecting or justifying probability models by a priori arguments. In fact, no pure reasoning could tell that photons and protons would not obey the same probability laws.

We now give some examples of this description and assignment process.

'''Example''' In the quantum mechanical model of the helium atom, various parameters can be used to classify the energy states of the atom. In the triplet spin state ([math]S = 1[/math]) with orbital angular momentum 1 ([math]L = 1[/math]), there are three possibilities, 0, 1, or 2, for the total angular momentum ([math]J[/math]). (It is not assumed that the reader knows what any of this means; in fact, the example is more illustrative if the reader does not know anything about quantum mechanics.) We would like to assign probabilities to the three possibilities for [math]J[/math]. The reader is undoubtedly resisting the idea of assigning the probability of [math]1/3[/math] to each of these outcomes. She should now ask herself why she is resisting this assignment. The answer is probably because she does not have any “intuition” (i.e., experience) about the way in which helium atoms behave. In fact, in this example, the probabilities [math]1/9,\ 3/9,[/math] and [math]5/9[/math] are assigned by the theory. The theory gives these assignments because these frequencies were observed in experiments and further parameters were developed in the theory to allow these frequencies to be predicted.

'''Example''' Suppose two pennies are flipped once each. There are several “reasonable” ways to describe the sample space. One way is to count the number of heads in the outcome; in this case, the sample space can be written [math]\{0, 1, 2\}[/math]. Another description of the sample space is the set of all ordered pairs of [math]H[/math]'s and [math]T[/math]'s, i.e.,

[[math]] \{(H,H), (H, T), (T, H), (T, T)\}. [[/math]]
Both of these descriptions are accurate ones, but it is easy to see that (at most) one of these, if assigned a constant probability function, can claim to accurately model reality. In this case, as opposed to the preceding example, the reader will probably say that the second description, with each outcome being assigned a probability of [math]1/4[/math], is the “right” description. This conviction is due to experience; there is no proof that this is the way reality works.
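The difference between the two descriptions is easy to exhibit by enumeration. A small Python sketch (our own illustration): under the ordered-pair model with weight [math]1/4[/math] each, the probability of exactly one head is [math]1/2[/math], whereas a uniform weight on the three-outcome description [math]\{0, 1, 2\}[/math] would assign it [math]1/3[/math].

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))        # ordered pairs, each of weight 1/4
p_one_head = Fraction(sum(o.count("H") == 1 for o in outcomes), len(outcomes))
assert p_one_head == Fraction(1, 2)   # uniform weights on {0, 1, 2} would give 1/3
```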

The reader is also referred to Exercise for another example of this process.

===Historical Remarks===

The binomial coefficients have a long and colorful history leading up to Pascal's Treatise on the Arithmetical Triangle,[Notes 2] where Pascal developed many important properties of these numbers. This history is set forth in the book Pascal's Arithmetical Triangle by A. W. F. Edwards.[Notes 3] Pascal wrote his triangle in the form shown in Table.

Pascal's triangle.
1 1 1 1 1 1 1 1 1 1
1 2 3 4 5 6 7 8 9
1 3 6 10 15 21 28 36
1 4 10 20 35 56 84
1 5 15 35 70 126
1 6 21 56 126
1 7 28 84
1 8 36
1 9

Edwards traces three different ways that the binomial coefficients arose. He refers to these as the figurate numbers, the combinatorial numbers, and the binomial numbers. They are all names for the same thing (which we have called binomial coefficients), but the fact that they are all the same was not appreciated until the sixteenth century. The figurate numbers date back to the Pythagorean interest in number patterns around 540 BC. The Pythagoreans considered, for example, the triangular patterns shown in Figure. The sequence of numbers

[[math]] 1, 3, 6, 10, \dots [[/math]]
obtained as the number of points in each triangle are called triangular numbers. From the triangles it is clear that the [math]n[/math]th triangular number is simply the sum of the first [math]n[/math] integers. The tetrahedral numbers are the sums of the triangular numbers and were obtained by the Greek mathematicians Theon and Nicomachus at the beginning of the second century BC. The tetrahedral number 10, for example, has the geometric representation shown in Figure. The first three types of figurate numbers can be represented in tabular form as shown in Table.

Pythagorean triangular patterns.
Geometric representation of the tetrahedral number 10.

Natural numbers 1 2 3 4 5 6 7 8 9
Triangular numbers 1 3 6 10 15 21 28 36 45
Tetrahedral numbers 1 4 10 20 35 56 84 120 165


These numbers provide the first four rows of Pascal's triangle, but the table was not to be completed in the West until the sixteenth century.
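The partial-sum structure of these figurate numbers, and their identity with binomial coefficients, can be checked directly. A short Python sketch (ours): each row of the table is the running sum of the row above, and the entries are [math]{n+1 \choose 2}[/math] and [math]{n+2 \choose 3}[/math] respectively.

```python
from itertools import accumulate
from math import comb

naturals = list(range(1, 10))
triangular = list(accumulate(naturals))        # 1, 3, 6, 10, ...: sums of naturals
tetrahedral = list(accumulate(triangular))     # 1, 4, 10, 20, ...: sums of triangulars
assert triangular == [comb(n + 1, 2) for n in naturals]
assert tetrahedral == [comb(n + 2, 3) for n in naturals]
```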


In the East, Hindu mathematicians began to encounter the binomial coefficients in combinatorial problems. Bhaskara in his Lilavati of 1150 gave a rule to find the number of medicinal preparations using 1, 2, 3, 4, 5, or 6 possible ingredients.[Notes 4] His rule is equivalent to our formula

[[math]] {n \choose r} = \frac{(n)_r}{r!}\ . [[/math]]
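Here [math](n)_r[/math] is the falling factorial [math]n(n-1)\cdots(n-r+1)[/math], which Python exposes as <code>math.perm</code>. A one-loop check of Bhaskara's rule (our sketch):

```python
from math import comb, factorial, perm

# (n)_r = n(n-1)...(n-r+1) is math.perm(n, r) in Python
for n in range(8):
    for r in range(n + 1):
        assert comb(n, r) == perm(n, r) // factorial(r)
```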

Chu Shih-chieh's triangle. [From J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135. Reprinted with permission.]


The binomial numbers as coefficients of [math](a + b)^n[/math] appeared in the works of mathematicians in China around 1100. There are references about this time to “the tabulation system for unlocking binomial coefficients.” The triangle to provide the coefficients up to the eighth power is given by Chu Shih-chieh in a book written around 1303 (see Figure).[Notes 5] The original manuscript of Chu's book has been lost, but copies have survived. Edwards notes that there is an error in this copy of Chu's triangle. Can you find it? (Hint: Two numbers which should be equal are not.) Other copies do not show this error.


The first appearance of Pascal's triangle in the West seems to have come from calculations of Tartaglia in calculating the number of possible ways that [math]n[/math] dice might turn up.[Notes 6] For one die the answer is clearly 6. For two dice the possibilities may be displayed as shown in Table.

11
12 22
13 23 33
14 24 34 44
15 25 35 45 55
16 26 36 46 56 66

Displaying them this way suggests the sixth triangular number [math]1 + 2 + 3 + 4 + 5 + 6 = 21[/math] for the throw of 2 dice. Tartaglia “on the first day of Lent, 1523, in Verona, having thought about the problem all night,”[Notes 7] realized that the extension of the figurate table gave the answers for [math]n[/math] dice. The problem had suggested itself to Tartaglia from watching people casting their own horoscopes by means of a Book of Fortune, selecting verses by a process which included noting the numbers on the faces of three dice. The 56 ways that three dice can fall were set out on each page. The way the numbers were written in the book did not suggest the connection with figurate numbers, but a method of enumeration similar to the one we used for 2 dice does. Tartaglia's table was not published until 1556.
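Tartaglia's observation generalizes: the distinguishable ways [math]n[/math] dice can fall are the multisets of [math]n[/math] faces, and their count is the figurate number [math]{n + 5 \choose n}[/math]. A quick Python check (our sketch, not Tartaglia's method):

```python
from itertools import combinations_with_replacement
from math import comb

# distinguishable throws of n dice = multisets of n faces from {1, ..., 6}
for n in (1, 2, 3):
    ways = sum(1 for _ in combinations_with_replacement(range(1, 7), n))
    assert ways == comb(n + 5, n)    # 6, 21, 56, extending the figurate table
```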


A table for the binomial coefficients was published in 1544 by the German mathematician Stifel.[Notes 8] Pascal's triangle appears also in Cardano's Opus novum of 1570.[Notes 9] Cardano was interested in the problem of finding the number of ways to choose [math]r[/math] objects out of [math]n[/math]. Thus by the time of Pascal's work, his triangle had appeared as a result of looking at the figurate numbers, the combinatorial numbers, and the binomial numbers, and the fact that all three were the same was presumably pretty well understood.


Pascal's interest in the binomial numbers came from his correspondence with Fermat concerning a problem known as the problem of points. This problem, and the correspondence between Pascal and Fermat, were discussed in Chapter Discrete Probability Distributions. The reader will recall that this problem can be described as follows: Two players A and B are playing a sequence of games and the first player to win [math]n[/math] games wins the match. It is desired to find the probability that A wins the match at a time when A has won [math]a[/math] games and B has won [math]b[/math] games. (See Exercise - Exercise.)

Pascal solved the problem by backward induction, much the way we would do today in writing a computer program for its solution. He referred to the combinatorial method of Fermat which proceeds as follows: If A needs [math]c[/math] games and B needs [math]d[/math] games to win, we require that the players continue to play until they have played [math]c + d - 1[/math] games. The winner in this extended series will be the same as the winner in the original series. The probability that A wins in the extended series and hence in the original series is

[[math]] \sum_{r = c}^{c + d - 1} \frac 1{2^{c + d - 1}} {{c + d - 1} \choose r}\ . [[/math]]

Even at the time of the letters Pascal seemed to understand this formula.
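The two approaches can be compared numerically. The following Python sketch (function names <code>fermat</code> and <code>pascal</code> are ours) implements Fermat's formula and Pascal's backward induction on the number of games each player still needs, and checks that they agree:

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def fermat(c, d):
    """Fermat: play all c + d - 1 games; A wins iff A wins at least c of them."""
    N = c + d - 1
    return sum(Fraction(comb(N, r), 2 ** N) for r in range(c, N + 1))

@lru_cache(maxsize=None)
def pascal(c, d):
    """Pascal's backward induction: each fair game reduces one player's need by 1."""
    if c == 0:
        return Fraction(1)   # A has already won
    if d == 0:
        return Fraction(0)   # B has already won
    return (pascal(c - 1, d) + pascal(c, d - 1)) / 2

for c in range(1, 7):
    for d in range(1, 7):
        assert fermat(c, d) == pascal(c, d)
```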


Suppose that the first player to win [math]n[/math] games wins the match, and suppose that each player has put up a stake of [math]x[/math]. Pascal studied the value of winning a particular game. By this he meant the increase in the expected winnings of the winner of the particular game under consideration. He showed that the value of the first game is

[[math]] \frac {1\cdot3\cdot5\cdot\dots\cdot(2n - 1)}{2\cdot4\cdot6\cdot\dots\cdot(2n)}x\ . [[/math]]

His proof of this seems to use Fermat's formula and the fact that the above ratio of products of odd to products of even numbers is equal to the probability of exactly [math]n[/math] heads in [math]2n[/math] tosses of a coin. (See Exercise.)
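The fact mentioned at the end, that the ratio of the product of odd numbers to the product of even numbers equals [math]P(\mbox{exactly } n \mbox{ heads in } 2n \mbox{ tosses}) = {2n \choose n}/4^n[/math], is easy to verify; a minimal Python sketch (ours):

```python
from fractions import Fraction
from math import comb

for n in range(1, 9):
    ratio = Fraction(1)
    for k in range(1, n + 1):
        ratio *= Fraction(2 * k - 1, 2 * k)       # (1·3···(2n-1)) / (2·4···(2n))
    assert ratio == Fraction(comb(2 * n, n), 4 ** n)  # P(exactly n heads in 2n tosses)
```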


Pascal presented Fermat with the table shown in Table.

Pascal's solution for the problem of points.

If each one stakes 256 in:          6 games  5 games  4 games  3 games  2 games  1 game
From my opponent's 256 positions
I get, for the
  1st game                             63       70       80       96      128      256
  2nd game                             63       70       80       96      128
  3rd game                             56       60       64       64
  4th game                             42       40       32
  5th game                             24       16
  6th game                              8

He states:

You will see as always, that the value of the first game is equal to that of the second which is easily shown by combinations. You will see, in the same way, that the numbers in the first line are always increasing; so also are those in the second; and those in the third. But those in the fourth line are decreasing, and those in the fifth, etc. This seems odd.[Notes 10]


The student can pursue this question further using the computer and Pascal's backward iteration method for computing the expected payoff at any point in the series.


In his treatise, Pascal gave a formal proof of Fermat's combinatorial formula as well as proofs of many other basic properties of binomial numbers. Many of his proofs involved induction and represent some of the first proofs by this method. His book brought together all the different aspects of the numbers in the Pascal triangle as known in 1654, and, as Edwards states, “That the Arithmetical Triangle should bear Pascal's name cannot be disputed.”[Notes 11]

The first serious study of the binomial distribution was undertaken by James Bernoulli in his Ars Conjectandi published in 1713.[Notes 12] We shall return to this work in the historical remarks in Chapter Law of Large Numbers.

==General references==

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.

==Notes==

  1. W. Feller, Introduction to Probability Theory and Its Applications, vol. 1, 3rd ed. (New York: John Wiley and Sons, 1968), p. 41.
  2. B. Pascal, Traité du Triangle Arithmétique (Paris: Desprez, 1665).
  3. A. W. F. Edwards, Pascal's Arithmetical Triangle (London: Griffin, 1987).
  4. ibid., p. 27.
  5. J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135.
  6. N. Tartaglia, General Trattato di Numeri et Misure (Vinegia, 1556).
  7. Quoted in Edwards, op. cit., p. 37.
  8. M. Stifel, Arithmetica Integra (Norimburgae, 1544).
  9. G. Cardano, Opus Novum de Proportionibus Numerorum (Basilea, 1570).
  10. F. N. David, op. cit., p. 235.
  11. A. W. F. Edwards, op. cit., p. ix.
  12. J. Bernoulli, Ars Conjectandi (Basil: Thurnisiorum, 1713).