Combinations

[math] \newcommand{\NA}{{\rm NA}} \newcommand{\mat}[1]{{\bf#1}} \newcommand{\exref}[1]{\ref{##1}} \newcommand{\secstoprocess}{\all} \newcommand{\NA}{{\rm NA}} \newcommand{\mathds}{\mathbb}[/math]

Having mastered permutations, we now consider combinations. Let [math]U[/math] be a set with [math]n[/math] elements; we want to count the number of distinct subsets of the set [math]U[/math] that have exactly [math]j[/math] elements. The empty set and the set [math]U[/math] are considered to be subsets of [math]U[/math]. The empty set is usually denoted by [math]\phi[/math].

Example Let [math]U = \{a,b,c\}[/math]. The subsets of [math]U[/math] are

[[math]] \phi,\ \{a\},\ \{b\},\ \{c\},\ \{a,b\},\ \{a,c\},\ \{b,c\},\ \{a,b,c\}\ . [[/math]]

Binomial Coefficients

The number of distinct subsets with [math]j[/math] elements that can be chosen from a set with [math]n[/math] elements is denoted by [math]{n \choose j}[/math], and is pronounced “[math]n[/math] choose [math]j[/math].” The number [math]n \choose j[/math] is called a binomial coefficient. This terminology comes from an application to algebra which will be discussed later in this section. In the above example, there is one subset with no elements, three subsets with exactly 1 element, three subsets with exactly 2 elements, and one subset with exactly 3 elements. Thus, [math]{3 \choose 0} = 1[/math], [math]{3 \choose 1} = 3[/math], [math]{3 \choose 2} = 3[/math], and [math]{3 \choose 3} = 1[/math]. Note that there are [math]2^3 = 8[/math] subsets in all. (We have already seen that a set with [math]n[/math] elements has [math]2^n[/math] subsets; (see Exercise). It follows that

[[math]] {3 \choose 0} + {3 \choose 1} + {3 \choose 2} + {3 \choose 3} = 2^3 = 8\ , [[/math]]

[[math]] {n \choose 0} = {n \choose n} = 1\ . [[/math]]

Assume that [math]n \gt 0[/math]. Then, since there is only one way to choose a set with no elements and only one way to choose a set with [math]n[/math] elements, the remaining values of [math]n \choose j[/math] are determined by the following recurrence relation:

Theorem

For integers [math]n[/math] and [math]j[/math], with [math]0 \lt j \lt n[/math], the binomial coefficients satisfy:

[[math]] \begin{equation} {n \choose j} = {{n-1} \choose j} + {{n-1} \choose {j - 1}}\ . \label{eq 3.3} \end{equation} [[/math]]

Show Proof

We wish to choose a subset of [math]j[/math] elements. Choose an element [math]u[/math] of [math]U[/math]. Assume first that we do not want [math]u[/math] in the subset. Then we must choose the [math]j[/math] elements from a set of [math]n - 1[/math] elements; this can be done in [math]{{n-1} \choose j}[/math] ways. On the other hand, assume that we do want [math]u[/math] in the subset. Then we must choose the other [math]j - 1[/math] elements from the remaining [math]n - 1[/math] elements of [math]U[/math]; this can be done in [math]{{n-1} \choose {j - 1}}[/math] ways. Since [math]u[/math] is either in our subset or not, the number of ways that we can choose a subset of [math]j[/math] elements is the sum of the number of subsets of [math]j[/math] elements which have [math]u[/math] as a member and the number which do not---this is what Equation \ref{eq 3.3} states.

The binomial coefficient [math]n \choose j[/math] is defined to be 0, if [math]j \lt 0[/math] or if [math]j \gt n[/math]. With this definition, the restrictions on [math]j[/math] in Theorem are unnecessary.

Pascal's Triangle

The relation, together with the knowledge that

[[math]] {n \choose 0} = {n \choose n }= 1\ , [[/math]]

determines completely the numbers [math]n \choose j[/math]. We can use these relations to determine the famous triangle of Pascal, which exhibits all these numbers in matrix form (see Figure).

Pascal's triangle.

The [math]n[/math]th row of this triangle has the entries [math]n \choose 0[/math], [math]n \choose 1[/math],..., [math]n \choose n[/math]. We know that the first and last of these numbers are 1. The remaining numbers are determined by the recurrence relation Equation \ref{eq 3.3}; that is, the entry [math]{n \choose j}[/math] for [math]0 \lt j \lt n[/math] in the [math]n[/math]th row of Pascal's triangle is the sum of the entry immediately above and the one immediately to its left in the [math](n - 1)[/math]st row. For example, [math]{5 \choose 2} = 6 + 4 = 10[/math]. This algorithm for constructing Pascal's triangle can be used to write a computer program to compute the binomial coefficients. You are asked to do this in Exercise.

While Pascal's triangle provides a way to construct recursively the binomial coefficients, it is also possible to give a formula for [math]n \choose j[/math].

Theorem

The binomial coefficients are given by the formula

[[math]] \begin{equation} {n \choose j }= \frac{(n)_j}{j!}\ . \label{eq 3.4} \end{equation} [[/math]]

Show Proof

Each subset of size [math]j[/math] of a set of size [math]n[/math] can be ordered in [math]j![/math] ways. Each of these orderings is a [math]j[/math]-permutation of the set of size [math]n[/math]. The number of [math]j[/math]-permutations is [math](n)_j[/math], so the number of subsets of size [math]j[/math] is

[[math]] \frac{(n)_j}{j!}\ . [[/math]]
This completes the proof.

The above formula can be rewritten in the form

[[math]] {n \choose j} = \frac{n!}{j!(n-j)!}\ . [[/math]]

This immediately shows that

[[math]] {n \choose j} = {n \choose {n-j}}\ . [[/math]]

When using Equation \ref{eq 3.4} in the calculation of [math]{n \choose j}[/math], if one alternates the multiplications and divisions, then all of the intermediate values in the calculation are integers. Furthermore, none of these intermediate values exceed the final value. (See Exercise.)


Another point that should be made concerning Equation \ref{eq 3.4} is that if it is used to define the binomial coefficients, then it is no longer necessary to require [math]n[/math] to be a positive integer. The variable [math]j[/math] must still be a non-negative integer under this definition. This idea is useful when extending the Binomial Theorem to general exponents. (The Binomial Theorem for non-negative integer exponents is given below as Theorem.)

Poker Hands

Example kind beats a full house. A poker hand is a random subset of 5 elements from a deck of 52 cards. A hand has four of a kind if it has four cards with the same value---for example, four sixes or four kings. It is a full house if it has three of one value and two of a second---for example, three twos and two queens. Let us see which hand is more likely. How many hands have four of a kind? There are 13 ways that we can specify the value for the four cards. For each of these, there are 48 possibilities for the fifth card. Thus, the number of four-of-a-kind hands is [math]13 \cdot 48 = 624[/math]. Since the total number of possible hands is [math]{52 \choose 5} = 2598960[/math], the probability of a hand with four of a kind is [math]624/2598960 = .00024[/math].


Now consider the case of a full house; how many such hands are there? There are 13 choices for the value which occurs three times; for each of these there are [math]{4 \choose 3} = 4[/math] choices for the particular three cards of this value that are in the hand. Having picked these three cards, there are 12 possibilities for the value which occurs twice; for each of these there are [math]{4 \choose 2} = 6[/math] possibilities for the particular pair of this value. Thus, the number of full houses is [math]13 \cdot 4 \cdot 12 \cdot 6 = 3744[/math], and the probability of obtaining a hand with a full house is [math]3744/2598960 = .0014[/math]. Thus, while both types of hands are unlikely, you are six times more likely to obtain a full house than four of a kind.

Bernoulli Trials

Our principal use of the binomial coefficients will occur in the study of one of the important chance processes called Bernoulli trials.

Definition

A Bernoulli trials process is a sequence of [math]n[/math] chance experiments such that

  • Each experiment has two possible outcomes, which we may call success and failure.
  • The probability [math]p[/math] of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability [math]q[/math] of failure is given by [math]q = 1 - p[/math].

Tree diagram of three Bernoulli trials.

Example

  • A coin is tossed ten times. The two possible outcomes are heads and tails. The probability of heads on any one toss is 1/2.
  • An opinion poll is carried out by asking 1000 people, randomly chosen from the population, if they favor the Equal Rights Amendment---the two outcomes being yes and no. The probability [math]p[/math] of a yes answer (i.e., a success) indicates the proportion of people in the entire population that favor this amendment.
  • A gambler makes a sequence of 1-dollar bets, betting each time on black at roulette at Las Vegas. Here a success is winning 1 dollar and a failure is losing 1 dollar. Since in American roulette the gambler wins if the ball stops on one of 18 out of 38 positions and loses otherwise, the probability of winning is [math]p = 18/38 = .474[/math].

To analyze a Bernoulli trials process, we choose as our sample space a binary tree and assign a probability distribution to the paths in this tree. Suppose, for example, that we have three Bernoulli trials. The possible outcomes are indicated in the tree diagram shown in Figure. We define [math]X[/math] to be the random variable which represents the outcome of the process, i.e., an ordered triple of S's and F's. The probabilities assigned to the branches of the tree represent the probability for each individual trial. Let the outcome of the [math]i[/math]th trial be denoted by the random variable [math]X_i[/math], with distribution function [math]m_i[/math]. Since we have assumed that outcomes on any one trial do not affect those on another, we assign the same probabilities at each level of the tree. An outcome [math]\omega[/math] for the entire experiment will be a path through the tree. For example, [math]\omega_3[/math] represents the outcomes SFS. Our frequency interpretation of probability would lead us to expect a fraction [math]p[/math] of successes on the first experiment; of these, a fraction [math]q[/math] of failures on the second; and, of these, a fraction [math]p[/math] of successes on the third experiment. This suggests assigning probability [math]pqp[/math] to the outcome [math]\omega_3[/math]. More generally, we assign a distribution function [math]m(\omega)[/math] for paths [math]\omega[/math] by defining [math]m(\omega)[/math] to be the product of the branch probabilities along the path [math]\omega[/math]. Thus, the probability that the three events S on the first trial, F on the second trial, and S on the third trial occur is the product of the probabilities for the individual events. We shall see in the next chapter that this means that the events involved are independent in the sense that the knowledge of one event does not affect our prediction for the occurrences of the other events.

Binomial Probabilities

We shall be particularly interested in the probability that in [math]n[/math] Bernoulli trials there are exactly [math]j[/math] successes. We denote this probability by [math]b(n,p,j)[/math]. Let us calculate the particular value [math]b(3,p,2)[/math] from our tree measure. We see that there are three paths which have exactly two successes and one failure, namely [math]\omega_2[/math], [math]\omega_3[/math], and [math]\omega_5[/math]. Each of these paths has the same probability [math]p^2q[/math]. Thus [math]b(3,p,2) = 3p^2q[/math]. Considering all possible numbers of successes we have

[[math]] \begin{eqnarray*} b(3,p,0) &=& q^3\ ,\\ b(3,p,1) &=& 3pq^2\ ,\\ b(3,p,2) &=& 3p^2q\ ,\\ b(3,p,3) &=& p^3\ . \end{eqnarray*} [[/math]]

We can, in the same manner, carry out a tree measure for [math]n[/math] experiments and determine [math]b(n,p,j)[/math] for the general case of [math]n[/math] Bernoulli trials.

Theorem

Given [math]n[/math] Bernoulli trials with probability [math]p[/math] of success on each experiment, the probability of exactly [math]j[/math] successes is

[[math]] b(n,p,j) = {n \choose j} p^j q^{n - j} [[/math]]
where [math]q = 1 - p[/math].

Show Proof

We construct a tree measure as described above. We want to find the sum of the probabilities for all paths which have exactly [math]j[/math] successes and [math]n - j[/math] failures. Each such path is assigned a probability [math]p^j q^{n - j}[/math]. How many such paths are there? To specify a path, we have to pick, from the [math]n[/math] possible trials, a subset of [math]j[/math] to be successes, with the remaining [math]n-j[/math] outcomes being failures. We can do this in [math]n \choose j[/math] ways. Thus the sum of the probabilities is

[[math]] b(n,p,j) = {n \choose j} p^j q^{n - j}\ . [[/math]]

Example probability that exactly three heads turn up? The answer is

[[math]] b(6,.5,3) = {6 \choose 3} \left(\frac12\right)^3 \left(\frac12\right)^3 = 20 \cdot \frac1{64} = .3125\ . [[/math]]

Example that we obtain exactly one 6? We treat this as Bernoulli trials with success = “rolling a 6” and failure = “rolling some number other than a 6.” Then [math]p = 1/6[/math], and the probability of exactly one success in four trials is

[[math]] b(4,1/6,1) = {4 \choose 1 }\left(\frac16\right)^1 \left(\frac56\right)^3 = .386\ . [[/math]]

To compute binomial probabilities using the computer, multiply the function choose[math](n,k)[/math] by [math]p^kq^{n - k}[/math]. The program BinomialProbabilities prints out the binomial probabilities [math]b(n, p, k)[/math] for [math]k[/math] between [math]kmin[/math] and [math]kmax[/math], and the sum of these probabilities. We have run this program for [math]n = 100[/math], [math]p = 1/2[/math], [math]kmin = 45[/math], and [math]kmax = 55[/math]; the output is shown in Table. Note that the individual probabilities are quite small. The probability of exactly 50 heads in 100 tosses of a coin is about .08. Our intuition tells us that this is the most likely outcome, which is correct; but, all the same, it is not a very likely outcome.

Binomial probabilities for [math]n = 100,\ p = 1/2[/math].
[math]k[/math] [math]b(n,p,k)[/math]
45 .0485
46 .0580
47 .0666
48 .0735
49 .0780
50 .0796
51 .0780
52 .0735
53 .0666
54 .0580
55 .0485


Binomial Distributions

Definition

Let [math]n[/math] be a positive integer, and let [math]p[/math] be a real number between 0 and 1. Let [math]B[/math] be the random variable which counts the number of successes in a Bernoulli trials process with parameters [math]n[/math] and [math]p[/math]. Then the distribution [math]b(n, p, k)[/math] of [math]B[/math] is called the binomial distribution.

We can get a better idea about the binomial distribution by graphing this distribution for different values of [math]n[/math] and [math]p[/math] (see Figure). The plots in this figure were generated using the program BinomialPlot.

Binomial distributions.


We have run this program for [math]p = .5[/math] and [math]p = .3[/math]. Note that even for [math]p = .3[/math] the graphs are quite symmetric. We shall have an explanation for this in Chapter. We also note that the highest probability occurs around the value [math]np[/math], but that these highest probabilities get smaller as [math]n[/math] increases. We shall see in Chapter Expected Value and Variance that [math]np[/math] is the mean or expected value of the binomial distribution [math]b(n,p,k)[/math].


The following example gives a nice way to see the binomial distribution, when [math]p = 1/2[/math]. Example A Galton board is a board in which a large number of BB-shots are dropped from a chute at the top of the board and deflected off a number of pins on their way down to the bottom of the board. The final position of each slot is the result of a number of random deflections either to the left or the right. We have written a program GaltonBoard to simulate this experiment.


We have run the program for the case of 20 rows of pins and 10,00 shots being dropped. We show the result of this simulation in Figure.


Simulation of the Galton board.

Note that if we write 0 every time the shot is deflected to the left, and 1 every time it is deflected to the right, then the path of the shot can be described by a sequence of 0's and 1's of length [math]n[/math], just as for the [math]n[/math]-fold coin toss.


The distribution shown in Figure is an example of an empirical distribution, in the sense that it comes about by means of a sequence of experiments. As expected, this empirical distribution resembles the corresponding binomial distribution with parameters [math]n = 20[/math] and [math]p = 1/2[/math].


Hypothesis Testing

Example Suppose that ordinary aspirin has been found effective against headaches 60 percent of the time, and that a drug company claims that its new aspirin with a special headache additive is more effective. We can test this claim as follows: we call their claim the alternate hypothesis, and its negation, that the additive has no appreciable effect, the null hypothesis. Thus the null hypothesis is that [math]p = .6[/math], and the alternate hypothesis is that [math]p \gt .6[/math], where [math]p[/math] is the probability that the new aspirin is effective.

We give the aspirin to [math]n[/math] people to take when they have a headache. We want to find a number [math]m[/math], called the critical value for our experiment, such that we reject the null hypothesis if at least [math]m[/math] people are cured, and otherwise we accept it. How should we determine this critical value?

First note that we can make two kinds of errors. The first, often called a type 1 error in statistics, is to reject the null hypothesis when in fact it is true. The second, called a type 2 error, is to accept the null hypothesis when it is false. To determine the probability of both these types of errors we introduce a function [math]\alpha(p)[/math], defined to be the probability that we reject the null hypothesis, where this probability is calculated under the assumption that the null hypothesis is true. In the present case, we have

[[math]] \alpha(p) = \sum_{m \leq k \leq n} b(n,p,k)\ . [[/math]]


Note that [math]\alpha(.6)[/math] is the probability of a type 1 error, since this is the probability of a high number of successes for an ineffective additive. So for a given [math]n[/math] we want to choose [math]m[/math] so as to make [math]\alpha(.6)[/math] quite small, to reduce the likelihood of a type 1 error. But as [math]m[/math] increases above the most probable value [math]np = .6n[/math], [math]\alpha(.6)[/math], being the upper tail of a binomial distribution, approaches 0. Thus increasing [math]m[/math] makes a type 1 error less likely.


Now suppose that the additive really is effective, so that [math]p[/math] is appreciably greater than .6; say [math]p = .8[/math]. (This alternative value of [math]p[/math] is chosen arbitrarily; the following calculations depend on this choice.) Then choosing [math]m[/math] well below [math]np = .8n[/math] will increase [math]\alpha(.8)[/math], since now [math]\alpha(.8)[/math] is all but the lower tail of a binomial distribution. Indeed, if we put [math]\beta(.8) = 1 - \alpha(.8)[/math], then [math]\beta(.8)[/math] gives us the probability of a type 2 error, and so decreasing [math]m[/math] makes a type 2 error less likely.


The manufacturer would like to guard against a type 2 error, since if such an error is made, then the test does not show that the new drug is better, when in fact it is. If the alternative value of [math]p[/math] is chosen closer to the value of [math]p[/math] given in the null hypothesis (in this case [math]p =.6[/math]), then for a given test population, the value of [math]\beta[/math] will increase. So, if the manufacturer's statistician chooses an alternative value for [math]p[/math] which is close to the value in the null hypothesis, then it will be an expensive proposition (i.e., the test population will have to be large) to reject the null hypothesis with a small value of [math]\beta[/math].


What we hope to do then, for a given test population [math]n[/math], is to choose a value of [math]m[/math], if possible, which makes both these probabilities small. If we make a type 1 error we end up buying a lot of essentially ordinary aspirin at an inflated price; a type 2 error means we miss a bargain on a superior medication. Let us say that we want our critical number [math]m[/math] to make each of these undesirable cases less than 5 percent probable.


We write a program PowerCurve to plot, for [math]n = 100[/math] and selected values of [math]m[/math], the function [math]\alpha(p)[/math], for [math]p[/math] ranging from .4 to 1. The result is shown in Figure. We include in our graph a box (in dotted lines) from .6 to .8, with bottom and top at heights .05 and .95. Then a value for [math]m[/math] satisfies our requirements if and only if the graph of [math]\alpha[/math] enters the box from the bottom, and leaves from the top (why?---which is the type 1 and which is the type 2 criterion?). As [math]m[/math] increases, the graph of [math]\alpha[/math] moves to the right. A few experiments have shown us that [math]m = 69[/math] is the smallest value for [math]m[/math] that thwarts a type 1 error, while [math]m = 73[/math] is the largest which thwarts a type 2. So we may choose our critical value between 69 and 73. If we're more intent on avoiding a type 1 error we favor 73, and similarly we favor 69 if we regard a type 2 error as worse. Of course, the drug company may not be happy with having as much as a 5 percent chance of an error. They might insist on having a 1 percent chance of an error. For this we would have to increase the number [math]n[/math] of trials (see Exercise).

The power curve.

Binomial Expansion

We next remind the reader of an application of the binomial coefficients to algebra. This is the binomial expansion, from which we get the term binomial coefficient.

Theorem

(Binomial Theorem) The quantity [math](a + b)^n[/math] can be expressed in the form

[[math]] (a + b)^n = \sum_{j = 0}^n {n \choose j} a^j b^{n - j}\ . [[/math]]

Show Proof

To see that this expansion is correct, write

[[math]] (a + b)^n = (a + b)(a + b) \cdots (a + b)\ . [[/math]]
When we multiply this out we will have a sum of terms each of which results from a choice of an [math]a[/math] or [math]b[/math] for each of [math]n[/math] factors. When we choose [math]j[/math] [math]a[/math]'s and [math](n - j)[/math] [math]b[/math]'s, we obtain a term of the form [math]a^j b^{n - j}[/math]. To determine such a term, we have to specify [math]j[/math] of the [math]n[/math] terms in the product from which we choose the [math]a[/math]. This can be done in [math]n \choose j[/math] ways. Thus, collecting these terms in the sum contributes a term [math]{n \choose j} a^j b^{n - j}[/math].


For example, we have

[[math]] \begin{eqnarray*} (a + b)^0 & = & 1 \\ (a + b)^1 & = & a + b \\ (a + b)^2 & = & a^2 + 2ab + b^2 \\ (a + b)^3 & = & a^3 + 3a^2b + 3ab^2 + b^3\ . \end{eqnarray*} [[/math]]

We see here that the coefficients of successive powers do indeed yield Pascal's triangle.

Corollary

The sum of the elements in the [math]n[/math]th row of Pascal's triangle is [math]2^n[/math]. If the elements in the [math]n[/math]th row of Pascal's triangle are added with alternating signs, the sum is 0.

Show Proof

The first statement in the corollary follows from the fact that

[[math]] 2^n = (1 + 1)^n = {n \choose 0} + {n \choose 1} + {n \choose 2} + \cdots + {n \choose n}\ , [[/math]]
and the second from the fact that

[[math]] 0 = (1 - 1)^n = {n \choose 0} - {n \choose 1} + {n \choose 2}- \cdots + {(-1)^n}{n \choose n}\ . [[/math]]

The first statement of the corollary tells us that the number of subsets of a set of [math]n[/math] elements is [math]2^n[/math]. We shall use the second statement in our next application of the binomial theorem. We have seen that, when [math]A[/math] and [math]B[/math] are any two events (cf. Section ),

[[math]] P(A \cup B) = P(A) + P(B) - P(A \cap B). [[/math]]

We now extend this theorem to a more general version, which will enable us to find the probability that at least one of a number of events occurs.

Inclusion-Exclusion Principle

Theorem

Let [math]P[/math] be a probability distribution on a sample space [math]\Omega[/math], and let [math]\{A_1,\ A_2,\ \dots,\ A_n\}[/math] be a finite set of events. Then

[[math]] P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i = 1}^n P(A_i)\ - \sum_{1 \leq i \lt j \leq n} P(A_i \cap A_j) [[/math]]

[[math]] \begin{equation} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ + \sum_{1 \leq i \lt j \lt k \leq n} P(A_i \cap A_j \cap A_k) - \cdots\ .\label{eq 3.5} \end{equation} [[/math]]
That is, to find the probability that at least one of [math]n[/math] events [math]A_i[/math] occurs, first add the probability of each event, then subtract the probabilities of all possible two-way intersections, add the probability of all three-way intersections, and so forth.

Show Proof

If the outcome [math]\omega[/math] occurs in at least one of the events [math]A_i[/math], its probability is added exactly once by the left side of Equation \ref{eq 3.5}. We must show that it is added exactly once by the right side of Equation \ref{eq 3.5}. Assume that [math]\omega[/math] is in exactly [math]k[/math] of the sets. Then its probability is added [math]k[/math] times in the first term, subtracted [math]k \choose 2[/math] times in the second, added [math]k \choose 3[/math] times in the third term, and so forth. Thus, the total number of times that it is added is

[[math]] {k \choose 1} - {k \choose 2} + {k \choose 3} - \cdots {(-1)^{k-1}} {k \choose k}\ . [[/math]]
But

[[math]] 0 = (1 - 1)^k = \sum_{j = 0}^k {k \choose j} (-1)^j = {k \choose 0} - \sum_{j = 1}^k {k \choose j} {(-1)^{j - 1}}\ . [[/math]]
Hence,

[[math]] 1 = {k \choose 0} = \sum_{j = 1}^k {k \choose j} {(-1)^{j - 1}}\ . [[/math]]
If the outcome [math]\omega[/math] is not in any of the events [math]A_i[/math], then it is not counted on either side of the equation.

Hat Check Problem

Example We return to the hat check problem discussed in Section, that is, the problem of finding the probability that a random permutation contains at least one fixed point. Recall that a permutation is a one-to-one map of a set [math]A = \{a_1,a_2,\dots,a_n\}[/math] onto itself. Let [math]A_i[/math] be the event that the [math]i[/math]th element [math]a_i[/math] remains fixed under this map. If we require that [math]a_i[/math] is fixed, then the map of the remaining [math]n - 1[/math] elements provides an arbitrary permutation of [math](n - 1)[/math] objects. Since there are [math](n - 1)![/math] such permutations, [math]P(A_i) = (n - 1)!/n! = 1/n[/math]. Since there are [math]n[/math] choices for [math]a_i[/math], the first term of Equation \ref{eq 3.5} is 1. In the same way, to have a particular pair [math](a_i,a_j)[/math] fixed, we can choose any permutation of the remaining [math]n - 2[/math] elements; there are [math](n - 2)![/math] such choices and thus

[[math]] P(A_i \cap A_j) = \frac{(n - 2)!}{n!} = \frac 1{n(n - 1)}\ . [[/math]]

The number of terms of this form in the right side of Equation \ref{eq 3.5} is

[[math]] {n \choose 2} = \frac{n(n - 1)}{2!}\ . [[/math]]

Hence, the second term of Equation \ref{eq 3.5} is

[[math]] -\frac{n(n - 1)}{2!} \cdot \frac 1{n(n - 1)} = -\frac 1{2!}\ . [[/math]]

Similarly, for any specific three events [math]A_i[/math], [math]A_j[/math], [math]A_k[/math],

[[math]] P(A_i \cap A_j \cap A_k) = \frac{(n - 3)!}{n!} = \frac 1{n(n - 1)(n - 2)}\ , [[/math]]

and the number of such terms is

[[math]] {n \choose 3} = \frac{n(n - 1)(n - 2)}{3!}\ , [[/math]]

making the third term of Equation \ref{eq 3.5} equal to 1/3!. Continuing in this way, we obtain

[[math]] P(\mbox {at\ least\ one\ fixed\ point}) = 1 - \frac 1{2!} + \frac 1{3!} - \cdots (-1)^{n-1} \frac 1{n!} [[/math]]

and

[[math]] P(\mbox {no\ fixed\ point}) = \frac 1{2!} - \frac 1{3!} + \cdots (-1)^n \frac 1{n!}\ . [[/math]]

From calculus we learn that

[[math]] e^x = 1 + x + \frac 1{2!}x^2 + \frac 1{3!}x^3 + \cdots + \frac 1{n!}x^n + \cdots\ . [[/math]]

Thus, if [math]x = -1[/math], we have

[[math]] \begin{eqnarray*} e^{-1} & = &\frac 1{2!} - \frac 1{3!} + \cdots + \frac{(-1)^n}{n!} + \cdots \\ & = & .3678794\ . \end{eqnarray*} [[/math]]

Therefore, the probability that there is no fixed point, i.e., that none of the [math]n[/math] people gets his own hat back, is equal to the sum of the first [math]n[/math] terms in the expression for [math]e^{-1}[/math]. This series converges very fast. Calculating the partial sums for [math]n = 3[/math] to 10 gives the data in Table.

Hat check problem.
Probability that no one
n gets his own hat back
3 .333333
4 .375
5 .366667
6 .368056
7 .367857
8 .367882
9 .367879
10 .367879


After [math]n = 9[/math] the probabilities are essentially the same to six significant figures. Interestingly, the probability of no fixed point alternately increases and decreases as [math]n[/math] increases. Finally, we note that our exact results are in good agreement with our simulations reported in the previous section.

Choosing a Sample Space

We now have some of the tools needed to accurately describe sample spaces and to assign probability functions to those sample spaces. Nevertheless, in some cases, the description and assignment process is somewhat arbitrary. Of course, it is to be hoped that the description of the sample space and the subsequent assignment of a probability function will yield a model which accurately predicts what would happen if the experiment were actually carried out. As the following examples show, there are situations in which “reasonable” descriptions of the sample space do not produce a model which fits the data. In Feller's book,[Notes 1] a pair of models is given which describe arrangements of certain kinds of elementary particles, such as photons and protons. It turns out that experiments have shown that certain types of elementary particles exhibit behavior which is accurately described by one model, called “Bose-Einstein statistics,” while other types of elementary particles can be modelled using “Fermi-Dirac statistics.” Feller says:

We have here an instructive example of the impossibility of selecting or justifying probability models by a priori arguments. In fact, no pure reasoning could tell that photons and protons would not obey the same probability laws.

We now give some examples of this description and assignment process.

Example In the quantum mechanical model of the helium atom, various parameters can be used to classify the energy states of the atom. In the triplet spin state ([math]S = 1[/math]) with orbital angular momentum 1 ([math]L = 1[/math]), there are three possibilities, 0, 1, or 2, for the total angular momentum ([math]J[/math]). (It is not assumed that the reader knows what any of this means; in fact, the example is more illustrative if the reader does not know anything about quantum mechanics.) We would like to assign probabilities to the three possibilities for [math]J[/math]. The reader is undoubtedly resisting the idea of assigning the probability of [math]1/3[/math] to each of these outcomes. She should now ask herself why she is resisting this assignment. The answer is probably because she does not have any “intuition” (i.e., experience) about the way in which helium atoms behave. In fact, in this example, the probabilities [math]1/9,\ 3/9,[/math] and [math]5/9[/math] are assigned by the theory. The theory gives these assignments because these frequencies were observed in experiments and further parameters were developed in the theory to allow these frequencies to be predicted.

Example Suppose two pennies are flipped once each. There are several “reasonable” ways to describe the sample space. One way is to count the number of heads in the outcome; in this case, the sample space can be written [math]\{0, 1, 2\}[/math]. Another description of the sample space is the set of all ordered pairs of [math]H[/math]'s and [math]T[/math]'s, i.e.,

[[math]] \{(H,H), (H, T), (T, H), (T, T)\}. [[/math]]
Both of these descriptions are accurate ones, but it is easy to see that (at most) one of these, if assigned a constant probability function, can claim to accurately model reality. In this case, as opposed to the preceding example, the reader will probably say that the second description, with each outcome being assigned a probability of [math]1/4[/math], is the “right” description. This conviction is due to experience; there is no proof that this is the way reality works.

The reader is also referred to Exercise for another example of this process.

Historical Remarks

The binomial coefficients have a long and colorful history leading up to Pascal's Treatise on the Arithmetical Triangle,[Notes 2] where Pascal developed many important properties of these numbers. This history is set forth in the book Pascal's Arithmetical Triangle by A. W. F. Edwards.[Notes 3] Pascal wrote his triangle in the form shown in Table.

Pascal's triangle.
1 1 1 1 1 1 1 1 1 1
1 2 3 4 5 6 7 8 9
1 3 6 10 15 21 28 36
1 4 10 20 35 56 84
1 5 15 35 70 126
1 6 21 56 126
1 7 28 84
1 8 36
1 9

Edwards traces three different ways that the binomial coefficients arose. He refers to these as the figurate numbers, the combinatorial numbers, and the binomial numbers. They are all names for the same thing (which we have called binomial coefficients) but that they are all the same was not appreciated until the sixteenth century. The figurate numbers date back to the Pythagorean interest in number patterns around 540 BC. The Pythagoreans considered, for example, triangular patterns shown in Figure. The sequence of numbers

[[math]] 1, 3, 6, 10, \dots [[/math]]
obtained as the number of points in each triangle are called triangular numbers. From the triangles it is clear that the [math]n[/math]th triangular number is simply the sum of the first [math]n[/math] integers. The tetrahedral numbers are the sums of the triangular numbers and were obtained by the Greek mathematicians Theon and Nicomachus at the beginning of the second century BC. The tetrahedral number 10, for example, has the geometric representation shown in Figure. The first three types of figurate numbers can be represented in tabular form as shown in Table.

Pythagorean triangular patterns.
Geometric representation of the tetrahedral number 10.

Natural numbers 1 2 3 4 5 6 7 8 9
Triangular numbers 1 3 6 10 15 21 28 36 45
Tetrahedral numbers 1 4 10 20 35 56 84 120 165


These numbers provide the first four rows of Pascal's triangle, but the table was not to be completed in the West until the sixteenth century.


In the East, Hindu mathematicians began to encounter the binomial coefficients in combinatorial problems. Bhaskara in his Lilavati of 1150 gave a rule to find the number of medicinal preparations using 1, 2, 3, 4, 5, or 6 possible ingredients.[Notes 4] His rule is equivalent to our formula

[[math]] {n \choose r} = \frac{(n)_r}{r!}\ . [[/math]]

Chu Shih-chieh's triangle. [From J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135. Reprinted with permission.


The binomial numbers as coefficients of [math](a + b)^n[/math] appeared in the works of mathematicians in China around 1100. There are references about this time to “the tabulation system for unlocking binomial coefficients.” The triangle to provide the coefficients up to the eighth power is given by Chu Shih-chieh in a book written around 1303 (see Figure).[Notes 5] The original manuscript of Chu's book has been lost, but copies have survived. Edwards notes that there is an error in this copy of Chu's triangle. Can you find it? ( Hint: Two numbers which should be equal are not.) Other copies do not show this error.


The first appearance of Pascal's triangle in the West seems to have come from calculations of Tartaglia in calculating the number of possible ways that [math]n[/math] dice might turn up.[Notes 6] For one die the answer is clearly 6. For two dice the possibilities may be displayed as shown in Table.

11
12 22
13 23 33
14 24 34 44
15 25 35 45 55
16 26 36 46 56 66

Displaying them this way suggests the sixth triangular number [math]1 + 2 + 3 + 4 + 5 + 6 = 21[/math] for the throw of 2 dice. Tartaglia “on the first day of Lent, 1523, in Verona, having thought about the problem all night,”[Notes 7] realized that the extension of the figurate table gave the answers for [math]n[/math] dice. The problem had suggested itself to Tartaglia from watching people casting their own horoscopes by means of a Book of Fortune, selecting verses by a process which included noting the numbers on the faces of three dice. The 56 ways that three dice can fall were set out on each page. The way the numbers were written in the book did not suggest the connection with figurate numbers, but a method of enumeration similar to the one we used for 2 dice does. Tartaglia's table was not published until 1556.


A table for the binomial coefficients was published in 1554 by the German mathematician Stifel.[Notes 8] Pascal's triangle appears also in Cardano's Opus novum of 1570.[Notes 9] Cardano was interested in the problem of finding the number of ways to choose [math]r[/math] objects out of [math]n[/math]. Thus by the time of Pascal's work, his triangle had appeared as a result of looking at the figurate numbers, the combinatorial numbers, and the binomial numbers, and the fact that all three were the same was presumably pretty well understood.


Pascal's interest in the binomial numbers came from his letters with Fermat concerning a problem known as the problem of points. This problem, and the correspondence between Pascal and Fermat, were discussed in Chapter Discrete Probability Distributions. The reader will recall that this problem can be described as follows: Two players A and B are playing a sequence of games and the first player to win [math]n[/math] games wins the match. It is desired to find the probability that A wins the match at a time when A has won [math]a[/math] games and B has won [math]b[/math] games. (See Exercise - Exercise.)

Pascal solved the problem by backward induction, much the way we would do today in writing a computer program for its solution. He referred to the combinatorial method of Fermat which proceeds as follows: If A needs [math]c[/math] games and B needs [math]d[/math] games to win, we require that the players continue to play until they have played [math]c + d - 1[/math] games. The winner in this extended series will be the same as the winner in the original series. The probability that A wins in the extended series and hence in the original series is

[[math]] \sum_{r = c}^{c + d - 1} \frac 1{2^{c + d - 1}} {{c + d - 1} \choose r}\ . [[/math]]

Even at the time of the letters Pascal seemed to understand this formula.


Suppose that the first player to win [math]n[/math] games wins the match, and suppose that each player has put up a stake of [math]x[/math]. Pascal studied the value of winning a particular game. By this he meant the increase in the expected winnings of the winner of the particular game under consideration. He showed that the value of the first game is

[[math]] \frac {1\cdot3\cdot5\cdot\dots\cdot(2n - 1)}{2\cdot4\cdot6\cdot\dots\cdot(2n)}x\ . [[/math]]

His proof of this seems to use Fermat's formula and the fact that the above ratio of products of odd to products of even numbers is equal to the probability of exactly [math]n[/math] heads in [math]2n[/math] tosses of a coin. (See Exercise.)


Pascal presented Fermat with the table shown in Table.

Pascal's solution for the problem of points.
If each one staken 256 in
From my opponent's 256 positions I get, for the 6 games 5 games 4 games 3 games 2 games 1 game
1st game 63 70 80 96 128 256
2nd game 63 70 80 96 128
3rd game 56 60 64 64
4th game 42 40 32
5th game 24 16
6th game 8

He states:

You will see as always, that the value of the first game is equal to that of the second which is easily shown by combinations. You will see, in the same way, that the numbers in the first line are always increasing; so also are those in the second; and those in the third. But those in the fourth line are decreasing, and those in the fifth, etc. This seems odd.[Notes 10]


The student can pursue this question further using the computer and Pascal's backward iteration method for computing the expected payoff at any point in the series.


In his treatise, Pascal gave a formal proof of Fermat's combinatorial formula as well as proofs of many other basic properties of binomial numbers. Many of his proofs involved induction and represent some of the first proofs by this method. His book brought together all the different aspects of the numbers in the Pascal triangle as known in 1654, and, as Edwards states, “That the Arithmetical Triangle should bear Pascal's name cannot be disputed.”[Notes 11]

The first serious study of the binomial distribution was undertaken by James Bernoulli in his Ars Conjectandi published in 1713.[Notes 12] We shall return to this work in the historical remarks in Chapter Law of Large Numbers.

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.

Notes

  1. W. Feller, Introduction to Probability Theory and Its Applications vol. 1, 3rd ed. (New York: John Wiley and Sons, 1968), p. 41
  2. B. Pascal, Traité du Triangle Arithmétique (Paris: Desprez, 1665).
  3. A. W. F. Edwards, Pascal's Arithmetical Triangle (London: Griffin, 1987).
  4. ibid., p. 27.
  5. J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135.
  6. N. Tartaglia, General Trattato di Numeri et Misure (Vinegia, 1556).
  7. Quoted in Edwards, op. cit., p. 37.
  8. M. Stifel, Arithmetica Integra (Norimburgae, 1544).
  9. G. Cardano, Opus Novum de Proportionibus Numerorum (Basilea, 1570).
  10. F. N. David, op. cit., p. 235.
  11. A. W. F. Edwards, op. cit., p. ix.
  12. J. Bernoulli, Ars Conjectandi (Basil: Thurnisiorum, 1713).