Branching Processes

Historical Background

In this section we apply the theory of generating functions to the study of an important chance process called a branching process. Until recently it was thought that the theory of branching processes originated with the following problem posed by Francis Galton in the Educational Times in 1873.^{[Notes 1]}

Problem 4001: A large nation, of whom we will only concern ourselves with the adult males, [math]N[/math] in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, [math]a_0[/math] per cent of the adult males have no male children who reach adult life; [math]a_1[/math] have one such male child; [math]a_2[/math] have two; and so on up to [math]a_5[/math] who have five. Find (1) what proportion of the surnames will have become extinct after [math]r[/math] generations; and (2) how many instances there will be of the same surname being held by [math]m[/math] persons.

The first attempt at a solution was given by Reverend H. W. Watson. Because of a mistake in algebra, he incorrectly concluded that a family name would always die out with probability 1. However, the methods that he employed to solve the problems were, and still are, the basis for obtaining the correct solution.

Heyde and Seneta discovered an earlier communication by Bienaymé (1845) that anticipated Galton and Watson by 28 years. Bienaymé showed, in fact, that he was aware of the correct solution to Galton's problem. Heyde and Seneta in their book I. J. Bienaymé: Statistical Theory Anticipated,^{[Notes 2]} give the following translation from Bienaymé's paper:

If ... the mean of the number of male children who replace the number of males of the preceding generation were less than unity, it would be easily realized that families are dying out due to the disappearance of the members of which they are composed. However, the analysis shows further that when this mean is equal to unity families tend to disappear, although less rapidly ... The analysis also shows clearly that if the mean ratio is greater than unity, the probability of the extinction of families with the passing of time no longer reduces to certainty. It only approaches a finite limit, which is fairly simple to calculate and which has the singular characteristic of being given by one of the roots of the equation (in which the number of generations is made infinite) which is not relevant to the question when the mean ratio is less than unity.^{[Notes 3]}

Although Bienaymé does not give his reasoning for these results, he did indicate that he intended to publish a special paper on the problem. The paper was never written, or at least has never been found. In his communication Bienaymé indicated that he was motivated by the same problem that occurred to Galton. The opening paragraph of his paper as translated by Heyde and Seneta says,

A great deal of consideration has been given to the possible multiplication of the numbers of mankind; and recently various very curious observations have been published on the fate which allegedly hangs over the aristocrary and middle classes; the families of famous men, etc. This fate, it is alleged, will inevitably bring about the disappearance of the so-called families fermées.^{[Notes 4]}

A much more extensive discussion of the history of branching processes may be found in two papers by David G. Kendall.^{[Notes 5]} Branching processes have served not only as crude models for population growth but also as models for certain physical processes such as chemical and nuclear chain reactions.

Problem of Extinction

We turn now to the first problem posed by Galton (i.e., the problem of finding the probability of extinction for a branching process). We start in the 0th generation with 1 male parent. In the first generation we shall have 0, 1,2, 3, ... male offspring with probabilities [math]p_0,p_1,p_2,p_3,\ldots[/math] If in the first generation there are [math]k[/math] offspring, then in the second generation there will be [math]X_1 + X_2 +\cdots+ X_k[/math] offspring, where [math]X_1, X_2, \ldots, X_k[/math] are independent random variables, each with the common distribution [math]p_0, p_1, p_2, \ldots[/math]. This description enables us to construct a tree, and a tree measure, for any number of generations.

Examples

Example Assume that [math]p_0 = 1/2[/math], [math]p_1 = 1/4[/math], and [math]p_2 = 1/4[/math]. Then the tree measure for the first two generations is shown in Figure.

Note that we use the theory of sums of independent random variables to assign branch probabilities. For example, if there are two offspring in the first generation, the probability that there will be two in the second generation is

[[math]] \begin{eqnarray*} P(X_1 + X_2 = 2) &=& p_0p_2 + p_1p_1 + p_2p_0 \\ &=& \frac12\cdot\frac14 + \frac14\cdot\frac14 + \frac14\cdot\frac12 = \frac 5{16}\ . \end{eqnarray*} [[/math]]

We now study the probability that our process dies out (i.e., that at some generation there are no offspring).

Let [math]d_m[/math] be the probability that the process dies out by the [math]m[/math]th generation. Of course, [math]d_0 = 0[/math]. In our example, [math]d_1 = 1/2[/math] and [math]d_2 = 1/2 + 1/8 + 1/16 = 11/16[/math] (see Figure). Note that we must add the probabilities for all paths that lead to 0 by the [math]m[/math]th generation. It is clear from the definition that

[[math]] 0 = d_0 \leq d_1 \leq d_2 \leq\cdots\leq 1\ . [[/math]]

Hence, [math]d_m[/math] converges to a limit [math]d[/math], [math]0 \leq d \leq 1[/math], and [math]d[/math] is the probability that the process will ultimately die out. It is this value that we wish to determine. We begin by expressing the value [math]d_m[/math] in terms of all possible outcomes on the first generation. If there are [math]j[/math] offspring in the first generation, then to die out by the [math]m[/math]th generation, each of these lines must die out in [math]m - 1[/math] generations. Since they proceed independently, this probability is [math](d_{m - 1})^j[/math]. Therefore

[[math]] \begin{equation} d_m = p_0 + p_1d_{m - 1} + p_2(d_{m - 1})^2 + p_3(d_{m - 1})^3 +\cdots\ . \label{eq 10.2.1} \end{equation} [[/math]]

Let [math]h(z)[/math] be the ordinary generating function for the [math]p_i[/math]:

[[math]] h(z) = p_0 + p_1z + p_2z^2 +\cdots\ . [[/math]]

Using this generating function, we can rewrite Equation \ref{eq 10.2.1} in the form

[[math]] \begin{equation} d_m = h(d_{m - 1})\ . \label{eq 10.2.2} \end{equation} [[/math]]

Since [math]d_m \to d[/math], by Equation \ref{eq 10.2.2} we see that the value [math]d[/math] that we are looking for satisfies the equation

[[math]] \begin{equation} d = h(d)\ . \label{eq 10.2.3} \end{equation} [[/math]]

One solution of this equation is always [math]d = 1[/math], since

[[math]] 1 = p_0 + p_1 + p_2 +\cdots\ . [[/math]]

This is where Watson made his mistake. He assumed that 1 was the only solution to Equation \ref{eq 10.2.3}. To examine this question more carefully, we first note that solutions to Equation \ref{eq 10.2.3} represent intersections of the graphs of

[[math]] y = z [[/math]]

and

[[math]] y = h(z) = p_0 + p_1z + p_2z^2 +\cdots\ . [[/math]]

Thus we need to study the graph of [math]y = h(z)[/math]. We note that [math]h(0) = p_0[/math]. Also,

[[math]] \begin{equation} h'(z) = p_1 + 2p_2z + 3p_3z^2 +\cdots\ , \label{eq 10.2.4} \end{equation} [[/math]]

and

[[math]] h''(z) = 2p_2 + 3 \cdot 2p_3z + 4 \cdot 3p_4z^2 + \cdots\ . [[/math]]

From this we see that for [math]z \geq 0[/math], [math]h'(z) \geq 0[/math] and [math]h''(z) \geq 0[/math]. Thus for nonnegative [math]z[/math], [math]h(z)[/math] is an increasing function and is concave upward. Therefore the graph of [math]y = h(z)[/math] can intersect the line [math]y = z[/math] in at most two points. Since we know it must intersect the line [math]y = z[/math] at [math](1,1)[/math], we know that there are just three possibilities, as shown in Figure.

Graphs of [math]y = z[/math] and [math]y = h(z)[/math].

In case (a) the equation [math]d = h(d)[/math] has roots [math]\{d,1\}[/math] with [math]0 \leq d \lt 1[/math]. In the second case (b) it has only the one root [math]d = 1[/math]. In case (c) it has two roots [math]\{1,d\}[/math] where [math]1 \lt d[/math]. Since we are looking for a solution [math]0 \leq d \leq 1[/math], we see in cases (b) and (c) that our only solution is 1. In these cases we can conclude that the process will die out with probability 1. However in case (a) we are in doubt. We must study this case more carefully.

From Equation \ref{eq 10.2.4} we see that

[[math]] h'(1) = p_1 + 2p_2 + 3p_3 +\cdots= m\ , [[/math]]

where [math]m[/math] is the expected number of offspring produced by a single parent. In case (a) we have [math]h'(1) \gt 1[/math], in (b) [math]h'(1) = 1[/math], and in (c) [math]h'(1) \lt 1[/math]. Thus our three cases correspond to [math]m \gt 1[/math], [math]m = 1[/math], and [math]m \lt 1[/math]. We assume now that [math]m \gt 1[/math]. Recall that [math]d_0 = 0[/math], [math]d_1 = h(d_0) = p_0[/math], [math]d_2 = h(d_1)[/math], \ldots, and [math]d_n = h(d_{n - 1})[/math]. We can construct these values geometrically, as shown in Figure.

Geometric determination of [math]d[/math].

We can see geometrically, as indicated for [math]d_0[/math], [math]d_1[/math], [math]d_2[/math], and [math]d_3[/math] in Figure, that the points [math](d_i,h(d_i))[/math] will always lie above the line [math]y = z[/math]. Hence, they must converge to the first intersection of the curves [math]y = z[/math] and [math]y = h(z)[/math] (i.e., to the root [math]d \lt 1[/math]). This leads us to the following theorem.

Theorem

Consider a branching process with generating function [math]h(z)[/math] for the number of offspring of a given parent. Let [math]d[/math] be the smallest root of the equation [math]z = h(z)[/math]. If the mean number [math]m[/math] of offspring produced by a single parent is [math]{} \leq 1[/math], then [math]d = 1[/math] and the process dies out with probability 1. If [math]m \gt 1[/math] then [math]d \lt 1[/math] and the process dies out with probability [math]d[/math].

We shall often want to know the probability that a branching process dies out by a particular generation, as well as the limit of these probabilities. Let [math]d_n[/math] be the probability of dying out by the [math]n[/math]th generation. Then we know that [math]d_1 = p_0[/math]. We know further that [math]d_n = h(d_{n - 1})[/math] where [math]h(z)[/math] is the generating function for the number of offspring produced by a single parent. This makes it easy to compute these probabilities.

The program Branch calculates the values of [math]d_n[/math]. We have run this program for 12 generations for the case that a parent can produce at most two offspring and the probabilities for the number produced are [math]p_0 = .2[/math], [math]p_1 = .5[/math], and [math]p_2 = .3[/math]. The results are given in Table.

Probability of dying out.
Generation	Probability of dying out
1	.2
2	.312
3	.385203
4	.437116
5	.475879
6	.505878
7	.529713
8	.549035
9	.564949
10	.578225
11	.589416
12	.598931

We see that the probability of dying out by 12 generations is about .6. We shall see in the next example that the probability of eventually dying out is 2/3, so that even 12 generations is not enough to give an accurate estimate for this probability.

We now assume that at most two offspring can be produced. Then

[[math]] h(z) = p_0 + p_1z + p_2z^2\ . [[/math]]

In this simple case the condition [math]z = h(z)[/math] yields the equation

[[math]] d = p_0 + p_1d + p_2d^2\ , [[/math]]

which is satisfied by [math]d = 1[/math] and [math]d = p_0/p_2[/math]. Thus, in addition to the root [math]d = 1[/math] we have the second root [math]d = p_0/p_2[/math]. The mean number [math]m[/math] of offspring produced by a single parent is

[[math]] m = p_1 + 2p_2 = 1 - p_0 - p_2 + 2p_2 = 1 - p_0 + p_2\ . [[/math]]

Thus, if [math]p_0 \gt p_2[/math], [math]m \lt 1[/math] and the second root is [math]{} \gt 1[/math]. If [math]p_0 = p_2[/math], we have a double root [math]d = 1[/math]. If [math]p_0 \lt p_2[/math], [math]m \gt 1[/math] and the second root [math]d[/math] is less than 1 and represents the probability that the process will die out.

Example Keyfitz^{[Notes 6]} compiled and analyzed data on the continuation of the female family line among Japanese women. His estimates at the basic probability distribution for the number of female children born to Japanese women of ages 45--49 in 1960 are given in Table.

Distribution of number of female children.
[math]p_0[/math]	= .2092
[math]p_1[/math]	= .2584
[math]p_2[/math]	= .2360
[math]p_3[/math]	= .1593
[math]p_4[/math]	= .0828
[math]p_5[/math]	= .0357
[math]p_6[/math]	= .0133
[math]p_7[/math]	= .0042
[math]p_8[/math]	= .0011
[math]p_9[/math]	= .0002
[math]p_{10}[/math]	= .0000

The expected number of girls in a family is then 1.837 so the probability [math]d[/math] of extinction is less than 1. If we run the program Branch, we can estimate that [math]d[/math] is in fact only about .324.

Distribution of Offspring

So far we have considered only the first of the two problems raised by Galton, namely the probability of extinction. We now consider the second problem, that is, the distribution of the number [math]Z_n[/math] of offspring in the [math]n[/math]th generation. The exact form of the distribution is not known except in very special cases. We shall see, however, that we can describe the limiting behavior of [math]Z_n[/math] as [math]n \to \infty[/math].

We first show that the generating function [math]h_n(z)[/math] of the distribution of [math]Z_n[/math] can be obtained from [math]h(z)[/math] for any branching process.

We recall that the value of the generating function at the value [math]z[/math] for any random variable [math]X[/math] can be written as

[[math]] h(z) = E(z^X) = p_0 + p_1z + p_2z^2 +\cdots\ . [[/math]]

That is, [math]h(z)[/math] is the expected value of an experiment which has outcome [math]z^j[/math] with probability [math]p_j[/math].

Let [math]S_n = X_1 + X_2 +\cdots+ X_n[/math] where each [math]X_j[/math] has the same integer-valued distribution [math](p_j)[/math] with generating function [math]k(z) = p_0 + p_1z + p_2z^2 +\cdots.[/math] Let [math]k_n(z)[/math] be the generating function of [math]S_n[/math]. Then using one of the properties of ordinary generating functions discussed in Generating Functions for Discrete Distributions, we have

[[math]] k_n(z) = (k(z))^n\ , [[/math]]

since the [math]X_j[/math]'s are independent and all have the same distribution.

Consider now the branching process [math]Z_n[/math]. Let [math]h_n(z)[/math] be the generating function of [math]Z_n[/math]. Then

[[math]] \begin{eqnarray*} h_{n + 1}(z) &=& E(z^{Z_{n + 1}}) \\ &=& \sum_k E(z^{Z_{n + 1}} | Z_n = k) P(Z_n = k)\ . \end{eqnarray*} [[/math]]

If [math]Z_n = k[/math], then [math]Z_{n + 1} = X_1 + X_2 +\cdots+ X_k[/math] where [math]X_1[/math], [math]X_2[/math], \ldots, [math]X_k[/math] are independent random variables with common generating function [math]h(z)[/math]. Thus

[[math]] E(z^{Z_{n + 1}} | Z_n = k) = E(z^{X_1 + X_2 +\cdots+ X_k}) = (h(z))^k\ , [[/math]]

and

[[math]] h_{n + 1}(z) = \sum_k (h(z))^k P(Z_n = k)\ . [[/math]]

But

[[math]] h_n(z) = \sum_k P(Z_n = k) z^k\ . [[/math]]

Thus,

[[math]] \begin{equation} h_{n + 1}(z) = h_n(h(z))\ . \label{eq 10.2.5} \end{equation} [[/math]]

If we differentiate Equation \ref{eq 10.2.5} and use the chain rule we have

[[math]] h'_{n+1}(z) = h'_n(h(z)) h'(z) . [[/math]]

Putting [math]z = 1[/math] and using the fact that [math]h(1) = 1[/math], [math]h'(1) = m[/math], and [math]h_n'(1) = m_n = {}[/math]the mean number of offspring in the [math]n[/math]'th generation, we have

[[math]] m_{n + 1} = m_n \cdot m\ . [[/math]]

Thus, [math]m_2 = m \cdot m = m^2[/math], [math]m_3 = m^2 \cdot m = m^3[/math], and in general

[[math]] m_n = m^n\ . [[/math]]

Thus, for a branching process with [math]m \gt 1[/math], the mean number of offspring grows exponentially at a rate [math]m[/math].

Examples

Example For the branching process of Example we have

[[math]] \begin{eqnarray*} h(z) &=& 1/2 + (1/4)z + (1/4)z^2\ , \\ h_2(z) &=& h(h(z)) = 1/2 + (1/4)[1/2 + (1/4)z + (1/4)z^2] \\ &=& + (1/4)[1/2 + (1/4)z + (1/4)z^2]^2 \\ &=& 11/16 + (1/8)z + (9/64)z^2 + (1/32)z^3 + (1/64)z^4\ . \end{eqnarray*} [[/math]]

The probabilities for the number of offspring in the second generation agree with those obtained directly from the tree measure (see Figure 1).

It is clear that even in the simple case of at most two offspring, we cannot easily carry out the calculation of [math]h_n(z)[/math] by this method. However, there is one special case in which this can be done.

Example Assume that the probabilities [math]p_1, p_2, \ldots[/math] form a geometric series: [math]p_k = bc^{k - 1}[/math], [math]k = 1, 2, \ldots,[/math] with [math]0 \lt b \leq 1 - c[/math] and [math]0 \lt c \lt 1[/math]. Then we have

[[math]] \begin{eqnarray*} p_0 &=& 1 - p_1 - p_2 -\cdots \\ &=& 1 - b - bc - bc^2 -\cdots \\ &=& 1 - \frac b{1 - c}\ . \end{eqnarray*} [[/math]]

The generating function [math]h(z)[/math] for this distribution is

[[math]] \begin{eqnarray*} h(z) &=& p_0 + p_1z + p_2z^2 +\cdots \\ &=& 1 - \frac b{1 - c} + bz + bcz^2 + bc^2z^3 +\cdots \\ &=& 1 - \frac b{1 - c} + \frac{bz}{1 - cz}\ . \end{eqnarray*} [[/math]]

From this we find

[[math]] h'(z) = \frac{bcz}{(1 - cz)^2} + \frac b{1 - cz} = \frac b{(1 - cz)^2} [[/math]]

and

[[math]] m = h'(1) = \frac b{(1 - c)^2}\ . [[/math]]

We know that if [math]m \leq 1[/math] the process will surely die out and [math]d = 1[/math]. To find the probability [math]d[/math] when [math]m \gt 1[/math] we must find a root [math]d \lt 1[/math] of the equation

[[math]] z = h(z)\ , [[/math]]

or

[[math]] z = 1 - \frac b{1 - c} + \frac{bz}{1 - cz}. [[/math]]

This leads us to a quadratic equation. We know that [math]z = 1[/math] is one solution. The other is found to be

[[math]] d = \frac{1 - b - c}{c(1 - c)}. [[/math]]

It is easy to verify that [math]d \lt 1[/math] just when [math]m \gt 1[/math]. It is possible in this case to find the distribution of [math]Z_n[/math]. This is done by first finding the generating function [math]h_n(z)[/math].^{[Notes 7]} The result for [math]m \ne 1[/math] is:

[[math]] h_n(z) = 1 - m^n \left[\frac{1 - d}{m^n - d}\right] + \frac{m^n \left[\frac{1 - d}{m^n - d}\right]^2 z} {1 - \left[\frac{m^n - 1}{m^n - d}\right]z}\ . [[/math]]

The coefficients of the powers of [math]z[/math] give the distribution for [math]Z_n[/math]:

[[math]] P(Z_n = 0) = 1 - m^n\frac{1 - d}{m^n - d} = \frac{d(m^n - 1)}{m^n - d} [[/math]]

and

[[math]] P(Z_n = j) = m^n \Bigl(\frac{1 - d}{m^n - d}\Bigr)^2 \cdot \Bigl(\frac{m^n - 1}{ m^n - d}\Bigr)^{j - 1}, [[/math]]

for [math]j \geq 1[/math].

Example Let us re-examine the Keyfitz data to see if a distribution of the type considered in Example could reasonably be used as a model for this population. We would have to estimate from the data the parameters [math]b[/math] and [math]c[/math] for the formula [math]p_k = bc^{k - 1}[/math]. Recall that

[[math]] \begin{equation} m = \frac b{(1 - c)^2} \label{eq 10.2.7} \end{equation} [[/math]]

and the probability [math]d[/math] that the process dies out is

[[math]] \begin{equation} d = \frac{1 - b - c}{c(1 - c)}\ . \label{eq 10.2.8} \end{equation} [[/math]]

Solving \ref{eq 10.2.7} and \ref{eq 10.2.8} for [math]b[/math] and [math]c[/math] gives

[[math]] c = \frac{m - 1}{m - d} [[/math]]

and

[[math]] b = m\Bigl(\frac{1 - d}{m - d}\Bigr)^2\ . [[/math]]

We shall use the value 1.837 for [math]m[/math] and .324 for [math]d[/math] that we found in the Keyfitz example. Using these values, we obtain [math]b = .3666[/math] and [math]c = .5533[/math]. Note that [math](1 - c)^2 \lt b \lt 1 - c[/math], as required. In Table we give for comparison the probabilities [math]p_0[/math] through [math]p_8[/math] as calculated by the geometric distribution versus the empirical values.

Comparison of observed and expected frequencies.
[math]p_j[/math]	Data	Geometric Model
0	.2092	.1816
1	.2584	.3666
2	.2360	.2028
3	.1593	.1122
4	.0828	.0621
5	.0357	.0344
6	.0133	.0190
7	.0042	.0105
8	.0011	.0058
9	.0002	.0032
10	.0000	.0018

The geometric model tends to favor the larger numbers of offspring but is similar enough to show that this modified geometric distribution might be appropriate to use for studies of this kind. Recall that if [math]S_n = X_1 + X_2 +\cdots+ X_n[/math] is the sum of independent random variables with the same distribution then the Law of Large Numbers states that [math]S_n/n[/math] converges to a constant, namely [math]E(X_1)[/math]. It is natural to ask if there is a similar limiting theorem for branching processes. Consider a branching process with [math]Z_n[/math] representing the number of offspring after [math]n[/math] generations. Then we have seen that the expected value of [math]Z_n[/math] is [math]m^n[/math]. Thus we can scale the random variable [math]Z_n[/math] to have expected value 1 by considering the random variable

[[math]] W_n = \frac{Z_n}{m^n}\ . [[/math]]

In the theory of branching processes it is proved that this random variable [math]W_n[/math] will tend to a limit as [math]n[/math] tends to infinity. However, unlike the case of the Law of Large Numbers where this limit is a constant, for a branching process the limiting value of the random variables [math]W_n[/math] is itself a random variable. Although we cannot prove this theorem here we can illustrate it by simulation. This requires a little care. When a branching process survives, the number of offspring is apt to get very large. If in a given generation there are 1000 offspring, the offspring of the next generation are the result of 1000 chance events, and it will take a while to simulate these 1000 experiments. However, since the final result is the sum of 1000 independent experiments we can use the Central Limit Theorem to replace these 1000 experiments by a single experiment with normal density having the appropriate mean and variance. The program BranchingSimulation carries out this process.

We have run this program for the Keyfitz example, carrying out 10 simulations and graphing the results in Figure.

Simulation of [math]Z_n/m^n[/math] for the Keyfitz example.

The expected number of female offspring per female is 1.837, so that we are graphing the outcome for the random variables [math]W_n = Z_n/(1.837)^n[/math]. For three of the simulations the process died out, which is consistent with the value [math]d = .3[/math] that we found for this example. For the other seven simulations the value of [math]W_n[/math] tends to a limiting value which is different for each simulation.

Example We now examine the random variable [math]Z_n[/math] more closely for the case [math]m \lt 1[/math] (see Example). Fix a value [math]t \gt 0[/math]; let [math][tm^n][/math] be the integer part of [math]tm^n[/math]. Then

[[math]] \begin{eqnarray*} P(Z_n = [tm^n]) &=& m^n (\frac{1 - d}{m^n - d})^2 (\frac{m^n - 1}{m^n - d}) ^{[tm^n] - 1} \\ &=& \frac1{m^n}(\frac{1 - d}{1 - d/m^n})^2 (\frac{1 - 1/m^n} {1 - d/m^n})^{tm^n + a}\ , \end{eqnarray*} [[/math]]

where [math]|a| \leq 2[/math]. Thus, as [math]n \to \infty[/math],

[[math]] m^n P(Z_n = [tm^n]) \to (1 - d)^2 \frac{e^{-t}}{e^{-td}} = (1 - d)^2 e^{-t(1 - d)}\ . [[/math]]

For [math]t = 0[/math],

[[math]] P(Z_n = 0) \to d\ . [[/math]]

We can compare this result with the Central Limit Theorem for sums [math]S_n[/math] of integer-valued independent random variables (see Theorem), which states that if [math]t[/math] is an integer and [math]u = (t - n\mu)/\sqrt{\sigma^2 n}[/math], then as [math]n \to \infty[/math],

[[math]] \sqrt{\sigma^2 n}\, P(S_n = u\sqrt{\sigma^2 n} + \mu n) \to \frac1{\sqrt{2\pi}} e^{-u^2/2}\ . [[/math]]

We see that the form of these statements are quite similar. It is possible to prove a limit theorem for a general class of branching processes that states that under suitable hypotheses, as [math]n \to \infty[/math],

[[math]] m^n P(Z_n = [tm^n]) \to k(t)\ , [[/math]]

for [math]t \gt 0[/math], and

[[math]] P(Z_n = 0) \to d\ . [[/math]]

However, unlike the Central Limit Theorem for sums of independent random variables, the function [math]k(t)[/math] will depend upon the basic distribution that determines the process. Its form is known for only a very few examples similar to the one we have considered here.

Chain Letter Problem

Example An interesting example of a branching process was suggested by Free Huizinga.^{[Notes 8]} In 1978, a chain letter called the “Circle of Gold,” believed to have started in California, found its way across the country to the theater district of New York. The chain required a participant to buy a letter containing a list of 12 names for 100 dollars. The buyer gives 50 dollars to the person from whom the letter was purchased and then sends 50 dollars to the person whose name is at the top of the list. The buyer then crosses off the name at the top of the list and adds her own name at the bottom in each letter before it is sold again.

Let us first assume that the buyer may sell the letter only to a single person. If you buy the letter you will want to compute your expected winnings. (We are ignoring here the fact that the passing on of chain letters through the mail is a federal offense with certain obvious resulting penalties.) Assume that each person involved has a probability [math]p[/math] of selling the letter. Then you will receive 50 dollars with probability [math]p[/math] and another 50 dollars if the letter is sold to 12 people, since then your name would have risen to the top of the list. This occurs with probability [math]p^{12}[/math], and so your expected winnings are [math]-100 + 50p + 50p^{12}[/math]. Thus the chain in this situation is a highly unfavorable game.

It would be more reasonable to allow each person involved to make a copy of the list and try to sell the letter to at least 2 other people. Then you would have a chance of recovering your 100 dollars on these sales, and if any of the letters is sold 12 times you will receive a bonus of 50 dollars for each of these cases. We can consider this as a branching process with 12 generations. The members of the first generation are the letters you sell. The second generation consists of the letters sold by members of the first generation, and so forth.

Let us assume that the probabilities that each individual sells letters to 0, 1, or 2 others are [math]p_0[/math], [math]p_1[/math], and [math]p_2[/math], respectively. Let [math]Z_1[/math], [math]Z_2[/math], \ldots, [math]Z_{12}[/math] be the number of letters in the first 12 generations of this branching process. Then your expected winnings are

[[math]] 50(E(Z_1) + E(Z_{12})) = 50m + 50m^{12}\ , [[/math]]

where [math]m = p_1 + 2p_2[/math] is the expected number of letters you sold. Thus to be favorable we just have

[[math]] 50m + 50m^{12} \gt 100\ , [[/math]]

or

[[math]] m + m^{12} \gt 2\ . [[/math]]

But this will be true if and only if [math]m \gt 1[/math]. We have seen that this will occur in the quadratic case if and only if [math]p_2 \gt p_0[/math]. Let us assume for example that [math]p_0 = .2[/math], [math]p_1 = .5[/math], and [math]p_2 = .3[/math]. Then [math]m = 1.1[/math] and the chain would be a favorable game. Your expected profit would be

[[math]] 50(1.1 + 1.1^{12}) - 100 \approx 112\ . [[/math]]

The probability that you receive at least one payment from the 12th generation is [math]1 - d_{12}[/math]. We find from our program Branch that [math]d_{12} = .599[/math]. Thus, [math]1 - d_{12} = .401[/math] is the probability that you receive some bonus. The maximum that you could receive from the chain would be [math]50(2 + 2^{12}) = 204,00[/math] if everyone were to successfully sell two letters. Of course you can not always expect to be so lucky. (What is the probability of this happening?) To simulate this game, we need only simulate a branching process for 12 generations. Using a slightly modified version of our program BranchingSimulation we carried out twenty such simulations, giving the results shown in Table. Note that we were quite lucky on a few runs, but we came out ahead only a little less than half the time. The process died out by the twelfth generation in 12 out of the 20 experiments, in good agreement with the probability [math]d_{12} = .599[/math] that we calculated using the program Branch.

Simulation of chain letter (finite distribution case).
[math]Z_{1}[/math]	[math]Z_{2}[/math]	[math] Z_{3}[/math]	[math]Z_{4}[/math]	[math]Z_{5}[/math]	[math]Z_{6}[/math]	[math]Z_{7}[/math]	[math]Z_{8}[/math]	[math]Z_{9}[/math]	[math] Z_{10}[/math]	[math] Z_{11}[/math]	[math]Z_{12}[/math]	Profit
1	0	0	0	0	0	0	0	0	0	0	0	-50
1	1	2	3	2	3	2	1	2	3	3	6	250
0	0	0	0	0	0	0	0	0	0	0	0	-100
2	4	4	2	3	4	4	3	2	2	1	1	50
1	2	3	5	4	3	3	3	5	8	6	6	250
0	0	0	0	0	0	0	0	0	0	0	0	-100
2	3	2	2	2	1	2	3	3	3	4	6	300
1	2	1	1	1	1	2	1	0	0	0	0	-50
0	0	0	0	0	0	0	0	0	0	0	0	-100
1	0	0	0	0	0	0	0	0	0	0	0	-50
2	3	2	3	3	3	5	9	12	12	13	15	750
1	1	1	0	0	0	0	0	0	0	0	0	-50
1	2	2	3	3	0	0	0	0	0	0	0	-50
1	1	1	1	2	2	3	4	4	6	4	5	200
1	1	0	0	0	0	0	0	0	0	0	0	-50
1	0	0	0	0	0	0	0	0	0	0	0	-50
1	0	0	0	0	0	0	0	0	0	0	0	-50
1	1	2	3	3	4	2	3	3	3	3	2	50
1	2	4	6	6	9	10	13	16	17	15	18	850
1	0	0	0	0	0	0	0	0	0	0	0	-50

Let us modify the assumptions about our chain letter to let the buyer sell the letter to as many people as she can instead of to a maximum of two. We shall assume, in fact, that a person has a large number [math]N[/math] of acquaintances and a small probability [math]p[/math] of persuading any one of them to buy the letter. Then the distribution for the number of letters that she sells will be a binomial distribution with mean [math]m = Np[/math]. Since [math]N[/math] is large and [math]p[/math] is small, we can assume that the probability [math]p_j[/math] that an individual sells the letter to [math]j[/math] people is given by the Poisson distribution

[[math]] p_j = \frac{e^{-m} m^j}{j!}\ . [[/math]]

The generating function for the Poisson distribution is

[[math]] \begin{eqnarray*} h(z) &=& \sum_{j = 0}^\infty \frac{e^{-m} m^j z^j}{j!} \\ &=& e^{-m} \sum_{j = 0}^\infty \frac{m^j z^j}{j!} \\ &=& e^{-m} e^{mz} = e^{m(z - 1)}\ . \end{eqnarray*} [[/math]]

The expected number of letters that an individual passes on is [math]m[/math], and again to be favorable we must have [math]m \gt 1[/math]. Let us assume again that [math]m = 1.1[/math]. Then we can find again the probability [math]1 - d_{12}[/math] of a bonus from Branch. The result is .232. Although the expected winnings are the same, the variance is larger in this case, and the buyer has a better chance for a reasonably large profit. We again carried out 20 simulations using the Poisson distribution with mean 1.1. The results are shown in Table.

Simulation of chain letter (Poisson case).
[math]Z_{1}[/math]	[math]Z_{2}[/math]	[math] Z_{3}[/math]	[math]Z_{4}[/math]	[math]Z_{5}[/math]	[math]Z_{6}[/math]	[math]Z_{7}[/math]	[math]Z_{8}[/math]	[math]Z_{9}[/math]	[math] Z_{10}[/math]	[math] Z_{11}[/math]	[math]Z_{12}[/math]	Profit
1	2	6	7	7	8	11	9	7	6	6	5	200
1	0	0	0	0	0	0	0	0	0	0	0	-50
1	0	0	0	0	0	0	0	0	0	0	0	-50
1	1	1	0	0	0	0	0	0	0	0	0	-50
0	0	0	0	0	0	0	0	0	0	0	0	-100
1	1	1	1	1	1	2	4	9	7	9	7	300
2	3	3	4	2	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	0	-50
2	1	0	0	0	0	0	0	0	0	0	0	0
3	3	4	7	11	17	14	11	11	10	16	25	1300
0	0	0	0	0	0	0	0	0	0	0	0	-100
1	2	2	1	1	3	1	0	0	0	0	0	-50
0	0	0	0	0	0	0	0	0	0	0	0	-100
2	3	1	0	0	0	0	0	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0	0	0	50
1	0	0	0	0	0	0	0	0	0	0	0	-50
3	4	4	7	10	11	9	11	12	14	13	10	550
1	3	3	4	9	5	7	9	8	8	6	3	100
1	0	4	6	6	9	10	13	0	0	0	0	-50
1	0	0	0	0	0	0	0	0	0	0	0	-50

We note that, as before, we came out ahead less than half the time, but we also had one large profit. In only 6 of the 20 cases did we receive any profit. This is again in reasonable agreement with our calculation of a probability .232 for this happening.

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.

Notes

D.G. Kendall, “Branching Processes Since 1873,” Journal of London Mathematics Society, vol.41 (1966), p.386.
C. C. Heyde and E. Seneta, I. J. Bienaymé: Statistical Theory Anticipated (New York: Springer Verlag, 1977).
ibid., pp. 117--118.
ibid., p. 118.
D. G. Kendall, “Branching Processes Since 1873,” pp. 385--406; and “The Genealogy of Genealogy: Branching Processes Before (and After) 1873,” Bulletin London Mathematics Society, vol. 7 (1975), pp. 225--253.
N. Keyfitz, Introduction to the Mathematics of Population, rev. ed. (Reading, PA: Addison Wesley, 1977).
T. E. Harris, The Theory of Branching Processes (Berlin: Springer, 1963), p. 9.
Private communication.

[1] D.G. Kendall, “Branching Processes Since 1873,” Journal of London Mathematics Society, vol.41 (1966), p.386.

[2] C. C. Heyde and E. Seneta, I. J. Bienaymé: Statistical Theory Anticipated (New York: Springer Verlag, 1977).

[3] ., pp. 117--118.

[4] ., p. 118.

[5] D. G. Kendall, “Branching Processes Since 1873,” pp. 385--406; and “The Genealogy of Genealogy: Branching Processes Before (and After) 1873,” Bulletin London Mathematics Society, vol. 7 (1975), pp. 225--253.

[6] N. Keyfitz, Introduction to the Mathematics of Population, rev. ed. (Reading, PA: Addison Wesley, 1977).

[7] T. E. Harris, The Theory of Branching Processes (Berlin: Springer, 1963), p. 9.

[8] Private communication.

[Notes 1]

[Notes 2]

[Notes 3]

[Notes 4]

[Notes 5]

[Notes 6]

[Notes 7]

[Notes 8]