Generating Functions for Discrete Distributions
So far we have considered in detail only the two most important attributes of a random variable, namely, the mean and the variance. We have seen how these attributes enter into the fundamental limit theorems of probability, as well as into all sorts of practical calculations. We have seen that the mean and variance of a random variable contain important information about the random variable, or, more precisely, about the distribution function of that variable. Now we shall see that the mean and variance do not contain all the available information about the density function of a random variable. To begin with, it is easy to give examples of different distribution functions which have the same mean and the same variance. For instance, suppose [math]X[/math] and [math]Y[/math] are random variables, with distributions
Then with these choices, we have [math]E(X) = E(Y) = 7/2[/math] and [math]V(X) = V(Y) = 9/4[/math], and yet certainly [math]p_X[/math] and [math]p_Y[/math] are quite different density functions. This raises a question: If [math]X[/math] is a random variable with range [math]\{x_1, x_2, \ldots\}[/math] of at most countable size, and distribution function [math]p = p_X[/math], and if we know its mean [math]\mu = E(X)[/math] and its variance [math]\sigma^2 = V(X)[/math], then what else do we need to know to determine [math]p[/math] completely?
Moments
A nice answer to this question, at least in the case that [math]X[/math] has finite range, can be given in terms of the moments of [math]X[/math], which are numbers defined as follows:
provided the sum converges. Here [math]p(x_j) = P(X = x_j)[/math].
In terms of these moments, the mean [math]\mu[/math] and variance [math]\sigma^2[/math] of [math]X[/math] are
given simply by
so that a knowledge of the first two moments of [math]X[/math] gives us its mean and variance. But a knowledge of all the moments of [math]X[/math] determines its distribution function [math]p[/math] completely.
Moment Generating Functions
To see how this comes about, we introduce a new variable [math]t[/math], and define a function [math]g(t)[/math] as follows:
We call [math]g(t)[/math] the moment generating function for [math]X[/math], and think of it as a convenient bookkeeping device for describing the moments of [math]X[/math]. Indeed, if we differentiate [math]g(t)[/math] [math]n[/math] times and then set [math]t = 0[/math], we get [math]\mu_n[/math]:
It is easy to calculate the moment generating function for simple examples.
Examples
Example Suppose [math]X[/math] has range [math]\{1,2,3,\ldots,n\}[/math] and [math]p_X(j) = 1/n[/math] for [math]1 \leq j \leq n[/math] (uniform distribution). Then
If we use the expression on the right-hand side of the second line above, then it is easy to see that
and that [math]\mu = \mu_1 = (n + 1)/2[/math] and [math]\sigma^2 = \mu_2 - \mu_1^2 = (n^2 - 1)/12[/math].
Example Suppose now that [math]X[/math] has range [math]\{0,1,2,3,\ldots,n\}[/math] and [math]p_X(j) = {n \choose j} p^j q^{n - j}[/math] for [math]0 \leq j \leq n[/math] (binomial distribution). Then
Note that
so that [math]\mu = \mu_1 = np[/math], and [math]\sigma^2 = \mu_2 - \mu_1^2 = np(1 - p)[/math], as expected.
Example Suppose [math]X[/math] has range [math]\{1,2,3,\ldots\}[/math] and [math]p_X(j) = q^{j - 1}p[/math] for all [math]j[/math] (geometric distribution). Then
Here
[math]\mu = \mu_1 = 1/p[/math], and [math]\sigma^2 = \mu_2 - \mu_1^2 = q/p^2[/math], as computed in Example.
Example Let [math]X[/math] have range [math]\{0,1,2,3,\ldots\}[/math] and let [math]p_X(j) =e^{-\lambda}\lambda^j/j![/math] for all [math]j[/math] (Poisson distribution with mean [math]\lambda[/math]). Then
Then
[math]\mu = \mu_1 = \lambda[/math], and [math]\sigma^2 = \mu_2 - \mu_1^2 = \lambda[/math]. The variance of the Poisson distribution is easier to obtain in this way than directly from the definition (as was done in Exercise).
Moment Problem
Using the moment generating function, we can now show, at least in the case of a discrete random variable with finite range, that its distribution function is completely determined by its moments.
Let [math]X[/math] be a discrete random variable with finite range [math]\{x_1,x_2,\ldots, x_n\}[/math], distribution function [math]p[/math], and moment generating function [math]g[/math]. Then [math]g[/math] is uniquely determined by [math]p[/math], and conversely.
Show ProofWe know that [math]p[/math] determines [math]g[/math], since
If we delete the hypothesis that [math]X[/math] have finite range in the above theorem, then the conclusion is no longer necessarily true.
Ordinary Generating Functions
In the special but important case where the [math]x_j[/math] are all nonnegative integers, [math]x_j = j[/math], we can prove this theorem in a simpler way. In this case, we have
and we see that [math]g(t)[/math] is a polynomial in [math]e^t[/math]. If we write [math]z = e^t[/math], and define the function [math]h[/math] by
then [math]h(z)[/math] is a polynomial in [math]z[/math] containing the same information as [math]g(t)[/math], and in fact
The function [math]h(z)[/math] is often called the ordinary generating function for [math]X[/math]. Note that [math]h(1) = g(0) = 1[/math], [math]h'(1) = g'(0) = \mu_1[/math], and [math]h''(1) = g''(0) - g'(0) = \mu_2 - \mu_1[/math]. It follows from all this that if we know [math]g(t)[/math], then we know [math]h(z)[/math], and if we know [math]h(z)[/math], then we can find the [math]p(j)[/math] by Taylor's formula:
For example, suppose we know that the moments of a certain discrete random
variable [math]X[/math] are given by
Then the moment generating function [math]g[/math] of [math]X[/math] is
This is a polynomial in [math]z = e^t[/math], and
Hence, [math]X[/math] must have range [math]\{0,1,2\}[/math], and [math]p[/math] must have values [math]\{1/4,1/2,1/4\}[/math].
Properties
Both the moment generating function [math]g[/math] and the ordinary generating function [math]h[/math] have many properties useful in the study of random variables, of which we can consider only a few here. In particular, if [math]X[/math] is any discrete random variable and [math]Y = X + a[/math], then
while if [math]Y = bX[/math], then
In particular, if
then (see Exercise)
If [math]X[/math] and [math]Y[/math] are independent random variables and [math]Z = X + Y[/math] is their sum, with [math]p_X[/math], [math]p_Y[/math], and [math]p_Z[/math] the associated distribution functions, then we have seen in Sums of Independent Random Variables that [math]p_Z[/math] is the convolution of [math]p_X[/math] and [math]p_Y[/math], and we know that convolution involves a rather complicated calculation. But for the generating functions we have instead the simple relations
that is, [math]g_Z[/math] is simply the product of [math]g_X[/math] and [math]g_Y[/math], and similarly for [math]h_Z[/math]. To see this, first note that if [math]X[/math] and [math]Y[/math] are independent, then [math]e^{tX}[/math] and [math]e^{tY}[/math] are independent (see Exercise), and hence
It follows that
and, replacing [math]t[/math] by [math]\log z[/math], we also get
Example If [math]X[/math] and [math]Y[/math] are independent discrete random variables with range [math]\{0,1,2,\ldots,n\}[/math] and binomial distribution
and if [math]Z = X + Y[/math], then we know (cf. Section \ref{sec 7.1}) that the range of [math]X[/math] is
and [math]X[/math] has binomial distribution
Here we can easily verify this result by using generating functions. We know that
and
Hence, we have
or, what is the same,
from which we can see that the coefficient of [math]z^j[/math] is just [math]p_Z(j) = {2n \choose j} p^j q^{2n - j}[/math].
Example If [math]X[/math] and [math]Y[/math] are independent discrete random variables with the non-negative integers [math]\{0,1,2,3,\ldots\}[/math] as range, and with geometric distribution function
then
and if [math]Z = X + Y[/math], then
If we replace [math]e^t[/math] by [math]z[/math], we get
and we can read off the values of [math]p_Z(j)[/math] as the coefficient of [math]z^j[/math] in this expansion for [math]h(z)[/math], even though [math]h(z)[/math] is not a polynomial in this case. The distribution [math]p_Z[/math] is a negative binomial distribution (see Important Distributions).
Here is a more interesting example of the power and scope of the method of generating functions.
Heads or Tails
Example In the coin-tossing game discussed in Example, we now consider the question “When is Peter first in the lead?”
Let [math]X_k[/math] describe the outcome of the [math]k[/math]th trial in the game
Then the [math]X_k[/math] are independent random variables describing a Bernoulli process. Let [math]S_0 = 0[/math], and, for [math]n \geq 1[/math], let
Then [math]S_n[/math] describes Peter's fortune after [math]n[/math] trials, and Peter is first in the lead after [math]n[/math] trials if [math]S_k \leq 0[/math] for [math]1 \leq k \lt n[/math] and [math]S_n = 1[/math].
Now this can happen when [math]n = 1[/math], in which case [math]S_1 = X_1 = 1[/math], or when [math]n \gt
1[/math], in which case [math]S_1 = X_1 = -1[/math]. In the latter case, [math]S_k = 0[/math] for [math]k = n -
1[/math], and perhaps for other [math]k[/math] between 1 and [math]n[/math]. Let [math]m[/math] be the least
such value of [math]k[/math]; then [math]S_m = 0[/math] and [math]S_k \lt 0[/math] for [math]1 \leq k \lt m[/math]. In this
case Peter loses on the first trial, regains his initial position in the next
[math]m - 1[/math] trials, and gains the lead in the next [math]n - m[/math] trials.
Let [math]p[/math] be the probability that the coin comes up heads, and let [math]q = 1-p[/math].
Let [math]r_n[/math] be the probability that Peter is first in the lead after [math]n[/math] trials.
Then from the discussion above, we see that
Now let [math]T[/math] describe the time (that is, the number of trials) required for Peter to take the lead. Then [math]T[/math] is a random variable, and since [math]P(T = n) = r_n[/math], [math]r[/math] is the distribution function for [math]T[/math]. We introduce the generating function [math]h_T(z)[/math] for [math]T[/math]:
Then, by using the relations above, we can verify the relation
If we solve this quadratic equation for [math]h_T(z)[/math], we get
Of these two solutions, we want the one that has a convergent power series in [math]z[/math] (i.e., that is finite for [math]z = 0[/math]). Hence we choose
Now we can ask: What is the probability that Peter is ever in the lead? This probability is given by (see Exercise)
so that Peter is sure to be in the lead eventually if [math]p \geq q[/math].
How long will it take? That is, what is the expected value of [math]T[/math]? This value
is given by
This says that if [math]p \gt q[/math], then Peter can expect to be in the lead by about [math]1/(p - q)[/math] trials, but if [math]p = q[/math], he can expect to wait a long time.
A related problem, known as the Gambler's Ruin problem, is studied in Exercise and in Gambler's Ruin.
General references
Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.