<div class="d-none"><math>
\newcommand{\NA}{{\rm NA}}
\newcommand{\mat}[1]{{\bf#1}}
\newcommand{\exref}[1]{\ref{##1}}
\newcommand{\secstoprocess}{\all}
\newcommand{\NA}{{\rm NA}}
\newcommand{\mathds}{\mathbb}</math></div>


So far we have considered in detail only the two most important attributes of a
random variable, namely, the mean and the variance.  We have seen how these
attributes enter into the fundamental limit theorems of probability, as well as
into all sorts of practical calculations.  We have seen that the mean and
variance of a random variable contain important information about the random
variable, or, more precisely, about the distribution function of that variable.
Now we shall see that the mean and variance do  ''not'' contain ''all''
the available information about the density function of a random variable.  To
begin with, it is easy to give examples of different distribution functions which
have the same mean and the same variance.  For instance, suppose <math>X</math> and <math>Y</math>
are random variables, with distributions
<math display="block">
p_X = \pmatrix{
1 & 2 & 3 & 4 & 5 & 6\cr
0 & 1/4 & 1/2 & 0 & 0 & 1/4\cr},
</math>
<math display="block">
p_Y = \pmatrix{
1 & 2 & 3 & 4 & 5 & 6\cr
1/4 & 0 & 0 & 1/2 & 1/4 & 0\cr}.
</math>
Then with these choices, we have <math>E(X) = E(Y) = 7/2</math> and <math>V(X) = V(Y) = 9/4</math>, and
yet certainly <math>p_X</math> and <math>p_Y</math> are quite different density functions.
This raises a question: If <math>X</math> is a random variable with range
<math>\{x_1, x_2, \ldots\}</math> of at most countable size, and distribution function <math>p = p_X</math>,
and if we know its mean <math>\mu = E(X)</math> and its variance <math>\sigma^2 = V(X)</math>, then
what else do we need to know to determine <math>p</math> completely?
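As a quick check of the example above, the following short Python sketch verifies, in exact rational arithmetic, that <math>p_X</math> and <math>p_Y</math> have the same mean and variance.
<syntaxhighlight lang="python">
from fractions import Fraction as F

# the two distributions p_X and p_Y given above
p_X = {2: F(1, 4), 3: F(1, 2), 6: F(1, 4)}
p_Y = {1: F(1, 4), 4: F(1, 2), 5: F(1, 4)}

def mean(p):
    return sum(x * w for x, w in p.items())

def variance(p):
    m = mean(p)
    return sum((x - m) ** 2 * w for x, w in p.items())

for p in (p_X, p_Y):
    print(mean(p), variance(p))   # both print 7/2 9/4
</syntaxhighlight>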
===Moments===
A nice answer to this question, at least in the case that <math>X</math> has finite range, can be given in terms of
the
''moments'' of <math>X</math>, which are numbers defined as follows:
<math display="block">
\begin{eqnarray*}
\mu_k &=& k \mbox{th}\,\,\mbox{moment~of}\,\, X\\
    &=& E(X^k) \\
    &=& \sum_{j = 1}^\infty (x_j)^k p(x_j)\ ,
\end{eqnarray*}
</math>
provided the sum converges.  Here <math>p(x_j) = P(X = x_j)</math>.
In terms of these moments, the mean <math>\mu</math> and variance <math>\sigma^2</math> of <math>X</math> are
given simply by
<math display="block">
\begin{eqnarray*}
    \mu &=& \mu_1, \\
\sigma^2 &=& \mu_2 - \mu_1^2\ ,
\end{eqnarray*}
</math>
so that a knowledge of the first two moments of <math>X</math> gives us its mean and
variance.  But a knowledge of ''all'' the moments of <math>X</math> determines its
distribution function <math>p</math> completely.
===Moment Generating Functions===
To see how this comes about, we introduce a new variable <math>t</math>, and define a
function <math>g(t)</math> as follows:
<math display="block">
\begin{eqnarray*}
g(t) &=& E(e^{tX}) \\
    &=& \sum_{k = 0}^\infty \frac {\mu_k t^k}{k!} \\
    &=& E\left(\sum_{k = 0}^\infty \frac {X^k t^k}{k!} \right) \\
    &=& \sum_{j = 1}^\infty e^{tx_j} p(x_j)\ .
\end{eqnarray*}
</math>
We call <math>g(t)</math> the ''moment generating function'' for <math>X</math>, and think of it as a convenient
bookkeeping device for describing the moments of <math>X</math>.  Indeed, if we differentiate <math>g(t)</math> <math>n</math>
times and then set
<math>t = 0</math>, we get <math>\mu_n</math>:
<math display="block">
\begin{eqnarray*}
\left. \frac {d^n}{dt^n} g(t) \right|_{t = 0} &=& g^{(n)}(0) \\
    &=& \left. \sum_{k = n}^\infty \frac {k!\, \mu_k t^{k - n}} {(k - n)!\, k!}
\right|_{t = 0} \\
    &=& \mu_n\ .
\end{eqnarray*}
</math>
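The following SymPy sketch illustrates this identity for the distribution <math>p_X</math> given at the beginning of the section: differentiating its moment generating function <math>n</math> times and setting <math>t = 0</math> reproduces the moments <math>\mu_n</math> computed directly from the definition.
<syntaxhighlight lang="python">
import sympy as sp

t = sp.symbols('t')
# the distribution p_X from the beginning of the section
dist = {2: sp.Rational(1, 4), 3: sp.Rational(1, 2), 6: sp.Rational(1, 4)}
g = sum(w * sp.exp(t * x) for x, w in dist.items())   # g(t) = E(e^{tX})

for n in range(1, 5):
    g_n_at_0 = sp.diff(g, t, n).subs(t, 0)            # g^{(n)}(0)
    mu_n = sum(w * x**n for x, w in dist.items())     # E(X^n)
    assert sp.simplify(g_n_at_0 - mu_n) == 0          # the two agree
</syntaxhighlight>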
It is easy to calculate the moment generating function for simple examples.
===Examples===
<span id="exam 10.1"/>
'''Example'''
Suppose <math>X</math> has range <math>\{1,2,3,\ldots,n\}</math> and <math>p_X(j) = 1/n</math> for <math>1 \leq j
\leq n</math> (uniform distribution).  Then
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{j = 1}^n \frac 1n e^{tj} \\
    &=& \frac 1n (e^t + e^{2t} +\cdots+ e^{nt}) \\
    &=& \frac {e^t (e^{nt} - 1)} {n (e^t - 1)}\ .
\end{eqnarray*}
</math>
If we use the expression on the right-hand side of the second line above, then it is
easy to see that
<math display="block">
\begin{eqnarray*}
\mu_1 &=& g'(0) = \frac 1n (1 + 2 + 3 + \cdots + n) = \frac {n + 1}2, \\
\mu_2 &=& g''(0) = \frac 1n (1 + 4 + 9+ \cdots + n^2) = \frac {(n + 1)(2n + 1)}6\ ,
\end{eqnarray*}
</math>
and that <math>\mu = \mu_1 = (n + 1)/2</math> and <math>\sigma^2 = \mu_2 - \mu_1^2 = (n^2 -
1)/12</math>.
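A short SymPy sketch checks these formulas for one arbitrarily chosen value, <math>n = 6</math>.
<syntaxhighlight lang="python">
import sympy as sp

t = sp.symbols('t')
n = 6                                              # an arbitrary choice
g = sp.Rational(1, n) * sum(sp.exp(j * t) for j in range(1, n + 1))

mu_1 = sp.diff(g, t).subs(t, 0)                    # (n + 1)/2
mu_2 = sp.diff(g, t, 2).subs(t, 0)                 # (n + 1)(2n + 1)/6
print(mu_1, mu_2, sp.simplify(mu_2 - mu_1**2))     # 7/2  91/6  35/12 = (n^2 - 1)/12
</syntaxhighlight>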
<span id="exam 10.2"/>
'''Example'''
Suppose now that <math>X</math> has range <math>\{0,1,2,3,\ldots,n\}</math> and <math>p_X(j) = {n \choose j} p^j
q^{n - j}</math> for <math>0 \leq j \leq n</math> (binomial distribution).  Then
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{j = 0}^n e^{tj} {n \choose j} p^j q^{n - j} \\
    &=& \sum_{j = 0}^n {n \choose j} (pe^t)^j q^{n - j} \\
    &=& (pe^t + q)^n\ .
\end{eqnarray*}
</math>
Note that
<math display="block">
\begin{eqnarray*}
\mu_1 = g'(0) &=& \left. n(pe^t + q)^{n - 1}pe^t \right|_{t = 0} = np\ , \\
\mu_2 = g''(0) &=& n(n - 1)p^2 + np\ ,
\end{eqnarray*}
</math>
so that <math>\mu = \mu_1 = np</math>, and <math>\sigma^2 = \mu_2 - \mu_1^2 = np(1 - p)</math>, as
expected.
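Here the check can be done with <math>n</math> and <math>p</math> left symbolic; the following SymPy sketch differentiates the closed form <math>(pe^t + q)^n</math> and recovers <math>np</math> and <math>np(1 - p)</math> (up to how SymPy arranges the factors).
<syntaxhighlight lang="python">
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
q = 1 - p
g = (p * sp.exp(t) + q)**n                     # the closed form found above

mu_1 = sp.diff(g, t).subs(t, 0)                # n*p
mu_2 = sp.diff(g, t, 2).subs(t, 0)             # equals n(n - 1)p^2 + np
print(mu_1, sp.simplify(mu_2 - mu_1**2))       # n*p and n*p*(1 - p)
</syntaxhighlight>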
<span id="exam 10.3"/>
'''Example'''
Suppose <math>X</math> has range <math>\{1,2,3,\ldots\}</math> and <math>p_X(j) = q^{j - 1}p</math> for all <math>j</math>
(geometric distribution).  Then
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{j = 1}^\infty e^{tj} q^{j - 1}p \\
    &=& \frac {pe^t}{1 - qe^t}\ .
\end{eqnarray*}
</math>
Here
<math display="block">
\begin{eqnarray*}
\mu_1 &=& g'(0) = \left. \frac {pe^t}{(1 - qe^t)^2} \right|_{t = 0} = \frac 1p\ , \\
\mu_2 &=& g''(0) = \left. \frac {pe^t + pqe^{2t}}{(1 - qe^t)^3} \right|_{t = 0} = \frac
{1 + q}{p^2}\ ,
\end{eqnarray*}
</math>
<math>\mu = \mu_1 = 1/p</math>, and <math>\sigma^2 = \mu_2 - \mu_1^2 = q/p^2</math>, as computed in
[[guide:E5be6e0c81#exam 6.21 |Example]].
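The same symbolic check works here; the following SymPy sketch differentiates the closed form <math>pe^t/(1 - qe^t)</math> and recovers <math>1/p</math> and <math>q/p^2</math>.
<syntaxhighlight lang="python">
import sympy as sp

t, p = sp.symbols('t p', positive=True)
q = 1 - p
g = p * sp.exp(t) / (1 - q * sp.exp(t))          # the closed form found above

mu_1 = sp.simplify(sp.diff(g, t).subs(t, 0))     # 1/p
mu_2 = sp.simplify(sp.diff(g, t, 2).subs(t, 0))  # (1 + q)/p^2 = (2 - p)/p^2
print(mu_1, sp.simplify(mu_2 - mu_1**2))         # 1/p and (1 - p)/p**2, i.e. q/p^2
</syntaxhighlight>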
<span id="exam 10.4"/>
'''Example'''
Let <math>X</math> have range <math>\{0,1,2,3,\ldots\}</math> and let <math>p_X(j) =
e^{-\lambda}\lambda^j/j!</math> for all <math>j</math> (Poisson distribution with mean <math>\lambda</math>).
Then
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{j = 0}^\infty e^{tj} \frac {e^{-\lambda}\lambda^j}{j!} \\
    &=& e^{-\lambda} \sum_{j = 0}^\infty \frac {(\lambda e^t)^j}{j!} \\
    &=& e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}\ .
\end{eqnarray*}
</math>
Then
<math display="block">
\begin{eqnarray*}
\mu_1 &=& g'(0) = \left. e^{\lambda(e^t - 1)}\lambda e^t \right|_{t = 0} = \lambda\ ,\\
\mu_2 &=& g''(0) = \left. e^{\lambda(e^t - 1)} (\lambda^2 e^{2t} + \lambda e^t)
\right|_{t = 0} = \lambda^2 + \lambda\ ,
\end{eqnarray*}
</math>
<math>\mu = \mu_1 = \lambda</math>, and <math>\sigma^2 = \mu_2 - \mu_1^2 = \lambda</math>.
The variance of the Poisson distribution is easier to obtain in this way than directly
from the definition (as was done in [[guide:C631488f9a#exer 6.2.100 |Exercise]]).
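A purely numerical sketch confirms the closed form and the equality of mean and variance; the values <math>\lambda = 2.5</math> and <math>t = 0.3</math> are arbitrary, and truncating the sums at 80 terms is more than enough, since the terms die off factorially fast.
<syntaxhighlight lang="python">
import math

lam, t = 2.5, 0.3                      # arbitrary test values

def poisson(j):                        # p_X(j) = e^{-lambda} lambda^j / j!
    return math.exp(-lam) * lam**j / math.factorial(j)

g_sum = sum(math.exp(t * j) * poisson(j) for j in range(80))
g_closed = math.exp(lam * (math.exp(t) - 1))
print(abs(g_sum - g_closed))           # essentially zero: the closed form matches the sum

mu_1 = sum(j * poisson(j) for j in range(80))
mu_2 = sum(j**2 * poisson(j) for j in range(80))
print(mu_1, mu_2 - mu_1**2)            # both approximately lambda = 2.5
</syntaxhighlight>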
===Moment Problem===
Using the moment generating function, we can now show, at least in the case of
a discrete random variable with finite range, that its distribution function is
completely determined by its moments.
{{proofcard|Theorem|thm_10.2|Let <math>X</math> be a discrete random variable with finite range
<math>\{x_1,x_2,\ldots,x_n\}</math>, distribution function <math>p</math>, and moment generating
function <math>g</math>.  Then <math>g</math> is uniquely determined by <math>p</math>, and conversely.|We know that <math>p</math> determines <math>g</math>, since
<math display="block">
g(t) = \sum_{j = 1}^n e^{tx_j} p(x_j)\ .
</math>
Conversely, assume that <math>g(t)</math> is known.  We wish to determine the values of <math>x_j</math> and <math>p(x_j)</math>, for <math>1 \le j \le n</math>.  We assume, without loss of generality, that <math>p(x_j)  >  0</math> for <math>1 \le j \le n</math>, and that
<math display="block">
x_1  <  x_2  <  \ldots  <  x_n\ .
</math>
We note that <math>g(t)</math> is differentiable for all <math>t</math>, since it is a finite linear combination of exponential functions.  If we compute <math>g'(t)/g(t)</math>, we obtain
<math display="block">
{{x_1 p(x_1) e^{tx_1} + \ldots + x_n p(x_n) e^{t x_n}}\over{p(x_1) e^{tx_1} + \ldots + p(x_n)e^{tx_n}}}\ .
</math>
Dividing both top and bottom by <math>e^{tx_n}</math>, we obtain the expression
<math display="block">
{{x_1 p(x_1) e^{t(x_1-x_n)} + \ldots + x_n p(x_n)}\over{p(x_1) e^{t(x_1 - x_n)} + \ldots + p(x_n)}}\ .
</math>
Since <math>x_n</math> is the largest of the <math>x_j</math>'s, this expression approaches <math>x_n</math> as <math>t</math> goes to <math>\infty</math>. So we have shown that
<math display="block">
x_n = \lim_{t \rightarrow \infty} {{g'(t)}\over{g(t)}}\ .
</math>
To find <math>p(x_n)</math>, we simply divide <math>g(t)</math> by <math>e^{tx_n}</math> and let <math>t</math> go to <math>\infty</math>.
Once <math>x_n</math> and <math>p(x_n)</math> have been determined, we can subtract <math>p(x_n) e^{tx_n}</math> from <math>g(t)</math>, and repeat the above procedure with the resulting function, obtaining, in turn, <math>x_{n-1}, \ldots, x_1</math> and <math>p(x_{n-1}), \ldots, p(x_1)</math>.}}
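The proof above is constructive, and the procedure it describes can be carried out mechanically.  The following sketch in SymPy, assuming SymPy can evaluate the limits involved, recovers the distribution <math>p_X</math> from the beginning of the section using nothing but its moment generating function.
<syntaxhighlight lang="python">
import sympy as sp

t = sp.symbols('t', positive=True)

def recover(g, n_points):
    """Peel off the support points of a finite distribution from its mgf g."""
    pairs = []
    for _ in range(n_points):
        x = sp.limit(sp.diff(g, t) / g, t, sp.oo)    # largest remaining x_j
        p = sp.limit(g * sp.exp(-x * t), t, sp.oo)   # its probability p(x_j)
        pairs.append((x, p))
        g = sp.simplify(g - p * sp.exp(x * t))       # subtract that term and repeat
    return pairs                                     # largest support point first

# the mgf of the distribution p_X from the beginning of the section
g_X = sp.exp(2*t)/4 + sp.exp(3*t)/2 + sp.exp(6*t)/4
print(recover(g_X, 3))    # [(6, 1/4), (3, 1/2), (2, 1/4)]
</syntaxhighlight>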
If we delete the hypothesis that <math>X</math> have finite range in the above theorem, then
the conclusion is no longer necessarily true.
===Ordinary Generating Functions===
In the special but important case where the <math>x_j</math> are all nonnegative integers,
<math>x_j = j</math>, we can prove this theorem in a simpler way.
In this case, we have
<math display="block">
g(t) = \sum_{j = 0}^n e^{tj} p(j)\ ,
</math>
and we see that <math>g(t)</math> is a ''polynomial'' in <math>e^t</math>.  If we write <math>z =
e^t</math>, and define the function <math>h</math> by
<math display="block">
h(z) = \sum_{j = 0}^n z^j p(j)\ ,
</math>
then <math>h(z)</math> is a polynomial in <math>z</math> containing the same information as <math>g(t)</math>,
and in fact
<math display="block">
\begin{eqnarray*}
h(z) &=& g(\log z)\ , \\
g(t) &=& h(e^t)\ .
\end{eqnarray*}
</math>
The function <math>h(z)</math> is often called the ''ordinary generating function'' for <math>X</math>.  Note that <math>h(1) = g(0) =
1</math>,
<math>h'(1) = g'(0) =
\mu_1</math>, and
<math>h''(1) = g''(0) - g'(0) = \mu_2 - \mu_1</math>.  It follows from all this that if we know
<math>g(t)</math>, then we know <math>h(z)</math>, and if we know <math>h(z)</math>, then we can find the <math>p(j)</math>
by Taylor's formula:
<math display="block">
\begin{eqnarray*}
p(j) &=& \mbox{coefficient~of}\,\, z^j \,\, \mbox{in}\,\, h(z) \\
    &=& \frac{h^{(j)}(0)}{j!}\ .
\end{eqnarray*}
</math>
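For instance, the following SymPy sketch applies Taylor's formula to the polynomial <math>h(z) = ((1 + z)/2)^3</math> (which, as the examples below will show, is the ordinary generating function of a binomial distribution with <math>n = 3</math> and <math>p = 1/2</math>) and reads off its distribution.
<syntaxhighlight lang="python">
import sympy as sp

z = sp.symbols('z')
h = ((1 + z) / 2)**3       # an ordinary generating function given in closed form

# p(j) = h^{(j)}(0)/j!  for j = 0, 1, 2, 3
p_recovered = [sp.diff(h, z, j).subs(z, 0) / sp.factorial(j) for j in range(4)]
print(p_recovered)         # [1/8, 3/8, 3/8, 1/8]
</syntaxhighlight>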
For example, suppose we know that the moments of a certain discrete random
variable <math>X</math> are given by
<math display="block">
\begin{eqnarray*}
\mu_0 &=& 1\ , \\
\mu_k &=& \frac12 + \frac{2^k}4\ , \qquad \mbox{for}\,\, k \geq 1\ .
\end{eqnarray*}
</math>
Then the moment generating function <math>g</math> of <math>X</math> is
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!} \\
    &=& 1 + \frac12 \sum_{k = 1}^\infty \frac{t^k}{k!} + \frac14 \sum_{k =
1}^\infty \frac{(2t)^k}{k!} \\
    &=& \frac14 + \frac12 e^t + \frac14 e^{2t}\ .
\end{eqnarray*}
</math>
This is a polynomial in <math>z = e^t</math>, and
<math display="block">
h(z) = \frac14 + \frac12 z + \frac14 z^2\ .
</math>
Hence, <math>X</math> must have range <math>\{0,1,2\}</math>, and <math>p</math> must have values
<math>\{1/4,1/2,1/4\}</math>.
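Conversely, a quick check in exact arithmetic confirms that the distribution just found does have the prescribed moments.
<syntaxhighlight lang="python">
from fractions import Fraction as F

p = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}     # the distribution just found
for k in range(1, 8):
    mu_k = sum(F(x)**k * w for x, w in p.items())
    assert mu_k == F(1, 2) + F(2)**k / 4     # matches mu_k = 1/2 + 2^k/4
</syntaxhighlight>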
===Properties===
Both the moment generating function <math>g</math> and the ordinary generating function
<math>h</math> have many properties useful in the study of random variables, of which we
can consider only a few here.  In particular, if <math>X</math> is any discrete random
variable and <math>Y = X + a</math>, then
<math display="block">
\begin{eqnarray*}
g_Y(t) &=& E(e^{tY}) \\
      &=& E(e^{t(X + a)}) \\
      &=& e^{ta} E(e^{tX}) \\
      &=& e^{ta} g_X(t)\ ,
\end{eqnarray*}
</math>
while if <math>Y = bX</math>, then
<math display="block">
\begin{eqnarray*}
g_Y(t) &=& E(e^{tY}) \\
      &=& E(e^{tbX}) \\
      &=& g_X(bt)\ .
\end{eqnarray*}
</math>
In particular, if
<math display="block">
X^* = \frac{X - \mu}\sigma\ ,
</math>
then (see [[exercise:C5e148cb87 |Exercise]])
<math display="block">
g_{X^*}(t) = e^{-\mu t/\sigma} g_X\left( \frac t\sigma \right)\ .
</math>
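The first two of these rules are easy to confirm symbolically; the following SymPy sketch does so for the distribution <math>p_X</math> from the beginning of the section, with the shift <math>a</math> and the scale factor <math>b</math> left symbolic.
<syntaxhighlight lang="python">
import sympy as sp

t, a, b = sp.symbols('t a b')
dist = {2: sp.Rational(1, 4), 3: sp.Rational(1, 2), 6: sp.Rational(1, 4)}   # p_X again

def mgf(d, s):                 # mgf of a finite distribution, evaluated at s
    return sum(w * sp.exp(s * x) for x, w in d.items())

g_shift = mgf({x + a: w for x, w in dist.items()}, t)        # Y = X + a
assert sp.expand(g_shift - sp.exp(a * t) * mgf(dist, t)) == 0

g_scale = mgf({b * x: w for x, w in dist.items()}, t)        # Y = b X
assert sp.expand(g_scale - mgf(dist, b * t)) == 0
</syntaxhighlight>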
If <math>X</math> and <math>Y</math> are ''independent'' random variables and <math>Z = X + Y</math> is
their sum, with <math>p_X</math>, <math>p_Y</math>, and <math>p_Z</math> the associated distribution functions, then
we have seen in [[guide:E5be6e0c81#chp 7 |Chapter]] that <math>p_Z</math> is the ''convolution'' of
<math>p_X</math> and <math>p_Y</math>, and we know that convolution involves a rather complicated
calculation.  But for the generating functions we have instead the simple
relations
<math display="block">
\begin{eqnarray*}
g_Z(t) &=& g_X(t) g_Y(t)\ , \\
h_Z(z) &=& h_X(z) h_Y(z)\ ,
\end{eqnarray*}
</math>
that is, <math>g_Z</math> is simply the ''product'' of <math>g_X</math> and <math>g_Y</math>, and similarly
for <math>h_Z</math>.
To see this, first note that if <math>X</math> and <math>Y</math> are independent, then <math>e^{tX}</math> and
<math>e^{tY}</math> are independent (see [[guide:D26a5cb8f7#exer 5.2.38 |Exercise]]), and hence
<math display="block">
E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY})\ .
</math>
It follows that
<math display="block">
\begin{eqnarray*}
g_Z(t) &=& E(e^{tZ}) = E(e^{t(X + Y)}) \\
      &=& E(e^{tX}) E(e^{tY}) \\
      &=& g_X(t) g_Y(t)\ ,
\end{eqnarray*}
</math>
and, replacing <math>t</math> by <math>\log z</math>, we also get
<math display="block">
h_Z(z) = h_X(z) h_Y(z)\ .
</math>
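The following short Python sketch checks this relation in exact arithmetic for two small (arbitrarily chosen) distributions, computing <math>p_Z</math> by direct convolution and comparing the ordinary generating functions at a few test points.
<syntaxhighlight lang="python">
from fractions import Fraction as F
from itertools import product

# two small distributions on nonnegative integers (arbitrary choices)
p_X = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}
p_Y = {0: F(1, 4), 1: F(3, 4)}

# distribution of Z = X + Y by direct convolution
p_Z = {}
for (x, wx), (y, wy) in product(p_X.items(), p_Y.items()):
    p_Z[x + y] = p_Z.get(x + y, F(0)) + wx * wy

def h(dist, z):                # ordinary generating function, evaluated at z
    return sum(w * z**j for j, w in dist.items())

for z0 in (F(1), F(1, 2), F(5, 7)):
    assert h(p_Z, z0) == h(p_X, z0) * h(p_Y, z0)    # h_Z = h_X h_Y
</syntaxhighlight>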
<span id="exam 10.5"/>
'''Example'''
If <math>X</math> and <math>Y</math> are independent discrete random variables with range
<math>\{0,1,2,\ldots,n\}</math> and binomial distribution
<math display="block">
p_X(j) = p_Y(j) = {n \choose j} p^j q^{n - j}\ ,
</math>
and if <math>Z = X + Y</math>, then we know (cf. Section 7.1) that the range of <math>Z</math> is
<math display="block">
\{0,1,2,\ldots,2n\}
</math>
and <math>Z</math> has binomial distribution
<math display="block">
p_Z(j) = (p_X * p_Y)(j) =
{2n \choose j} p^j q^{2n - j}\ .
</math>
Here we can easily verify this result by using generating functions.  We know
that
<math display="block">
\begin{eqnarray*}
g_X(t) = g_Y(t) &=& \sum_{j = 0}^n e^{tj} {n \choose j} p^j q^{n - j} \\
      &=& (pe^t + q)^n\ ,
\end{eqnarray*}
</math>
and
<math display="block">
h_X(z) = h_Y(z) = (pz + q)^n\ .
</math>
Hence, we have
<math display="block">
g_Z(t) = g_X(t) g_Y(t) = (pe^t + q)^{2n}\ ,
</math>
or, what is the same,
<math display="block">
\begin{eqnarray*}
h_Z(z) &=& h_X(z) h_Y(z) = (pz + q)^{2n} \\
      &=& \sum_{j = 0}^{2n} {2n \choose j} (pz)^j q^{2n - j}\ ,
\end{eqnarray*}
</math>
from which we can see that the coefficient of <math>z^j</math> is just <math>p_Z(j) =
{2n \choose j} p^j q^{2n - j}</math>.
<span id="exam 10.6"/>
'''Example'''
If <math>X</math> and <math>Y</math> are independent discrete random variables with the non-negative
integers <math>\{0,1,2,3,\ldots\}</math> as range, and with geometric distribution function
<math display="block">
p_X(j) = p_Y(j) = q^j p\ ,
</math>
then
<math display="block">
g_X(t) = g_Y(t) = \frac p{1 - qe^t}\ ,
</math>
and if <math>Z = X + Y</math>, then
<math display="block">
\begin{eqnarray*}
g_Z(t) &=& g_X(t) g_Y(t) \\
      &=& \frac{p^2}{1 - 2qe^t + q^2 e^{2t}}\ .
\end{eqnarray*}
</math>
If we replace <math>e^t</math> by <math>z</math>, we get
<math display="block">
\begin{eqnarray*}
h_Z(z) &=& \frac{p^2}{(1 - qz)^2} \\
      &=& p^2 \sum_{k = 0}^\infty (k + 1) q^k z^k\ ,
\end{eqnarray*}
</math>
and we can read off the values of <math>p_Z(j)</math> as the coefficient of <math>z^j</math> in this
expansion for <math>h_Z(z)</math>, even though <math>h_Z(z)</math> is not a polynomial in this case.  The
distribution <math>p_Z</math> is a negative binomial distribution (see Section 5.1).
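A short numerical sketch (with the arbitrary choice <math>p = 0.3</math>) confirms that the convolution of the two geometric distributions matches the coefficients <math>(j + 1)q^j p^2</math> read off above.
<syntaxhighlight lang="python">
p = 0.3                             # arbitrary choice
q = 1 - p

def geom(j):                        # p_X(j) = p_Y(j) = q^j p
    return q**j * p

for j in range(10):
    conv = sum(geom(k) * geom(j - k) for k in range(j + 1))   # (p_X * p_Y)(j)
    coeff = (j + 1) * q**j * p**2                             # coefficient of z^j above
    assert abs(conv - coeff) < 1e-12
</syntaxhighlight>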
Here is a more interesting example of the power and scope of the method of
generating functions.
===Heads or Tails===
<span id="exam 10.1.7"/>
'''Example'''
In the coin-tossing game discussed in [[guide:4f3a4e96c3#exam 1.3 |Example]], we now consider the
question “When is Peter first in the lead?”
Let <math>X_k</math> describe the outcome of the <math>k</math>th trial in the game
<math display="block">
X_k = \left \{ \matrix{
              +1,  &\mbox{if}\,\, k{\rm th}\,\, \mbox{toss~is~heads}, \cr
              -1,  &\mbox{if}\,\, k{\rm th}\,\, \mbox{toss~is~tails.}\cr}\right.
</math>
Then the <math>X_k</math> are independent random variables describing a Bernoulli
process.  Let <math>S_0 = 0</math>, and, for <math>n \geq 1</math>, let
<math display="block">
S_n = X_1 + X_2 + \cdots + X_n\ .
</math>
Then <math>S_n</math> describes Peter's fortune after <math>n</math> trials, and Peter is first in
the lead after <math>n</math> trials if <math>S_k \leq 0</math> for <math>1 \leq k  <  n</math> and <math>S_n = 1</math>.
Now this can happen when <math>n = 1</math>, in which case <math>S_1 = X_1 = 1</math>, or when <math>n  >
1</math>, in which case <math>S_1 = X_1 = -1</math>.  In the latter case, <math>S_k = 0</math> for <math>k = n -
1</math>, and perhaps for other <math>k</math> between 1 and <math>n</math>.  Let <math>m</math> be the ''least''
such value of <math>k</math>; then <math>S_m = 0</math> and <math>S_k  <  0</math> for <math>1 \leq k  <  m</math>.  In this
case Peter loses on the first trial, regains his initial position in the next
<math>m - 1</math> trials, and gains the lead in the next <math>n - m</math> trials.
Let <math>p</math> be the probability that the coin comes up heads, and let <math>q = 1-p</math>.
Let <math>r_n</math> be the probability that Peter is first in the lead after <math>n</math> trials.
Then from the discussion above, we see that
<math display="block">
\begin{eqnarray*}
r_n &=& 0\ , \qquad \mbox{if}\,\, n\,\, \mbox{even}, \\
r_1 &=& p \qquad  (= \mbox{probability~of~heads~in~a~single~toss)}, \\
r_n &=& q(r_1r_{n-2} + r_3r_{n-4} +\cdots+ r_{n-2}r_1)\ , \qquad \mbox{if}
\ n  >  1,\ n\ \mbox{odd}.
\end{eqnarray*}
</math>
Now let <math>T</math> describe the time (that is, the number of trials) required for
Peter to take the lead.  Then <math>T</math> is a random variable, and since <math>P(T = n) =
r_n</math>, <math>r</math> is the distribution function for <math>T</math>.
We introduce the generating function <math>h_T(z)</math> for <math>T</math>:
<math display="block">
h_T(z) = \sum_{n = 0}^\infty r_n z^n\ .
</math>
Then, by using the relations above, we can verify the relation
<math display="block">
h_T(z) = pz + qz(h_T(z))^2\ .
</math>
If we solve this quadratic equation for <math>h_T(z)</math>, we get
<math display="block">
h_T(z) = \frac{1 \pm \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 \mp \sqrt{1 -
4pqz^2}}\ .
</math>
Of these two solutions, we want the one that has a convergent power series
in <math>z</math> (i.e., that is finite for <math>z = 0</math>).  Hence we choose
<math display="block">
h_T(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 + \sqrt{1 - 4pqz^2}}\ .
</math>
Now we can ask: What is the probability that Peter is ''ever'' in the
lead?  This probability is given by (see [[exercise:5a5fe2185b |Exercise]])
<math display="block">
\begin{eqnarray*}
\sum_{n = 0}^\infty r_n &=& h_T(1) = \frac{1 - \sqrt{\mathstrut1 - 4pq}}{2q} \\
                      &=& \frac{1 - |p - q|}{2q} \\
                      &=& \left \{ \begin{array}{ll}
                                  p/q, & \mbox{if $p  <  q$},    \\
                                    1, & \mbox{if $p \geq q$}, \end{array}\right. 
\end{eqnarray*}
</math>
so that Peter is sure to be in the lead eventually if <math>p \geq q</math>.
How long will it take?  That is, what is the expected value of <math>T</math>?  This value
is given by
<math display="block">
E(T) = h_T'(1) = \left \{ \matrix {
                        1/(p - q),  & \mbox{if}\,\, p  >  q, \cr
                        \infty,    & \mbox{if}\,\, p = q.\cr}\right.
</math>
This says that if <math>p  >  q</math>, then Peter can expect to be in the lead by about
<math>1/(p - q)</math> trials, but if <math>p = q</math>, he can expect to wait a long time.
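A brief numerical sketch (with the arbitrary choice <math>p = 0.6</math>) generates the <math>r_n</math> from the recurrence above and confirms both conclusions: the <math>r_n</math> sum to 1, and the mean waiting time comes out to <math>1/(p - q) = 5</math>.
<syntaxhighlight lang="python">
p, q, N = 0.6, 0.4, 2001
r = [0.0] * (N + 1)
r[1] = p
for n in range(3, N + 1, 2):        # r_n = 0 for even n
    r[n] = q * sum(r[k] * r[n - 1 - k] for k in range(1, n - 1, 2))

print(sum(r))                                   # ~1.0  (= min(1, p/q))
print(sum(n * r[n] for n in range(1, N + 1)))   # ~5.0  (= 1/(p - q))
</syntaxhighlight>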
A related problem, known as the Gambler's Ruin problem, is studied in Exercise 11.2.22 and in Section 12.2.
==General references==
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}
