<div class="d-none"><math>
\newcommand{\NA}{{\rm NA}}
\newcommand{\mat}[1]{{\bf#1}}
\newcommand{\exref}[1]{\ref{##1}}
\newcommand{\secstoprocess}{\all}
\newcommand{\mathds}{\mathbb}</math></div>
In the previous section, we introduced
the concepts of moments and moment generating functions for discrete random variables.  These
concepts have natural analogues for continuous random variables, provided some care is taken
in arguments involving convergence.


===Moments===
If <math>X</math> is a continuous random variable defined on the probability
space <math>\Omega</math>, with density function <math>f_X</math>, then we define the <math>n</math>th moment
of <math>X</math> by the formula
<math display="block">
\mu_n = E(X^n) = \int_{-\infty}^{+\infty} x^n f_X(x)\, dx\ ,
</math>
provided the integral
<math display="block">
\int_{-\infty}^{+\infty} |x|^n f_X(x)\, dx
</math>
is finite.  Then, just as in the discrete case, we see
that <math>\mu_0 = 1</math>, <math>\mu_1 = \mu</math>, and <math>\mu_2 - \mu_1^2 = \sigma^2</math>.
===Moment Generating Functions===
Now we define the ''moment generating function'' <math>g(t)</math> for <math>X</math> by the
formula
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!} = \sum_{k = 0}^\infty
\frac{E(X^k) t^k}{k!} \\
    &=& E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx} f_X(x)\, dx\ ,
\end{eqnarray*}
</math>
provided this series converges.  Then, as before, we have
<math display="block">
\mu_n = g^{(n)}(0)\ .
</math>
===Examples===
<span id="exam 10.3.1"/>
'''Example'''
Let <math>X</math> be a continuous random variable with range <math>[0,1]</math> and density
function <math>f_X(x) = 1</math> for <math>0 \leq x \leq 1</math> (uniform density).  Then
<math display="block">
\mu_n = \int_0^1 x^n\, dx = \frac1{n + 1}\ ,
</math>
and
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{k = 0}^\infty \frac{t^k}{(k+1)!}\\
    &=& \frac{e^t - 1}t\ .
\end{eqnarray*}
</math>
Here the series converges for all <math>t</math>.  Alternatively, we have
<math display="block">
\begin{eqnarray*}
g(t) &=& \int_{-\infty}^{+\infty} e^{tx} f_X(x)\, dx \\
    &=& \int_0^1 e^{tx}\, dx = \frac{e^t - 1}t\ .
\end{eqnarray*}
</math>
Then (by L'Hôpital's rule)
<math display="block">
\begin{eqnarray*}
\mu_0 &=& g(0) = \lim_{t \to 0} \frac{e^t - 1}t = 1\ , \\
\mu_1 &=& g'(0) = \lim_{t \to 0} \frac{te^t - e^t + 1}{t^2} = \frac12\ , \\
\mu_2 &=& g''(0) = \lim_{t \to 0} \frac{t^3e^t - 2t^2e^t + 2te^t - 2t}{t^4} =
\frac13\ .
\end{eqnarray*}
</math>
In particular, we verify that <math>\mu = g'(0) = 1/2</math> and
<math display="block">
\sigma^2 = g''(0) - (g'(0))^2 = \frac13 - \frac14 = \frac1{12}
</math>
as before (see [[guide:E5be6e0c81#exam 6.18.5 |Example]]).
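These calculations are easy to confirm numerically. The following sketch (not part of the original text; it assumes NumPy and SciPy are available) integrates <math>x^n</math> and <math>e^{tx}</math> against the uniform density on <math>[0,1]</math> and compares the results with <math>1/(n+1)</math> and <math>(e^t - 1)/t</math>.
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

# Uniform density on [0, 1]
f = lambda x: 1.0

# Moments mu_n = int_0^1 x^n dx, compared with 1/(n + 1)
for n in range(5):
    mu_n, _ = quad(lambda x: x**n * f(x), 0, 1)
    print(n, mu_n, 1 / (n + 1))

# Moment generating function g(t) = int_0^1 e^{tx} dx, compared with (e^t - 1)/t
for t in [0.5, 1.0, 2.0]:
    g, _ = quad(lambda x: np.exp(t * x) * f(x), 0, 1)
    print(t, g, (np.exp(t) - 1) / t)
</syntaxhighlight>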
<span id="exam 10.3.2"/>
'''Example'''
Let <math>X</math> have range <math>[\,0,\infty)</math> and density function <math>f_X(x) = \lambda
e^{-\lambda x}</math> (exponential density with parameter <math>\lambda</math>).  In this case
<math display="block">
\begin{eqnarray*}
\mu_n &=& \int_0^\infty x^n \lambda e^{-\lambda x}\, dx = \lambda(-1)^n
\frac{d^n}{d\lambda^n} \int_0^\infty e^{-\lambda x}\, dx \\
    &=& \lambda(-1)^n \frac{d^n}{d\lambda^n} [\frac1\lambda] = \frac{n!}
{\lambda^n}\ ,
\end{eqnarray*}
</math>
and
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!} \\
    &=& \sum_{k = 0}^\infty [\frac t\lambda]^k = \frac\lambda{\lambda - t}\ .
\end{eqnarray*}
</math>
Here the series converges only for <math>|t|  <  \lambda</math>.  Alternatively, we have
<math display="block">
\begin{eqnarray*}
g(t) &=& \int_0^\infty e^{tx} \lambda e^{-\lambda x}\, dx \\
    &=& \left.\frac{\lambda e^{(t - \lambda)x}}{t - \lambda}\right|_0^\infty =
\frac\lambda{\lambda - t}\ .
\end{eqnarray*}
</math>
Now we can verify directly that
<math display="block">
\mu_n = g^{(n)}(0) = \left.\frac{\lambda n!}{(\lambda - t)^{n + 1}}\right|_{t =
0} = \frac{n!}{\lambda^n}\ .
</math>
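The same kind of numerical check works for the exponential density. In the sketch below (an illustration only, not part of the original text; the value of <code>lam</code> is arbitrary), the integrals are compared with <math>n!/\lambda^n</math> and with <math>\lambda/(\lambda - t)</math> for <math>|t| < \lambda</math>.
<syntaxhighlight lang="python">
import numpy as np
from math import factorial
from scipy.integrate import quad

lam = 2.0                                  # parameter lambda, chosen arbitrarily
f = lambda x: lam * np.exp(-lam * x)       # exponential density on [0, infinity)

# Moments mu_n, compared with n!/lambda^n
for n in range(5):
    mu_n, _ = quad(lambda x: x**n * f(x), 0, np.inf)
    print(n, mu_n, factorial(n) / lam**n)

# Moment generating function g(t), compared with lambda/(lambda - t); valid for |t| < lambda
for t in [-1.0, 0.5, 1.5]:
    g, _ = quad(lambda x: np.exp(t * x) * f(x), 0, np.inf)
    print(t, g, lam / (lam - t))
</syntaxhighlight>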
<span id="exam 10.3.3"/>
'''Example'''
Let <math>X</math> have range <math>(-\infty,+\infty)</math> and density function
<math display="block">
f_X(x) = \frac1{\sqrt{2\pi}} e^{-x^2/2}
</math>
(normal density).  In this case we have
<math display="block">
\begin{eqnarray*}
\mu_n &=& \frac1{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2}\, dx \\
    &=& \left \{ \begin{array}{ll}
                      \frac{(2m)!}{2^{m} m!}, & \mbox{if $ n = 2m$,}\cr
                      0,                      & \mbox{if $ n = 2m+1$.}\end{array}\right.
\end{eqnarray*}
</math>
(These moments are calculated by integrating once by parts to show that <math>\mu_n
= (n - 1)\mu_{n - 2}</math>, and observing that <math>\mu_0 = 1</math> and <math>\mu_1 = 0</math>.)  Hence,
<math display="block">
\begin{eqnarray*}
g(t) &=& \sum_{n = 0}^\infty \frac{\mu_n t^n}{n!} \\
    &=& \sum_{m = 0}^\infty \frac{t^{2m}}{2^{m} m!} = e^{t^2/2}\ .
\end{eqnarray*}
</math>
This series converges for all values of <math>t</math>.  Again we can verify that
<math>g^{(n)}(0) = \mu_n</math>.
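Both the moment formula and the closed form of <math>g(t)</math> can be verified numerically; the sketch below (not part of the original text) integrates against the standard normal density and compares the even moments with <math>(2m)!/(2^m m!)</math> and the integral of <math>e^{tx}</math> with <math>e^{t^2/2}</math>.
<syntaxhighlight lang="python">
import numpy as np
from math import factorial
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

# Even moments mu_{2m} compared with (2m)!/(2^m m!); odd moments should vanish
for n in range(7):
    mu_n, _ = quad(lambda x: x**n * phi(x), -np.inf, np.inf)
    exact = factorial(n) / (2**(n // 2) * factorial(n // 2)) if n % 2 == 0 else 0.0
    print(n, round(mu_n, 6), exact)

# Moment generating function, compared with e^{t^2/2}
for t in [0.5, 1.0, 2.0]:
    g, _ = quad(lambda x: np.exp(t * x) * phi(x), -np.inf, np.inf)
    print(t, g, np.exp(t**2 / 2))
</syntaxhighlight>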
Let <math>X</math> be a normal random variable with parameters <math>\mu</math> and <math>\sigma</math>.  It is easy
to show that the moment generating function of <math>X</math> is given by
<math display="block">
e^{t\mu + (\sigma^2/2)t^2}\ .
</math>
Now suppose that <math>X</math> and <math>Y</math> are two independent normal random variables with
parameters <math>\mu_1</math>, <math>\sigma_1</math>, and <math>\mu_2</math>, <math>\sigma_2</math>, respectively.  Then,
the product of the moment generating functions of <math>X</math> and <math>Y</math> is
<math display="block">
e^{t(\mu_1 + \mu_2) + ((\sigma_1^2 + \sigma_2^2)/2)t^2}\ .
</math>
This is the moment generating function for a normal random variable with mean
<math>\mu_1 + \mu_2</math> and variance <math>\sigma_1^2 + \sigma_2^2</math>.  Thus, the sum
of two independent normal random variables is again normal.  (This was proved
for the special case that both summands are standard normal in Example 7.8.)
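A short simulation makes this closure property concrete. The sketch below (illustrative only; the parameter values are arbitrary) draws independent samples of <math>X</math> and <math>Y</math>, checks the mean and variance of <math>X + Y</math>, and compares the empirical moment generating function of the sum with <math>e^{t(\mu_1 + \mu_2) + ((\sigma_1^2 + \sigma_2^2)/2)t^2}</math>.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1 = 1.0, 2.0           # parameters chosen for illustration
mu2, sigma2 = -0.5, 1.5

x = rng.normal(mu1, sigma1, 200_000)
y = rng.normal(mu2, sigma2, 200_000)
s = x + y

print(s.mean(), mu1 + mu2)                    # mean of the sum
print(s.var(), sigma1**2 + sigma2**2)         # variance of the sum

# Empirical MGF of the sum vs. the normal MGF with the combined parameters
for t in [0.1, 0.3, 0.5]:
    empirical = np.mean(np.exp(t * s))
    exact = np.exp(t * (mu1 + mu2) + (sigma1**2 + sigma2**2) / 2 * t**2)
    print(t, empirical, exact)
</syntaxhighlight>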
In general, the series defining <math>g(t)</math> will not converge for all <math>t</math>.  But in
the important special case where <math>X</math> is bounded (i.e., where the range of <math>X</math>
is contained in a finite interval), we can show that the series does converge
for all <math>t</math>.
{{proofcard|Theorem|thm_10.4|Suppose <math>X</math> is a continuous random variable with range contained in the
interval <math>[-M,M]</math>.  Then the series
<math display="block">
g(t) = \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!}
</math>
converges for all <math>t</math> to an infinitely differentiable function <math>g(t)</math>, and
<math>g^{(n)}(0) = \mu_n</math>.|We have
<math display="block">
\mu_k = \int_{-M}^{+M} x^k f_X(x)\, dx\ ,
</math>
so
<math display="block">
\begin{eqnarray*}
|\mu_k| &\leq& \int_{-M}^{+M} |x|^k f_X(x)\, dx \\
      &\leq& M^k \int_{-M}^{+M} f_X(x)\, dx = M^k\ .
\end{eqnarray*}
</math>
Hence, for all <math>N</math> we have
<math display="block">
\sum_{k = 0}^N \left|\frac{\mu_k t^k}{k!}\right| \leq \sum_{k = 0}^N
\frac{(M|t|)^k}{k!} \leq e^{M|t|}\ ,
</math>
which shows that the power series converges for all <math>t</math>.  We know that the sum
of a convergent power series is always differentiable.}}
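The estimate <math>|\mu_k| \leq M^k</math> used in the proof can be watched in action for a concrete bounded density. The sketch below (not part of the original text) takes the uniform density on <math>[-M,M]</math>, whose moments are <math>M^k/(k+1)</math> for even <math>k</math> and <math>0</math> for odd <math>k</math>, and checks that the partial sums of <math>\sum_k \mu_k t^k/k!</math> stay below <math>e^{M|t|}</math> and converge to <math>E(e^{tX}) = \sinh(Mt)/(Mt)</math>.
<syntaxhighlight lang="python">
from math import exp, factorial, sinh

M, t = 3.0, 1.2                  # bound on the range and point of evaluation (arbitrary)

# Moments of the uniform density on [-M, M]: mu_k = M^k/(k + 1) for even k, 0 for odd k
def mu(k):
    return M**k / (k + 1) if k % 2 == 0 else 0.0

exact = sinh(M * t) / (M * t)    # E(e^{tX}) for this density
partial = 0.0
for k in range(30):
    assert abs(mu(k)) <= M**k    # the bound |mu_k| <= M^k from the proof
    partial += mu(k) * t**k / factorial(k)

print(partial, exact, exp(M * abs(t)))   # partial sum, its limit, and the e^{M|t|} bound
</syntaxhighlight>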
===Moment Problem===
{{proofcard|Theorem|thm_10.5|If <math>X</math> is a bounded random variable, then the moment generating function
<math>g_X(t)</math> of <math>X</math> determines the density function <math>f_X(x)</math> uniquely.
|''Sketch of the Proof.''
We know that
<math display="block">
\begin{eqnarray*}
g_X(t) &=& \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!} \\
      &=& \int_{-\infty}^{+\infty} e^{tx} f_X(x)\, dx\ .
\end{eqnarray*}
</math>
If we replace <math>t</math> by <math>i\tau</math>, where <math>\tau</math> is real and <math>i = \sqrt{-1}</math>, then
the series converges for all <math>\tau</math>, and we can define the function
<math display="block">
k_X(\tau) = g_X(i\tau) = \int_{-\infty}^{+\infty} e^{i\tau x} f_X(x)\, dx\ .
</math>
The function <math>k_X(\tau)</math> is called the ''characteristic function'' of <math>X</math>, and is defined by the above equation even when the series for <math>g_X</math> does not
converge.  This equation says that <math>k_X</math> is the ''Fourier transform'' of <math>f_X</math>.  It is known that the Fourier transform has an inverse, given by the
formula
<math display="block">
f_X(x) = \frac1{2\pi} \int_{-\infty}^{+\infty} e^{-i\tau x} k_X(\tau)\, d\tau\ ,
</math>
suitably interpreted.<ref group="Notes" >H. Dym and H. P. McKean, ''Fourier Series and
Integrals'' (New York: Academic Press, 1972).</ref>  Here we see that the
characteristic function <math>k_X</math>, and hence the moment generating function <math>g_X</math>,
determines the density function <math>f_X</math> uniquely under our hypotheses.}}
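The inversion formula can be illustrated numerically. In the sketch below (not from the original text; the truncation of the <math>\tau</math>-integral to <math>[-12,12]</math> is an arbitrary choice, harmless here because the characteristic function of the standard normal decays very rapidly), <math>k_X(\tau)</math> is computed by quadrature and the density is then recovered from it at a few points.
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad, trapezoid

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

# Characteristic function k_X(tau) = E(e^{i tau X}); the density is even, so k_X is real
def k(tau):
    val, _ = quad(lambda x: np.cos(tau * x) * f(x), -10, 10, limit=200)
    return val

# Inversion: f_X(x) = (1/(2 pi)) * integral of e^{-i tau x} k_X(tau) d tau, on a truncated grid
taus = np.linspace(-12, 12, 1201)
k_vals = np.array([k(tau) for tau in taus])
for x in [0.0, 0.5, 1.0, 2.0]:
    recovered = trapezoid(np.cos(taus * x) * k_vals, taus) / (2 * np.pi)
    print(x, recovered, f(x))
</syntaxhighlight>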
===Sketch of the Proof of the Central Limit Theorem===
With the above result in mind, we can now sketch a proof of the Central Limit Theorem
for bounded continuous random variables (see [[guide:452fd94468#thm 9.4.7 |Theorem]]).  To this end,
let <math>X</math> be a continuous random variable with density function <math>f_X</math>, mean <math>\mu
= 0</math> and variance <math>\sigma^2 = 1</math>, and moment generating function <math>g(t)</math> defined
by its series for all <math>t</math>.  Let <math>X_1</math>, <math>X_2</math>, ..., <math>X_n</math> be an independent
trials process with each <math>X_i</math> having density <math>f_X</math>, and let <math>S_n = X_1 + X_2
+\cdots+ X_n</math>, and <math>S_n^* = (S_n - n\mu)/\sqrt{n\sigma^2} = S_n/\sqrt n</math>.  Then
each <math>X_i</math> has moment generating function <math>g(t)</math>, and since the <math>X_i</math> are
independent, the sum <math>S_n</math>, just as in the discrete case (see Section 10.1),
has moment generating function
<math display="block">
g_n(t) = (g(t))^n\ ,
</math>
and the standardized sum <math>S_n^*</math> has moment generating function
<math display="block">
g_n^*(t) = \left(g\left(\frac t{\sqrt n}\right)\right)^n\ .
</math>
We now show that, as <math>n \to \infty</math>, <math>g_n^*(t) \to e^{t^2/2}</math>, where
<math>e^{t^2/2}</math> is the moment generating function of the normal density <math>n(x) =
(1/\sqrt{2\pi}) e^{-x^2/2}</math> (see [[#exam 10.3.3 |Example]]).
To show this, we set <math>u(t) = \log g(t)</math>, and
<math display="block">
\begin{eqnarray*}
u_n^*(t) &=& \log g_n^*(t) \\
        &=& n\log g\left(\frac t{\sqrt n}\right) = nu\left(\frac t{\sqrt n}\right)\ ,
\end{eqnarray*}
</math>
and show that <math>u_n^*(t) \to t^2/2</math> as <math>n \to \infty</math>.  First we note that
<math display="block">
\begin{eqnarray*}
u(0) &=& \log g(0) = 0\ , \\
u'(0) &=& \frac{g'(0)}{g(0)} = \frac{\mu_1}1 = 0\ , \\
u''(0) &=& \frac{g''(0)g(0) - (g'(0))^2}{(g(0))^2} \\
      &=& \frac{\mu_2 - \mu_1^2}1 = \sigma^2 = 1\ .
\end{eqnarray*}
</math>
Now by using L'Hôpital's rule twice, we get
<math display="block">
\begin{eqnarray*}
\lim_{n \to \infty} u_n^*(t) &=& \lim_{s \to \infty} \frac{u(t/\sqrt s)}{s^{-1}}\\
    &=& \lim_{s \to \infty} \frac{u'(t/\sqrt s) t}{2s^{-1/2}} \\
    &=& \lim_{s \to \infty} u''\left(\frac t{\sqrt s}\right) \frac{t^2}2 = \sigma^2
\frac{t^2}2 = \frac{t^2}2\ .
\end{eqnarray*}
</math>
Hence, <math>g_n^*(t) \to e^{t^2/2}</math> as <math>n \to \infty</math>.  Now to complete the proof
of the Central Limit Theorem, we must show that if <math>g_n^*(t) \to e^{t^2/2}</math>,
then under our hypotheses the distribution functions <math>F_n^*(x)</math> of the <math>S_n^*</math>
must converge to the distribution function <math>F_N^*(x)</math> of the normal
variable <math>N</math>; that is, that
<math display="block">
F_n^*(a) = P(S_n^* \leq a) \to \frac1{\sqrt{2\pi}} \int_{-\infty}^a
e^{-x^2/2}\, dx\ ,
</math>
and furthermore, that the density functions <math>f_n^*(x)</math> of the <math>S_n^*</math> must
converge to the density function for <math>N</math>; that is, that
<math display="block">
f_n^*(x) \to \frac1{\sqrt{2\pi}} e^{-x^2/2}\ ,
</math>
as <math>n \rightarrow \infty</math>.
Since the densities, and hence the distributions, of the <math>S_n^*</math> are uniquely
determined by their moment generating functions under our hypotheses, these
conclusions are certainly plausible, but their proofs involve a detailed
examination of characteristic functions and Fourier transforms, and we shall
not attempt them here.
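Although the full proof is not attempted here, the two convergence statements can at least be observed in a simulation. The sketch below (illustrative only) uses the uniform density on <math>[-\sqrt3, \sqrt3]</math>, which is bounded with mean <math>0</math> and variance <math>1</math>, forms the standardized sums <math>S_n^* = S_n/\sqrt n</math>, and compares their empirical moment generating function with <math>e^{t^2/2}</math> and their empirical distribution function with the standard normal distribution.
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, trials = 50, 100_000
a = np.sqrt(3.0)                  # uniform on [-a, a] has mean 0 and variance 1

x = rng.uniform(-a, a, size=(trials, n))
s_star = x.sum(axis=1) / np.sqrt(n)          # standardized sums S_n^* = S_n / sqrt(n)

# Empirical MGF of S_n^* vs. the normal MGF e^{t^2/2}
for t in [0.25, 0.5, 1.0]:
    print(t, np.mean(np.exp(t * s_star)), np.exp(t**2 / 2))

# Empirical distribution function vs. the standard normal CDF
for c in [-1.0, 0.0, 1.0, 2.0]:
    print(c, np.mean(s_star <= c), norm.cdf(c))
</syntaxhighlight>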
In the same way, we can prove the Central Limit Theorem for bounded discrete
random variables with integer values (see [[guide:4add108640#thm 9.3.6 |Theorem]]).  Let <math>X</math> be a
discrete random variable with density function <math>p(j)</math>, mean <math>\mu = 0</math>, variance
<math>\sigma^2 = 1</math>, and moment generating function <math>g(t)</math>, and let <math>X_1</math>, <math>X_2</math>,
..., <math>X_n</math> form an independent trials process with common density <math>p</math>.  Let
<math>S_n = X_1 + X_2 +\cdots+ X_n</math> and <math>S_n^* = S_n/\sqrt n</math>, with densities
<math>p_n</math> and <math>p_n^*</math>, and moment generating functions <math>g_n(t)</math> and
<math>g_n^*(t) = \left(g(\frac t{\sqrt n})\right)^n.</math>
Then we have
<math display="block">
g_n^*(t) \to e^{t^2/2}\ ,
</math>
just as in the continuous case, and this implies in the same way that the
distribution functions <math>F_n^*(x)</math> converge to the normal distribution; that is,
that
<math display="block">
F_n^*(a) = P(S_n^* \leq a) \to \frac1{\sqrt{2\pi}} \int_{-\infty}^a
e^{-x^2/2}\, dx\ ,
</math>
as <math>n \rightarrow \infty</math>.
The corresponding statement about the distribution functions <math>p_n^*</math>, however,
requires a little extra care (see [[guide:4add108640#thm 9.3.5 |Theorem]]).  The trouble arises
because the distribution <math>p(x)</math> is not defined for all <math>x</math>, but only for
integer <math>x</math>.  It follows that the distribution <math>p_n^*(x)</math> is defined only for <math>x</math> of
the form <math>j/\sqrt n</math>, and these values change as <math>n</math> changes.
We can fix this, however, by introducing the function <math>\bar p(x)</math>, defined
by the formula
<math display="block">
\bar p(x) =  \left \{ \begin{array}{ll}
                      p(j), & \mbox{if $j - 1/2 \leq x  <  j + 1/2$,} \cr
                        0\ , & \mbox{otherwise}.\end{array}\right.
</math>
Then <math>\bar p(x)</math> is defined for all <math>x</math>, <math>\bar p(j) = p(j)</math>, and the
graph of <math>\bar p(x)</math> is the step function for the distribution <math>p(j)</math> (see Figure 3
of Section 9.1).
In the same way we introduce the step function <math>\bar p_n(x)</math> and
<math>\bar p_n^*(x)</math> associated with the distributions <math>p_n</math> and <math>p_n^*</math>, and their
moment generating functions <math>\bar g_n(t)</math> and <math>\bar g_n^*(t)</math>.  If we
can show that <math>\bar g_n^*(t) \to e^{t^2/2}</math>, then we can conclude that
<math display="block">
\bar p_n^*(x) \to \frac1{\sqrt{2\pi}} e^{-x^2/2}\ ,
</math>
as <math>n \rightarrow \infty</math>, for all <math>x</math>, a conclusion strongly suggested by
Figure 9.2.
Now <math>\bar g(t)</math> is given by
<math display="block">
\begin{eqnarray*}
\bar g(t) &=& \int_{-\infty}^{+\infty} e^{tx} \bar p(x)\, dx \\
              &=& \sum_{j = -N}^{+N} \int_{j - 1/2}^{j + 1/2} e^{tx} p(j)\, dx\\
              &=& \sum_{j = -N}^{+N} p(j) e^{tj}\, \frac{e^{t/2} - e^{-t/2}}{2\,(t/2)} \\
              &=& g(t) \frac{\sinh(t/2)}{t/2}\ ,
\end{eqnarray*}
</math>
where we have put
<math display="block">
\sinh(t/2) = \frac{e^{t/2} - e^{-t/2}}2\ .
</math>
In the same way, we find that
<math display="block">
\begin{eqnarray*}
\bar g_n(t)  &=& g_n(t) \frac{\sinh(t/2)}{t/2}\ , \\
\bar g_n^*(t) &=& g_n^*(t) \frac{\sinh(t/(2\sqrt n))}{t/(2\sqrt n)}\ .
\end{eqnarray*}
</math>
Now, as <math>n \to \infty</math>, we know that <math>g_n^*(t) \to e^{t^2/2}</math>, and, by
L'Hôpital's rule,
<math display="block">
\lim_{n \to \infty} \frac{\sinh(t/(2\sqrt n))}{t/(2\sqrt n)} = 1\ .
</math>
It follows that
<math display="block">
\bar g_n^*(t) \to e^{t^2/2}\ ,
</math>
and hence that
<math display="block">
\bar p_n^*(x) \to \frac1{\sqrt{2\pi}} e^{-x^2/2}\ ,
</math>
as <math>n \rightarrow \infty</math>.
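The relation <math>\bar g(t) = g(t)\,\sinh(t/2)/(t/2)</math>, and the fact that the correction factor tends to <math>1</math>, can be checked for a concrete integer-valued distribution. The sketch below (not part of the original text) takes <math>p(-1) = p(1) = 1/2</math>, for which <math>g(t) = \cosh t</math>, builds the step function <math>\bar p(x)</math>, and computes <math>\bar g(t)</math> by numerical integration.
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

# A simple integer-valued distribution with mean 0 and variance 1: p(-1) = p(1) = 1/2
p = {-1: 0.5, 1: 0.5}

def g(t):                                  # ordinary moment generating function
    return sum(pj * np.exp(t * j) for j, pj in p.items())

def p_bar(x):                              # the step function bar p(x)
    j = int(np.floor(x + 0.5))             # the integer j with j - 1/2 <= x < j + 1/2
    return p.get(j, 0.0)

def g_bar(t):                              # MGF of bar p, by numerical integration
    val, _ = quad(lambda x: np.exp(t * x) * p_bar(x), -2, 2,
                  points=[-1.5, -0.5, 0.5, 1.5])
    return val

# bar g(t) should equal g(t) * sinh(t/2) / (t/2)
for t in [0.5, 1.0, 2.0]:
    print(t, g_bar(t), g(t) * np.sinh(t / 2) / (t / 2))

# The correction factor sinh(t/(2 sqrt n)) / (t/(2 sqrt n)) tends to 1 as n grows
t = 1.0
for n in [1, 10, 100, 10_000]:
    u = t / (2 * np.sqrt(n))
    print(n, np.sinh(u) / u)
</syntaxhighlight>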
The astute reader will note that in this sketch of the proof of [[guide:4add108640#thm 9.3.5 |Theorem]],
we never made use of the hypothesis that the greatest common divisor of the
differences of all the values that the <math>X_i</math> can take on is 1.  This is a technical
point that we choose to ignore.  A complete proof may be found in Gnedenko and
Kolmogorov.<ref group="Notes" >B. V. Gnedenko and A. N. Kolmogorov, ''Limit Distributions
for Sums of Independent Random Variables'' (Reading: Addison-Wesley, 1968), p. 233.</ref>
===Cauchy Density===
The characteristic function of a continuous density is a useful tool even in
cases when the moment series does not converge, or even in cases when the
moments themselves are not finite.  As an example, consider the Cauchy density
with parameter <math>a = 1</math> (see [[guide:D26a5cb8f7#exam 5.20 |Example]])
<math display="block">
f(x) = \frac1{\pi(1 + x^2)}\ .
</math>
If <math>X</math> and <math>Y</math> are independent random variables with Cauchy density <math>f(x)</math>,
then the average <math>Z = (X + Y)/2</math> also has Cauchy density <math>f(x)</math>, that is,
<math display="block">
f_Z(x) = f(x)\ .
</math>
This is hard to check directly, but easy to check by using characteristic
functions.  Note first that
<math display="block">
\mu_2 = E(X^2) = \int_{-\infty}^{+\infty} \frac{x^2}{\pi(1 + x^2)}\, dx = \infty
</math>
so that <math>\mu_2</math> is infinite.  Nevertheless, we can define the characteristic
function <math>k_X(\tau)</math> of <math>X</math> by the formula
<math display="block">
k_X(\tau) = \int_{-\infty}^{+\infty} e^{i\tau x}\frac1{\pi(1 + x^2)}\, dx\ .
</math>
This integral is easy to do by contour methods, and gives us
<math display="block">
k_X(\tau) = k_Y(\tau) = e^{-|\tau|}\ .
</math>
Hence,
<math display="block">
k_{X + Y}(\tau) = (e^{-|\tau|})^2 = e^{-2|\tau|}\ ,
</math>
and since
<math display="block">
k_Z(\tau) = k_{X + Y}(\tau/2)\ ,
</math>
we have
<math display="block">
k_Z(\tau) = e^{-2|\tau/2|} = e^{-|\tau|}\ .
</math>
This shows that <math>k_Z = k_X = k_Y</math>, and, since the characteristic function determines the
density, it follows that <math>f_Z = f_X = f_Y</math>.
It follows from this that if <math>X_1</math>, <math>X_2</math>, ..., <math>X_n</math> is an independent
trials process with common Cauchy density, and if
<math display="block">
A_n = \frac{X_1 + X_2 + \cdots+ X_n}n
</math>
is the average of the <math>X_i</math>, then <math>A_n</math> has the same density as do the <math>X_i</math>.
This means that the Law of Large Numbers fails for this process; the
distribution of the average <math>A_n</math> is exactly the same as for the individual
terms.  Our proof of the Law of Large Numbers fails in this case because the
variance of <math>X_i</math> is not finite.
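A simulation shows this failure of the Law of Large Numbers quite vividly. The sketch below (illustrative only) averages <math>n = 100</math> independent Cauchy variables many times, compares the empirical distribution of <math>A_n</math> with the Cauchy distribution function, and checks that the empirical characteristic functions of both <math>X_1</math> and <math>A_n</math> agree with <math>e^{-|\tau|}</math>.
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(2)
n, trials = 100, 100_000

x = rng.standard_cauchy(size=(trials, n))
a_n = x.mean(axis=1)              # averages A_n of n Cauchy variables

# The empirical distribution of A_n matches the one-variable Cauchy CDF
for c in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(c, np.mean(a_n <= c), cauchy.cdf(c))

# The empirical characteristic functions of X_1 and A_n both match e^{-|tau|}
for tau in [0.5, 1.0, 2.0]:
    print(tau,
          np.mean(np.cos(tau * x[:, 0])),   # Re E(e^{i tau X_1})
          np.mean(np.cos(tau * a_n)),       # Re E(e^{i tau A_n})
          np.exp(-abs(tau)))
</syntaxhighlight>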
==General references==
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}
==Notes==
{{Reflist|group=Notes}}
