<div class="d-none"><math>
\newcommand{\NA}{{\rm NA}}
\newcommand{\mat}[1]{{\bf#1}}
\newcommand{\exref}[1]{\ref{##1}}
\newcommand{\secstoprocess}{\all}
\newcommand{\mathds}{\mathbb}</math></div>
The usefulness of the expected value as a prediction for the outcome of an experiment
is increased when the outcome is not likely to deviate too much from the expected
value.  In this section we shall introduce a measure of this deviation, called the
variance.


===Variance===
{{defncard|label=|id=def 6.4| Let <math>X</math> be a numerically valued random variable
with expected value <math>\mu = E(X)</math>.  Then the  ''variance'' of <math>X</math>, denoted by
<math>V(X)</math>, is
<math display="block">
V(X) = E((X - \mu)^2)\ .
</math>
}} Note that, by [[guide:E4fd10ce73#thm 6.3.5 |Theorem]], <math>V(X)</math> is given by
<span id{{=}}"eq 6.1"/>
<math display="block">
\begin{equation} V(X) = \sum_x (x - \mu)^2 m(x)\ ,
\label{eq 6.1}
\end{equation}
</math>
where <math>m</math> is the distribution function of <math>X</math>.
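For readers who like to experiment, here is a minimal Python sketch of this formula; the distribution is passed as a dictionary of value–probability pairs, and the helper names <code>expected_value</code> and <code>variance</code> are illustrative choices, not notation from the text.
<syntaxhighlight lang="python">
# Variance computed directly from a distribution function m,
# following V(X) = sum over x of (x - mu)^2 m(x).

def expected_value(m):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in m.items())

def variance(m):
    """V(X) = E((X - mu)^2)."""
    mu = expected_value(m)
    return sum((x - mu) ** 2 * p for x, p in m.items())

# Example: a fair coin scored 0 or 1 has variance 1/4.
print(variance({0: 0.5, 1: 0.5}))   # 0.25
</syntaxhighlight>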
===Standard Deviation===
The  ''standard deviation'' of <math>X</math>, denoted by <math>D(X)</math>, is <math>D(X) =
\sqrt {V(X)}</math>.  We often write <math>\sigma</math> for <math>D(X)</math> and <math>\sigma^2</math> for <math>V(X)</math>.
<span id="exam 6.13"/>
'''Example''' Consider one roll of a die.  Let <math>X</math> be the number that turns up.  To find
<math>V(X)</math>, we must first find the expected value of <math>X</math>.  This is
<math display="block">
\begin{eqnarray*}
\mu & = & E(X) = 1\Bigl(\frac 16\Bigr) + 2\Bigl(\frac 16\Bigr) + 3\Bigl(\frac
16\Bigr) +  4\Bigl(\frac 16\Bigr) + 5\Bigl(\frac 16\Bigr) + 6\Bigl(\frac 16\Bigr) \\
  & = & \frac 72\ .
\end{eqnarray*}
</math>
To find the variance of <math>X</math>, we form the new random variable <math>(X - \mu)^2</math> and
compute its expectation.  We can easily do this using the following table.
<span id="table 6.6"/>
{|class="table"
|+ Variance calculation.
|-
|||||
|-
|$x$ || <math>m(x)</math> || <math>(x - 7/2)^2</math>
|-
|1 || 1/6 ||      25/4 
|-
|2 || 1/6||      9/4 
|-
|3 || 1/6 ||      1/4 
|-
|4 || 1/6||      1/4 
|-
|5 || 1/6 ||      9/4 
|-
|6 || 1/6 ||      25/4 
|}
From this table we find that <math>E((X - \mu)^2)</math> is
<math display="block">
\begin{eqnarray*} V(X) & = & \frac 16 \left( \frac {25}4 + \frac 94 + \frac 14 +
\frac 14 + \frac 94 + \frac {25}4 \right) \\
    & = &\frac {35}{12}\ ,
\end{eqnarray*}
</math>
and the standard deviation <math>D(X) = \sqrt{35/12} \approx 1.707</math>.
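The arithmetic of this example can be checked exactly with a short Python sketch using the <code>fractions</code> module; the variable names are illustrative.
<syntaxhighlight lang="python">
from fractions import Fraction
from math import sqrt

# One roll of a fair die: X takes the values 1,...,6, each with probability 1/6.
m = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in m.items())                # E(X) = 7/2
var = sum((x - mu) ** 2 * p for x, p in m.items())   # V(X) = 35/12

print(mu, var, sqrt(var))   # 7/2  35/12  1.707...
</syntaxhighlight>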
===Calculation of Variance===
We next prove a theorem that gives us a useful alternative form for computing the
variance.
{{proofcard|Theorem|thm_6.7|If <math>X</math> is any random variable with <math>E(X) = \mu</math>, then
<math display="block">
V(X) = E(X^2) - \mu^2\ .
</math>|We have
<math display="block">
\begin{eqnarray*} V(X) & = & E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) \\
    & = & E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2\ .
\end{eqnarray*}
</math>}}
Using [[#thm 6.7 |Theorem]], we can compute the variance of the outcome of a roll of
a die by first computing
<math display="block">
\begin{eqnarray*} E(X^2) & = & 1\Bigl(\frac 16\Bigr) + 4\Bigl(\frac 16\Bigr) +
9\Bigl(\frac 16\Bigr) + 16\Bigl(\frac 16\Bigr) + 25\Bigl(\frac 16\Bigr) +
36\Bigl(\frac 16\Bigr) \\
      & = &\frac {91}6\ ,
\end{eqnarray*}
</math>
and,
<math display="block">
V(X)  =  E(X^2) - \mu^2 = \frac {91}{6} - \Bigl(\frac 72\Bigr)^2
= \frac {35}{12}\ ,
</math>
in agreement with the value obtained directly from the definition of
<math>V(X)</math>.
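Both forms of the variance can be compared exactly in a few lines of Python; this sketch simply repeats the die computation above.
<syntaxhighlight lang="python">
from fractions import Fraction

# Two ways to compute V(X) for one roll of a die:
# the definition E((X - mu)^2) and the alternative form E(X^2) - mu^2.
m = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in m.items())

v_def = sum((x - mu) ** 2 * p for x, p in m.items())
v_alt = sum(x ** 2 * p for x, p in m.items()) - mu ** 2

assert v_def == v_alt == Fraction(35, 12)
print(v_def)   # 35/12
</syntaxhighlight>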
===Properties of Variance===
The variance has properties very different from those of the expectation.  If
<math>c</math> is any constant, <math>E(cX) = cE(X)</math> and <math>E(X + c) = E(X) + c</math>.  These two statements
imply that the expectation is a linear function.  However, the variance is not
linear, as seen in the next theorem.
{{proofcard|Theorem|thm_6.6|If <math>X</math> is any random variable and <math>c</math> is any constant,
then
<math display="block">
V(cX)  = c^2 V(X)
</math>
and
<math display="block">
V(X + c) = V(X)\ .
</math>|Let <math>\mu = E(X)</math>.  Then <math>E(cX) = c\mu</math>, and
<math display="block">
\begin{eqnarray*} V(cX) &=& E((cX - c\mu)^2) = E(c^2(X - \mu)^2) \\
    &=& c^2 E((X - \mu)^2) = c^2 V(X)\ .
\end{eqnarray*}
</math>
To prove the second assertion, we note that, to compute <math>V(X + c)</math>, we would replace
<math>x</math> by <math>x + c</math> and <math>\mu</math> by <math>\mu + c</math> in [[#eq 6.1 |Equation]]. Then the
<math>c</math>'s would cancel, leaving <math>V(X)</math>.}}
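These two properties are easy to verify numerically; the following sketch checks them for one roll of a die and the arbitrary constant <math>c = 3</math>.
<syntaxhighlight lang="python">
from fractions import Fraction

def variance(m):
    mu = sum(x * p for x, p in m.items())
    return sum((x - mu) ** 2 * p for x, p in m.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
c = 3

scaled  = {c * x: p for x, p in die.items()}   # distribution of cX
shifted = {x + c: p for x, p in die.items()}   # distribution of X + c

assert variance(scaled)  == c ** 2 * variance(die)   # V(cX) = c^2 V(X)
assert variance(shifted) == variance(die)            # V(X + c) = V(X)
</syntaxhighlight>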
We turn now to some general properties of the variance.  Recall that if <math>X</math> and
<math>Y</math> are any two random variables, <math>E(X + Y) = E(X) + E(Y)</math>.  This is not always true
for the case of the variance.  For example, let <math>X</math> be a random variable with
<math>V(X) \ne 0</math>, and define <math>Y = -X</math>.  Then <math>V(X) = V(Y)</math>, so that <math>V(X) + V(Y) =
2V(X)</math>.  But
<math>X + Y</math> is always 0 and hence has variance 0.  Thus <math>V(X + Y) \ne V(X) + V(Y)</math>.
In the important case of mutually independent random variables, however,  ''the
variance of the sum is the sum of the variances.''
{{proofcard|Theorem|thm_6.8|Let <math>X</math> and <math>Y</math> be two  ''independent'' random
variables.  Then
<math display="block">
V(X + Y) = V(X) + V(Y)\ .
</math>|Let <math>E(X) = a</math> and <math>E(Y) = b</math>.  Then
<math display="block">
\begin{eqnarray*} V(X + Y) & = & E((X + Y)^2) - (a + b)^2 \\
        & = & E(X^2) + 2E(XY) + E(Y^2) - a^2 - 2ab - b^2\ .
\end{eqnarray*}
</math>
Since <math>X</math> and <math>Y</math> are independent, <math>E(XY) = E(X)E(Y) = ab</math>.  Thus,
<math display="block">
V(X + Y) = E(X^2) - a^2 + E(Y^2) - b^2 = V(X) + V(Y)\ .
</math>}}
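The theorem, and the counterexample <math>Y = -X</math> given above, can both be checked with a short sketch; here <math>X</math> and <math>Y</math> are two independent rolls of a die, and the distribution of <math>X + Y</math> is built by convolution.
<syntaxhighlight lang="python">
from fractions import Fraction
from itertools import product

def variance(m):
    mu = sum(x * p for x, p in m.items())
    return sum((x - mu) ** 2 * p for x, p in m.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

# X, Y independent die rolls: distribution of X + Y by convolution.
m_sum = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    m_sum[x + y] = m_sum.get(x + y, 0) + px * py

assert variance(m_sum) == 2 * variance(die)   # V(X + Y) = V(X) + V(Y)

# Dependent counterexample: Y = -X makes X + Y identically 0,
# so V(X + Y) = 0 even though V(X) + V(Y) = 2 V(X) > 0.
assert variance({0: Fraction(1)}) == 0 != 2 * variance(die)
</syntaxhighlight>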
It is easy to extend this proof, by mathematical induction, to show that ''
the variance of the sum of any number of mutually independent random variables is the sum
of the individual variances.''  Thus we have the following theorem.
{{proofcard|Theorem|thm_6.9|Let <math>X_1</math>, <math>X_2</math>, <math>\ldots</math>, <math>X_n</math> be an independent
trials process with <math>E(X_j) =
\mu</math> and <math>V(X_j) = \sigma^2</math>.  Let
<math display="block">
S_n = X_1 + X_2 +\cdots+ X_n
</math>
be the sum, and
<math display="block">
A_n = \frac {S_n}n
</math>
be the average.  Then
<math display="block">
\begin{eqnarray*} E(S_n) &=& n\mu\ , \\
V(S_n) &=& n\sigma^2\ , \\
\sigma(S_n) &=& \sigma \sqrt{n}\ , \\
E(A_n) &=& \mu\ , \\
V(A_n) &=& \frac {\sigma^2}n\ , \\
\sigma(A_n) &=& \frac{\sigma}{\sqrt n}\ .
\end{eqnarray*}
</math>|Since all the random variables <math>X_j</math> have the same expected value, we have
<math display="block">
E(S_n) = E(X_1) +\cdots+ E(X_n) = n\mu\ ,
</math>
<math display="block">
V(S_n) = V(X_1) +\cdots+ V(X_n) = n\sigma^2\ ,
</math>
and
<math display="block">
\sigma(S_n) = \sigma \sqrt{n}\ .
</math>
We have seen that, if we multiply a random variable <math>X</math> with mean <math>\mu</math> and variance
<math>\sigma^2</math> by a constant <math>c</math>, the new random variable has expected value <math>c\mu</math> and
variance <math>c^2\sigma^2</math>.  Thus,
<math display="block">
E(A_n) = E\left(\frac {S_n}n \right) = \frac {n\mu}n = \mu\ ,
</math>
and
<math display="block">
V(A_n) = V\left( \frac {S_n}n \right) = \frac {V(S_n)}{n^2} = \frac
{n\sigma^2}{n^2} = \frac {\sigma^2}n\ .
</math>
Finally, the standard deviation of <math>A_n</math> is given by
<math display="block">
\sigma(A_n) = \frac {\sigma}{\sqrt n}\ .
</math>}}
The last equation in the above theorem implies that in an independent trials process,
if the individual summands have finite variance, then the standard deviation of the
average goes to 0 as <math>n \rightarrow \infty</math>.  Since the standard deviation tells us
something about the spread of the distribution around the mean, we see that for large
values of <math>n</math>, the value of <math>A_n</math>  is usually very close to the mean of <math>A_n</math>, which
equals <math>\mu</math>, as shown above.  This statement is made precise in a later chapter,
where it is called the Law of Large Numbers.  For example, let <math>X</math> represent the roll
of a fair die.  In the figure below, we show the distribution of a random variable
<math>A_n</math> corresponding to <math>X</math>, for <math>n = 10</math> and <math>n = 100</math>.
<div id="PSfig6-4-5" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig6-4-5.ps | 400px | thumb | Distribution of the average <math>A_n</math> of <math>n</math> rolls of a fair die, for <math>n = 10</math> and <math>n = 100</math>.]]
</div>
<span id="exam 6.14"/>
'''Example''' Consider <math>n</math> rolls of a die.  If <math>X_j</math> is the outcome of the
<math>j</math>th roll, then <math>E(X_j) = 7/2</math> and <math>V(X_j) = 35/12</math>.  Thus, if <math>S_n</math> is the sum of
the outcomes, and <math>A_n = S_n/n</math> is the average of the outcomes, we have
<math>E(A_n) = 7/2</math> and <math>V(A_n) = (35/12)/n</math>.  Therefore, as <math>n</math> increases, the expected
value of the average remains constant, but the variance tends to 0.  Since the variance
is a measure of the expected deviation from the mean, this indicates that, for
large <math>n</math>, we can expect the average to be very near the expected value.  This is in
fact the case, and we shall justify it in the chapter on the Law of Large Numbers.
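A simulation of the averages <math>A_n</math> for <math>n = 10</math> and <math>n = 100</math>, matching the figure above, shows the variance shrinking like <math>(35/12)/n</math>; the trial count below is an arbitrary choice.
<syntaxhighlight lang="python">
import random

# Empirical mean and variance of A_n, the average of n die rolls,
# compared with E(A_n) = 7/2 and V(A_n) = (35/12)/n.
random.seed(0)
trials = 20_000
for n in (10, 100):
    avgs = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
    mean = sum(avgs) / trials
    var = sum((a - mean) ** 2 for a in avgs) / trials
    print(n, round(mean, 3), round(var, 4), round((35 / 12) / n, 4))
</syntaxhighlight>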
===Bernoulli Trials===
Consider next the general Bernoulli trials process.  As usual, we let <math>X_j = 1</math> if
the <math>j</math>th outcome is a success and 0 if it is a failure.  If <math>p</math> is the probability
of a success, and <math>q = 1 - p</math>, then
<math display="block">
\begin{eqnarray*} E(X_j) & = & 0q + 1p = p\ , \\ E(X_j^2) & = & 0^2q + 1^2p = p\ ,
\end{eqnarray*}
</math>
and
<math display="block">
V(X_j) = E(X_j^2) - (E(X_j))^2 = p - p^2 = pq\ .
</math>
Thus, for Bernoulli trials, if <math>S_n = X_1 + X_2 +\cdots+ X_n</math> is the number of
successes, then <math>E(S_n) = np</math>, <math>V(S_n) = npq</math>, and <math>D(S_n) = \sqrt{npq}.</math>  If
<math>A_n = S_n/n</math> is the average number of successes, then <math>E(A_n) = p</math>, <math>V(A_n) = pq/n</math>,
and <math>D(A_n) = \sqrt{pq/n}</math>.  We see that the expected proportion of successes remains
<math>p</math> and the variance tends to 0.  This suggests that the frequency interpretation of
probability is a correct one.  We shall make this more precise in the chapter on the Law of Large Numbers.
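These formulas can be checked against the exact binomial distribution of <math>S_n</math>; the parameters <math>n = 20</math> and <math>p = 0.3</math> below are arbitrary.
<syntaxhighlight lang="python">
from math import comb, sqrt

# Number of successes S_n in n Bernoulli trials is binomial(n, p).
n, p = 20, 0.3
q = 1 - p
pmf = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())
var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())

print(mean, n * p)                 # both 6.0 (up to rounding)
print(var, n * p * q)              # both 4.2 (up to rounding)
print(sqrt(var), sqrt(n * p * q))  # D(S_n) = sqrt(npq), about 2.05
</syntaxhighlight>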
<span id="exam 6.15"/>
'''Example''' Let <math>T</math> be the number of trials up to and including the first
success in a Bernoulli trials process.  Then <math>T</math> is geometrically distributed.  What
is the variance of <math>T</math>?  In [[guide:448d2aa013#exam 5.7 |Example]], we saw that
<math display="block">
m_T = \pmatrix{1 &  2 &    3 & \cdots \cr p & qp & q^2p & \cdots \cr}.
</math>
In [[guide:E4fd10ce73#exam 6.8 |Example]], we showed that
<math display="block">
E(T) = 1/p\ .
</math>
Thus,
<math display="block">
V(T) = E(T^2) - 1/p^2\ ,
</math>
so we need only find
<math display="block">
\begin{eqnarray*} E(T^2) & = & 1p + 4qp + 9q^2p + \cdots \\
      & = & p(1 + 4q + 9q^2 + \cdots )\ .
\end{eqnarray*}
</math>
To evaluate this sum, we start again with
<math display="block">
1 + x + x^2 +\cdots= \frac 1{1 - x}\ .
</math>
Differentiating, we obtain
<math display="block">
1 + 2x + 3x^2 +\cdots= \frac 1{(1 - x)^2}\ .
</math>
Multiplying by <math>x</math>,
<math display="block">
x + 2x^2 + 3x^3 +\cdots= \frac x{(1 - x)^2}\ .
</math>
Differentiating again gives
<math display="block">
1 + 4x + 9x^2 +\cdots= \frac {1 + x}{(1 - x)^3}\ .
</math>
Thus,
<math display="block">
E(T^2) = p\frac {1 + q}{(1 - q)^3} = \frac {1 + q}{p^2}
</math>
and
<math display="block">
\begin{eqnarray*} V(T) & = & E(T^2) - (E(T))^2 \\
    & = & \frac {1 + q}{p^2} - \frac 1{p^2} = \frac q{p^2}\ .
\end{eqnarray*}
</math>
For example, the variance for the number of tosses of a coin until the first head
turns up is <math>(1/2)/(1/2)^2 = 2</math>.  The variance for the number of rolls of a die until
the first six turns up is <math>(5/6)/(1/6)^2 = 30</math>.  Note that, as <math>p</math> decreases, the
variance increases rapidly.  This corresponds to the increased spread of the
geometric distribution as <math>p</math> decreases, noted in an earlier figure.
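The formula <math>V(T) = q/p^2</math> and the two numerical examples can be checked by truncating the series for <math>E(T)</math> and <math>E(T^2)</math>; the cutoff below is an arbitrary choice that makes the neglected tail negligible.
<syntaxhighlight lang="python">
# Variance of a geometric random variable T (number of trials up to and
# including the first success), by truncating the series for E(T) and E(T^2).
def geometric_variance(p, terms=5_000):
    q = 1 - p
    e_t = sum(k * q ** (k - 1) * p for k in range(1, terms + 1))
    e_t2 = sum(k * k * q ** (k - 1) * p for k in range(1, terms + 1))
    return e_t2 - e_t ** 2

print(geometric_variance(1 / 2))   # ~2:  tosses of a coin until the first head
print(geometric_variance(1 / 6))   # ~30: rolls of a die until the first six
</syntaxhighlight>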
===Poisson Distribution===
Just as in the case of expected values, it is easy to guess the variance of the
Poisson distribution with parameter <math>\lambda</math>.  We recall that the variance of a
binomial distribution with parameters <math>n</math> and <math>p</math> equals <math>npq</math>.  We also recall that
the Poisson distribution could be obtained as a limit of binomial distributions, if
<math>n</math> goes to <math>\infty</math> and <math>p</math> goes to 0 in such a way that their product is kept fixed at
the value <math>\lambda</math>.  In this case, <math>npq = \lambda q</math> approaches <math>\lambda</math>, since
<math>q</math> goes to 1.  So, given a Poisson distribution with parameter <math>\lambda</math>, we should
guess that its variance is <math>\lambda</math>.  The reader is asked to show this in
[[exercise:497ffe72e6 |Exercise]].
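This guess can also be checked numerically, by computing the mean and variance of a Poisson distribution directly from its probabilities <math>e^{-\lambda}\lambda^k/k!</math>, truncated where the tail is negligible; the value <math>\lambda = 3</math> below is arbitrary.
<syntaxhighlight lang="python">
from math import exp, factorial

# Mean and variance of a Poisson(lambda) distribution, truncated at k < 60.
lam = 3.0
pmf = {k: exp(-lam) * lam ** k / factorial(k) for k in range(60)}

mean = sum(k * pk for k, pk in pmf.items())
var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())
print(mean, var)   # both approximately 3.0 = lambda
</syntaxhighlight>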
==General references==
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}
