guide:C631488f9a: Difference between revisions

Latest revision as of 01:48, 11 June 2024

The usefulness of the expected value as a prediction for the outcome of an experiment is increased when the outcome is not likely to deviate too much from the expected value. In this section we shall introduce a measure of this deviation, called the variance.

Variance

Definition

Let [math]X[/math] be a numerically valued random variable with expected value [math]\mu = E(X)[/math]. Then the variance of [math]X[/math], denoted by [math]V(X)[/math], is

[[math]] V(X) = E((X - \mu)^2)\ . [[/math]]

Note that, by Theorem, [math]V(X)[/math] is given by

[[math]] \begin{equation} V(X) = \sum_x (x - \mu)^2 m(x)\ , \label{eq 6.1} \end{equation} [[/math]]

where [math]m[/math] is the distribution function of [math]X[/math].

Standard Deviation

The standard deviation of [math]X[/math], denoted by [math]D(X)[/math], is [math]D(X) = \sqrt {V(X)}[/math]. We often write [math]\sigma[/math] for [math]D(X)[/math] and [math]\sigma^2[/math] for [math]V(X)[/math].

Example Consider one roll of a die. Let [math]X[/math] be the number that turns up. To find [math]V(X)[/math], we must first find the expected value of [math]X[/math]. This is

[[math]] \begin{eqnarray*} \mu & = & E(X) = 1\Bigl(\frac 16\Bigr) + 2\Bigl(\frac 16\Bigr) + 3\Bigl(\frac 16\Bigr) + 4\Bigl(\frac 16\Bigr) + 5\Bigl(\frac 16\Bigr) + 6\Bigl(\frac 16\Bigr) \\ & = & \frac 72\ . \end{eqnarray*} [[/math]]

To find the variance of [math]X[/math], we form the new random variable [math](X - \mu)^2[/math] and compute its expectation. We can easily do this using the following table.

Variance calculation.

[math]x[/math]	[math]m(x)[/math]	[math](x - 7/2)^2[/math]
1	1/6	25/4
2	1/6	9/4
3	1/6	1/4
4	1/6	1/4
5	1/6	9/4
6	1/6	25/4

From this table we find [math]E((X - \mu)^2)[/math] is

[[math]] \begin{eqnarray*} V(X) & = & \frac 16 \left( \frac {25}4 + \frac 94 + \frac 14 + \frac 14 + \frac 94 + \frac {25}4 \right) \\ & = &\frac {35}{12}\ , \end{eqnarray*} [[/math]]

and the standard deviation [math]D(X) = \sqrt{35/12} \approx 1.708[/math].
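The table calculation above can be reproduced mechanically. The following is a minimal sketch in Python, using exact fractions; the helper names `expected_value` and `variance` are our own choices for illustration and do not come from the text.

```python
from fractions import Fraction
from math import sqrt

# Distribution m(x) of one roll of a fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(m):
    """E(X) = sum over x of x * m(x)."""
    return sum(x * p for x, p in m.items())

def variance(m):
    """V(X) = sum over x of (x - mu)^2 * m(x), straight from the definition."""
    mu = expected_value(m)
    return sum((x - mu) ** 2 * p for x, p in m.items())

mu = expected_value(die)   # 7/2
var = variance(die)        # 35/12
sd = sqrt(var)             # square root of 35/12, as a float
print(mu, var, round(sd, 4))
```

Working with `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand computation.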

Calculation of Variance

We next prove a theorem that gives us a useful alternative form for computing the variance.

Theorem

If [math]X[/math] is any random variable with [math]E(X) = \mu[/math], then

[[math]] V(X) = E(X^2) - \mu^2\ . [[/math]]

Proof

We have

[[math]] \begin{eqnarray*} V(X) & = & E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) \\ & = & E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2\ . \end{eqnarray*} [[/math]]

■

Using this theorem, we can compute the variance of the outcome of a roll of a die by first computing

[[math]] \begin{eqnarray*} E(X^2) & = & 1\Bigl(\frac 16\Bigr) + 4\Bigl(\frac 16\Bigr) + 9\Bigl(\frac 16\Bigr) + 16\Bigl(\frac 16\Bigr) + 25\Bigl(\frac 16\Bigr) + 36\Bigl(\frac 16\Bigr) \\ & = &\frac {91}6\ , \end{eqnarray*} [[/math]]

and,

[[math]] V(X) = E(X^2) - \mu^2 = \frac {91}{6} - \Bigl(\frac 72\Bigr)^2 = \frac {35}{12}\ , [[/math]]

in agreement with the value obtained directly from the definition of [math]V(X)[/math].
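As a check that the two forms of the variance agree, here is a small Python sketch that evaluates both. The function names are ours, and the loaded die is a hypothetical distribution introduced only to show that the agreement is not special to the fair die.

```python
from fractions import Fraction

def variance_by_definition(m):
    """V(X) = E((X - mu)^2)."""
    mu = sum(x * p for x, p in m.items())
    return sum((x - mu) ** 2 * p for x, p in m.items())

def variance_by_shortcut(m):
    """V(X) = E(X^2) - mu^2."""
    mu = sum(x * p for x, p in m.items())
    second_moment = sum(x ** 2 * p for x, p in m.items())
    return second_moment - mu ** 2

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Hypothetical loaded die (not from the text): face 6 has probability 1/2,
# and the other five faces share the remaining probability equally.
loaded_die = {x: Fraction(1, 10) for x in range(1, 6)}
loaded_die[6] = Fraction(1, 2)

for m in (fair_die, loaded_die):
    assert sum(m.values()) == 1  # sanity check: probabilities sum to 1
    print(variance_by_definition(m), variance_by_shortcut(m))
```

The shortcut form needs only the first and second moments, which is why it is usually the more convenient one in hand calculations.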

Properties of Variance

The variance has properties very different from those of the expectation. If [math]c[/math] is any constant, [math]E(cX) = cE(X)[/math] and [math]E(X + c) = E(X) + c[/math]. These two statements imply that the expectation is a linear function. However, the variance is not linear, as seen in the next theorem.

Theorem

If [math]X[/math] is any random variable and [math]c[/math] is any constant, then

[[math]] V(cX) = c^2 V(X) [[/math]]

and

[[math]] V(X + c) = V(X)\ . [[/math]]

Proof

Let [math]\mu = E(X)[/math]. Then [math]E(cX) = c\mu[/math], and

[[math]] \begin{eqnarray*} V(cX) &=& E((cX - c\mu)^2) = E(c^2(X - \mu)^2) \\ &=& c^2 E((X - \mu)^2) = c^2 V(X)\ . \end{eqnarray*} [[/math]]

To prove the second assertion, we note that, to compute [math]V(X + c)[/math], we would replace [math]x[/math] by [math]x + c[/math] and [math]\mu[/math] by [math]\mu + c[/math] in the sum formula for [math]V(X)[/math] given after the definition. Then the [math]c[/math]'s would cancel, since [math](x + c) - (\mu + c) = x - \mu[/math], leaving [math]V(X)[/math].

■
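The two identities in this theorem are easy to confirm numerically. The sketch below, in Python, does so for the fair die; the constant `c = 3` and the helper `transform` are our own choices for illustration.

```python
from fractions import Fraction

def variance(m):
    """V(X) computed from the definition."""
    mu = sum(x * p for x, p in m.items())
    return sum((x - mu) ** 2 * p for x, p in m.items())

def transform(m, f):
    """Distribution of f(X) when X has distribution m."""
    out = {}
    for x, p in m.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

die = {x: Fraction(1, 6) for x in range(1, 7)}
c = 3  # arbitrary constant chosen for illustration

v = variance(die)
v_scaled = variance(transform(die, lambda x: c * x))   # V(cX)
v_shifted = variance(transform(die, lambda x: x + c))  # V(X + c)
print(v, v_scaled, v_shifted)  # prints: 35/12 105/4 35/12
```

Scaling by 3 multiplies the variance by 9, while shifting by 3 leaves it unchanged.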

We turn now to some general properties of the variance. Recall that if [math]X[/math] and [math]Y[/math] are any two random variables, [math]E(X + Y) = E(X) + E(Y)[/math]. This is not always true for the case of the variance. For example, let [math]X[/math] be a random variable with [math]V(X) \ne 0[/math], and define [math]Y = -X[/math]. Then [math]V(X) = V(Y)[/math], so that [math]V(X) + V(Y) = 2V(X)[/math]. But [math]X + Y[/math] is always 0 and hence has variance 0. Thus [math]V(X + Y) \ne V(X) + V(Y)[/math].

In the important case of mutually independent random variables, however, the variance of the sum is the sum of the variances.

Theorem

Let [math]X[/math] and [math]Y[/math] be two independent random variables. Then

[[math]] V(X + Y) = V(X) + V(Y)\ . [[/math]]

Proof

Let [math]E(X) = a[/math] and [math]E(Y) = b[/math]. Then

[[math]] \begin{eqnarray*} V(X + Y) & = & E((X + Y)^2) - (a + b)^2 \\ & = & E(X^2) + 2E(XY) + E(Y^2) - a^2 - 2ab - b^2\ . \end{eqnarray*} [[/math]]

Since [math]X[/math] and [math]Y[/math] are independent, [math]E(XY) = E(X)E(Y) = ab[/math]. Thus,

[[math]] V(X + Y) = E(X^2) - a^2 + E(Y^2) - b^2 = V(X) + V(Y)\ . [[/math]]

■
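Both the theorem and the counterexample [math]Y = -X[/math] discussed before it can be checked by direct enumeration. The following Python sketch assumes two independent rolls of a fair die for the independent case; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

def variance(m):
    """V(X) computed from the definition."""
    mu = sum(x * p for x, p in m.items())
    return sum((x - mu) ** 2 * p for x, p in m.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Independent case: the joint probability of (x, y) is the product m(x) m(y).
sum_indep = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_indep[x + y] = sum_indep.get(x + y, 0) + px * py

# Dependent case: Y = -X, so X + Y is always 0.
sum_dep = {}
for x, px in die.items():
    sum_dep[x + (-x)] = sum_dep.get(0, 0) + px

print(variance(sum_indep), 2 * variance(die))  # equal: 35/6 35/6
print(variance(sum_dep))                       # 0, not 2 V(X)
```

In the dependent case the variance of [math]Y = -X[/math] is still [math]35/12[/math], but the sum has variance 0, so additivity fails.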

It is easy to extend this proof, by mathematical induction, to show that the variance of the sum of any number of mutually independent random variables is the sum of the individual variances. Thus we have the following theorem.

Theorem

Let [math]X_1[/math], [math]X_2[/math], [math]\dots[/math], [math]X_n[/math] be an independent trials process with [math]E(X_j) = \mu[/math] and [math]V(X_j) = \sigma^2[/math]. Let

[[math]] S_n = X_1 + X_2 +\cdots+ X_n [[/math]]

be the sum, and

[[math]] A_n = \frac {S_n}n [[/math]]

be the average. Then

[[math]] \begin{eqnarray*} E(S_n) &=& n\mu\ , \\ V(S_n) &=& n\sigma^2\ , \\ \sigma(S_n) &=& \sigma \sqrt{n}\ , \\ E(A_n) &=& \mu\ , \\ V(A_n) &=& \frac {\sigma^2}n\ , \\ \sigma(A_n) &=& \frac{\sigma}{\sqrt n}\ . \end{eqnarray*} [[/math]]

Proof

Since all the random variables [math]X_j[/math] have the same expected value, we have

[[math]] E(S_n) = E(X_1) +\cdots+ E(X_n) = n\mu\ , [[/math]]

[[math]] V(S_n) = V(X_1) +\cdots+ V(X_n) = n\sigma^2\ , [[/math]]

and

[[math]] \sigma(S_n) = \sigma \sqrt{n}\ . [[/math]]

We have seen that, if we multiply a random variable [math]X[/math] with mean [math]\mu[/math] and variance [math]\sigma^2[/math] by a constant [math]c[/math], the new random variable has expected value [math]c\mu[/math] and variance [math]c^2\sigma^2[/math]. Thus,

[[math]] E(A_n) = E\left(\frac {S_n}n \right) = \frac {n\mu}n = \mu\ , [[/math]]

and

[[math]] V(A_n) = V\left( \frac {S_n}n \right) = \frac {V(S_n)}{n^2} = \frac {n\sigma^2}{n^2} = \frac {\sigma^2}n\ . [[/math]]

Finally, the standard deviation of [math]A_n[/math] is given by

[[math]] \sigma(A_n) = \frac {\sigma}{\sqrt n}\ . [[/math]]

■

The last equation in the above theorem implies that in an independent trials process, if the individual summands have finite variance, then the standard deviation of the average goes to 0 as [math]n \rightarrow \infty[/math]. Since the standard deviation tells us something about the spread of the distribution around the mean, we see that for large values of [math]n[/math], the value of [math]A_n[/math] is usually very close to the mean of [math]A_n[/math], which equals [math]\mu[/math], as shown above. This statement is made precise in Chapter Law of Large Numbers, where it is called the Law of Large Numbers. For example, let [math]X[/math] represent the roll of a fair die. In Figure, we show the distribution of a random variable [math]A_n[/math] corresponding to [math]X[/math], for [math]n = 10[/math] and [math]n = 100[/math].

Empirical distribution of [math]A_n[/math].

Example Consider [math]n[/math] rolls of a die. We have seen that, if [math]X_j[/math] is the outcome of the [math]j[/math]th roll, then [math]E(X_j) = 7/2[/math] and [math]V(X_j) = 35/12[/math]. Thus, if [math]S_n[/math] is the sum of the outcomes, and [math]A_n = S_n/n[/math] is the average of the outcomes, we have [math]E(A_n) = 7/2[/math] and [math]V(A_n) = (35/12)/n[/math]. Therefore, as [math]n[/math] increases, the expected value of the average remains constant, but the variance tends to 0. If the variance is a measure of the expected deviation from the mean, this would indicate that, for large [math]n[/math], we can expect the average to be very near the expected value. This is in fact the case, and we shall justify it in Chapter Law of Large Numbers.
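The values [math]E(A_n) = 7/2[/math] and [math]V(A_n) = (35/12)/n[/math] can also be obtained from the exact distribution of [math]A_n[/math]. The Python sketch below builds that distribution by repeated convolution of the die distribution with itself; this is one possible way to do the computation, and the function names are our own.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def convolve(m1, m2):
    """Distribution of X + Y for independent X ~ m1 and Y ~ m2."""
    out = {}
    for x, px in m1.items():
        for y, py in m2.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def mean_and_variance(m):
    """Return (E(X), V(X)) using V(X) = E(X^2) - mu^2."""
    mu = sum(x * p for x, p in m.items())
    second_moment = sum(x ** 2 * p for x, p in m.items())
    return mu, second_moment - mu ** 2

def average_distribution(n):
    """Exact distribution of A_n = S_n / n for n rolls of a fair die."""
    s = die
    for _ in range(n - 1):
        s = convolve(s, die)
    return {Fraction(total, n): p for total, p in s.items()}

for n in (1, 10, 100):
    mu, var = mean_and_variance(average_distribution(n))
    print(n, mu, var)  # the mean stays 7/2; the variance is (35/12)/n
```

For [math]n = 10[/math] and [math]n = 100[/math] the variance comes out as [math]7/24[/math] and [math]7/240[/math], that is, [math](35/12)/n[/math] in lowest terms.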

Bernoulli Trials

Consider next the general Bernoulli trials process. As usual, we let [math]X_j = 1[/math] if the [math]j[/math]th outcome is a success and 0 if it is a failure. If [math]p[/math] is the probability of a success, and [math]q = 1 - p[/math], then

[[math]] \begin{eqnarray*} E(X_j) & = & 0q + 1p = p\ , \\ E(X_j^2) & = & 0^2q + 1^2p = p\ , \end{eqnarray*} [[/math]]

and

[[math]] V(X_j) = E(X_j^2) - (E(X_j))^2 = p - p^2 = pq\ . [[/math]]

Thus, for Bernoulli trials, if [math]S_n = X_1 + X_2 +\cdots+ X_n[/math] is the number of successes, then [math]E(S_n) = np[/math], [math]V(S_n) = npq[/math], and [math]D(S_n) = \sqrt{npq}[/math]. If [math]A_n = S_n/n[/math] is the average number of successes, then [math]E(A_n) = p[/math], [math]V(A_n) = pq/n[/math], and [math]D(A_n) = \sqrt{pq/n}[/math]. We see that the expected proportion of successes remains [math]p[/math] and the variance tends to 0. This suggests that the frequency interpretation of probability is a correct one. We shall make this more precise in Chapter Law of Large Numbers.
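These formulas can be confirmed against the binomial distribution of [math]S_n[/math]. The Python sketch below uses the illustrative values [math]n = 12[/math] and [math]p = 1/3[/math], which are our own choice and not taken from the text.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Distribution of the number of successes S_n in n Bernoulli trials."""
    q = 1 - p
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}

def mean_and_variance(m):
    """Return (E(X), V(X)) using V(X) = E(X^2) - mu^2."""
    mu = sum(x * pr for x, pr in m.items())
    second_moment = sum(x ** 2 * pr for x, pr in m.items())
    return mu, second_moment - mu ** 2

n, p = 12, Fraction(1, 3)  # illustrative parameters
q = 1 - p

mean_s, var_s = mean_and_variance(binomial_distribution(n, p))
print(mean_s, n * p)        # E(S_n) = np
print(var_s, n * p * q)     # V(S_n) = npq

# A_n = S_n / n, so V(A_n) = V(S_n) / n^2 = pq / n.
var_a = var_s / n ** 2
print(var_a, p * q / n)
```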

Example Let [math]T[/math] denote the number of trials until the first success in a Bernoulli trials process. Then [math]T[/math] is geometrically distributed. What is the variance of [math]T[/math]? In Example, we saw that

[[math]] m_T = \pmatrix{1 & 2 & 3 & \cdots \cr p & qp & q^2p & \cdots \cr}. [[/math]]

In Example, we showed that

[[math]] E(T) = 1/p\ . [[/math]]

Thus,

[[math]] V(T) = E(T^2) - 1/p^2\ , [[/math]]

so we need only find

[[math]] \begin{eqnarray*} E(T^2) & = & 1p + 4qp + 9q^2p + \cdots \\ & = & p(1 + 4q + 9q^2 + \cdots )\ . \end{eqnarray*} [[/math]]

To evaluate this sum, we start again with

[[math]] 1 + x + x^2 +\cdots= \frac 1{1 - x}\ . [[/math]]

Differentiating, we obtain

[[math]] 1 + 2x + 3x^2 +\cdots= \frac 1{(1 - x)^2}\ . [[/math]]

Multiplying by [math]x[/math],

[[math]] x + 2x^2 + 3x^3 +\cdots= \frac x{(1 - x)^2}\ . [[/math]]

Differentiating again gives

[[math]] 1 + 4x + 9x^2 +\cdots= \frac {1 + x}{(1 - x)^3}\ . [[/math]]

Thus,

[[math]] E(T^2) = p\frac {1 + q}{(1 - q)^3} = \frac {1 + q}{p^2} [[/math]]

and

[[math]] \begin{eqnarray*} V(T) & = & E(T^2) - (E(T))^2 \\ & = & \frac {1 + q}{p^2} - \frac 1{p^2} = \frac q{p^2}\ . \end{eqnarray*} [[/math]]

For example, the variance for the number of tosses of a coin until the first head turns up is [math](1/2)/(1/2)^2 = 2[/math]. The variance for the number of rolls of a die until the first six turns up is [math](5/6)/(1/6)^2 = 30[/math]. Note that, as [math]p[/math] decreases, the variance increases rapidly. This corresponds to the increased spread of the geometric distribution as [math]p[/math] decreases (noted in Figure).
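The formula [math]V(T) = q/p^2[/math] can be compared with a direct numerical evaluation of the series for [math]E(T)[/math] and [math]E(T^2)[/math]. The Python sketch below truncates both series after a fixed number of terms; the cutoff of 10,000 terms is an assumption that is ample for the two values of [math]p[/math] used here but would need to grow for very small [math]p[/math].

```python
def geometric_mean_and_variance(p, terms=10_000):
    """Approximate E(T) and V(T) for a geometric T by truncating the series."""
    q = 1.0 - p
    mean = 0.0
    second_moment = 0.0
    prob = p  # P(T = 1)
    for k in range(1, terms + 1):
        mean += k * prob
        second_moment += k * k * prob
        prob *= q  # P(T = k + 1) = q * P(T = k)
    return mean, second_moment - mean ** 2

for p in (1 / 2, 1 / 6):
    mean, var = geometric_mean_and_variance(p)
    exact = (1 - p) / p ** 2
    print(p, mean, var, exact)  # var should be close to q / p^2
```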

Poisson Distribution

Just as in the case of expected values, it is easy to guess the variance of the Poisson distribution with parameter [math]\lambda[/math]. We recall that the variance of a binomial distribution with parameters [math]n[/math] and [math]p[/math] equals [math]npq[/math]. We also recall that the Poisson distribution could be obtained as a limit of binomial distributions, if [math]n[/math] goes to [math]\infty[/math] and [math]p[/math] goes to 0 in such a way that their product is kept fixed at the value [math]\lambda[/math]. In this case, [math]npq = \lambda q[/math] approaches [math]\lambda[/math], since [math]q[/math] goes to 1. So, given a Poisson distribution with parameter [math]\lambda[/math], we should guess that its variance is [math]\lambda[/math]. The reader is asked to show this in Exercise.
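The guess can be supported numerically, although this is no substitute for the proof. The Python sketch below first shows [math]npq[/math] approaching [math]\lambda[/math] when [math]p = \lambda/n[/math], and then approximates the variance of the Poisson distribution from a truncated series. The value [math]\lambda = 4[/math], the cutoff of 200 terms, and the function names are assumptions made for illustration.

```python
from math import exp

def binomial_variance(n, lam):
    """npq for a binomial distribution with p = lam / n."""
    p = lam / n
    return n * p * (1 - p)

def poisson_variance(lam, terms=200):
    """Approximate the Poisson variance by truncating the defining series."""
    prob = exp(-lam)  # P(X = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(terms):
        mean += k * prob
        second_moment += k * k * prob
        prob *= lam / (k + 1)  # P(X = k + 1) = P(X = k) * lam / (k + 1)
    return second_moment - mean ** 2

lam = 4.0  # illustrative parameter
for n in (10, 100, 1000, 10000):
    print(n, binomial_variance(n, lam))  # tends to lam as n grows
print(poisson_variance(lam))             # close to lam
```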

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.

@@ Line 6: / Line 6: @@
 \newcommand{\NA}{{\rm NA}}
 \newcommand{\mathds}{\mathbb}</math></div>
-label{sec 6.2}
 The usefulness of the expected value as a prediction for the outcome of an experiment
 is increased when the outcome is not likely to deviate too much from the expected
@@ Line 13: / Line 13: @@
 ===Variance===
-{{defncard|label=|id=def 6.4| Let <math>X</math> be a numerically valued random variable
+{{defncard|label=|id=def 6.4|Let <math>X</math> be a numerically valued random variable
 with expected value <math>\mu = E(X)</math>.  Then the  ''variance'' of <math>X</math>, denoted by
 <math>V(X)</math>, is
@@ Line 35: / Line 35: @@
 <span id="exam 6.13"/>
-'''Example'''
+'''Example'''  Consider one roll of a die.  Let <math>X</math> be the number
 that turns up.  To find
 <math>V(X)</math>, we must first find the expected value of <math>X</math>.  This is
@@ Line 46: / Line 46: @@
 \end{eqnarray*}
 </math>
 To find the variance of <math>X</math>, we form the new random variable <math>(X - \mu)^2</math> and
 compute its expectation.  We can easily do this using the following table.
 <span id="table 6.6"/>
 {|class="table"
@@ Line 56: / Line 56: @@
 |||||
 |-
-|$x$ || <math>m(x)</math> || <math>(x - 7/2)^2</math>
+|<math>x</math> || <math>m(x)</math> || <math>(x - 7/2)^2</math>
 |-
 |1 || 1/6 ||       25/4
@@ Line 70: / Line 70: @@
 |6 || 1/6 ||      25/4
 |}
 From this table we find <math>E((X - \mu)^2)</math> is
@@ Line 88: / Line 89: @@
 <math display="block">
 V(X) = E(X^2) - \mu^2\ .
-</math>\n|We have
+</math>|We have
 <math display="block">
@@ Line 131: / Line 132: @@
 <math display="block">
 V(X + c) = V(X)\ .
-</math>\n|Let <math>\mu = E(X)</math>.  Then <math>E(cX) = c\mu</math>, and
+</math>|Let <math>\mu = E(X)</math>.  Then <math>E(cX) = c\mu</math>, and
 <math display="block">
@@ Line 158: / Line 159: @@
 <math display="block">
 V(X + Y) = V(X) + V(Y)\ .
-</math>\n|Let <math>E(X) = a</math> and <math>E(Y) = b</math>.  Then
+</math>|Let <math>E(X) = a</math> and <math>E(Y) = b</math>.  Then
 <math display="block">
@@ Line 195: / Line 196: @@
 \sigma(A_n) &=& \frac{\sigma}{\sqrt n}\ .
 \end{eqnarray*}
-</math>\n|Since all the random variables <math>X_j</math> have the same expected value, we have
+</math>|Since all the random variables <math>X_j</math> have the same expected value, we have
 <math display="block">
@@ Line 235: / Line 236: @@
 something about the spread of the distribution around the mean, we see that for large
 values of <math>n</math>, the value of <math>A_n</math>  is usually very close to the mean of <math>A_n</math>, which
-equals <math>\mu</math>, as shown above.  This statement is made precise in Chapter \ref{chp 8},
+equals <math>\mu</math>, as shown above.  This statement is made precise in Chapter [[guide:1cf65e65b3|Law of Large Numbers]],
 where it is called the Law of Large Numbers.  For example, let <math>X</math> represent the roll
-of a fair die.  In Figure \ref{fig 6.4.5}, we show the distribution of a random variable
+of a fair die.  In [[#fig 6.4.5|Figure]], we show the distribution of a random variable
 <math>A_n</math> corresponding to <math>X</math>, for <math>n = 10</math> and <math>n = 100</math>.
-<div id="PSfig6-4-5" class="d-flex justify-content-center">
-[[File:guide_e6d15_PSfig6-4-5.ps | 400px | thumb |  ]]
+<div id="fig 6.4.5" class="d-flex justify-content-center">
+[[File:guide_e6d15_PSfig6-4-5.png | 600px | thumb | Empirical distribution of <math>A_n</math>. ]]
 </div>
 <span id="exam 6.14"/>
 '''Example'''
-<math>X_j</math> is the outcome if the
+Consider <math>n</math> rolls of a die.  We have seen that, if <math>X_j</math> is the outcome if the
 <math>j</math>th roll, then <math>E(X_j) = 7/2</math> and <math>V(X_j) = 35/12</math>.  Thus, if <math>S_n</math> is the sum of
 the outcomes, and <math>A_n = S_n/n</math> is the average of the outcomes, we have
@@ Line 252: / Line 255: @@
 large <math>n</math>, we can expect the average to be very near the expected value.  This is in
 fact the case, and we shall justify it in Chapter \ref{chp 8}.
 ===Bernoulli Trials===
@@ Line 275: / Line 277: @@
 <math>p</math> and the variance tends to 0.  This suggests that the frequency interpretation of
 probability is a correct one.  We shall make this more precise in Chapter \ref{chp 8}.
 <span id="exam 6.15"/>
 '''Example'''
+Let <math>T</math> denote the number of trials until the first
 success in a Bernoulli trials process.  Then <math>T</math> is geometrically distributed.  What
 is the variance of <math>T</math>?  In [[guide:448d2aa013#exam 5.7 |Example]], we saw that
@@ Line 340: / Line 344: @@
 the first six turns up is <math>(5/6)/(1/6)^2 = 30</math>.  Note that, as <math>p</math> decreases, the
 variance increases rapidly.  This corresponds to the increased spread of the
-geometric distribution as <math>p</math> decreases (noted in Figure \ref{fig 5.4}).
+geometric distribution as <math>p</math> decreases (noted in [[#fig 5.4|Figure]]).
@@ Line 351: / Line 355: @@
 the value <math>\lambda</math>.  In this case, <math>npq = \lambda q</math> approaches <math>\lambda</math>, since
 <math>q</math> goes to 1.  So, given a Poisson distribution with parameter <math>\lambda</math>, we should
-guess that its variance is <math>\lambda</math>.  The reader is asked to show this in
+guess that its variance is <math>\lambda</math>.  The reader is asked to show this in [[exercise:497ffe72e6 |Exercise]].
-Exercise [[exercise:497ffe72e6 |Exercise]].
-\exercises
-\choice{}{==General references==
+==General references==
 {{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}