Variance of Discrete Random Variables

The usefulness of the expected value as a prediction for the outcome of an experiment is increased when the outcome is not likely to deviate too much from the expected value. In this section we shall introduce a measure of this deviation, called the variance.

Variance

Definition

Let [math]X[/math] be a numerically valued random variable with expected value [math]\mu = E(X)[/math]. Then the variance of [math]X[/math], denoted by [math]V(X)[/math], is

[[math]] V(X) = E((X - \mu)^2)\ . [[/math]]

Note that, by Theorem, [math]V(X)[/math] is given by

[[math]] \begin{equation} V(X) = \sum_x (x - \mu)^2 m(x)\ , \label{eq 6.1} \end{equation} [[/math]]

where [math]m[/math] is the distribution function of [math]X[/math].

Standard Deviation

The standard deviation of [math]X[/math], denoted by [math]D(X)[/math], is [math]D(X) = \sqrt {V(X)}[/math]. We often write [math]\sigma[/math] for [math]D(X)[/math] and [math]\sigma^2[/math] for [math]V(X)[/math].

Example Consider one roll of a die. Let [math]X[/math] be the number that turns up. To find [math]V(X)[/math], we must first find the expected value of [math]X[/math]. This is

[[math]] \begin{eqnarray*} \mu & = & E(X) = 1\Bigl(\frac 16\Bigr) + 2\Bigl(\frac 16\Bigr) + 3\Bigl(\frac 16\Bigr) + 4\Bigl(\frac 16\Bigr) + 5\Bigl(\frac 16\Bigr) + 6\Bigl(\frac 16\Bigr) \\ & = & \frac 72\ . \end{eqnarray*} [[/math]]

To find the variance of [math]X[/math], we form the new random variable [math](X - \mu)^2[/math] and compute its expectation. We can easily do this using the following table.

Variance calculation.

[math]x[/math]	[math]m(x)[/math]	[math](x - 7/2)^2[/math]
1	1/6	25/4
2	1/6	9/4
3	1/6	1/4
4	1/6	1/4
5	1/6	9/4
6	1/6	25/4

From this table we find [math]E((X - \mu)^2)[/math] is

[[math]] \begin{eqnarray*} V(X) & = & \frac 16 \left( \frac {25}4 + \frac 94 + \frac 14 + \frac 14 + \frac 94 + \frac {25}4 \right) \\ & = &\frac {35}{12}\ , \end{eqnarray*} [[/math]]

and the standard deviation is [math]D(X) = \sqrt{35/12} \approx 1.708[/math].
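The table-based calculation above can be reproduced mechanically. The following is a minimal sketch, assuming the distribution is stored as a dictionary from values to probabilities; the helper names `expectation` and `variance` are illustrative choices, not part of the text. Exact fractions are used so the result can be compared with [math]35/12[/math] directly.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance computed directly from the definition E((X - mu)^2)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# One roll of a fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = expectation(die)
var = variance(die)
std = float(var) ** 0.5  # standard deviation D(X) as a floating-point number

print(mu, var)  # 7/2 35/12
print(std)
```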

Calculation of Variance

We next prove a theorem that gives us a useful alternative form for computing the variance.

Theorem

If [math]X[/math] is any random variable with [math]E(X) = \mu[/math], then

[[math]] V(X) = E(X^2) - \mu^2\ . [[/math]]

Proof

We have

[[math]] \begin{eqnarray*} V(X) & = & E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) \\ & = & E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2\ . \end{eqnarray*} [[/math]]

■

Using this theorem, we can compute the variance of the outcome of a roll of a die by first computing

[[math]] \begin{eqnarray*} E(X^2) & = & 1\Bigl(\frac 16\Bigr) + 4\Bigl(\frac 16\Bigr) + 9\Bigl(\frac 16\Bigr) + 16\Bigl(\frac 16\Bigr) + 25\Bigl(\frac 16\Bigr) + 36\Bigl(\frac 16\Bigr) \\ & = &\frac {91}6\ , \end{eqnarray*} [[/math]]

and,

[[math]] V(X) = E(X^2) - \mu^2 = \frac {91}{6} - \Bigl(\frac 72\Bigr)^2 = \frac {35}{12}\ , [[/math]]

in agreement with the value obtained directly from the definition of [math]V(X)[/math].
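As a quick check of this agreement, the sketch below computes the variance of a die roll both ways, once from the definition and once as [math]E(X^2) - \mu^2[/math]. The variable names are illustrative assumptions; exact fractions avoid any rounding.

```python
from fractions import Fraction

# One roll of a fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in die.items())                 # E(X)
second_moment = sum(x ** 2 * p for x, p in die.items())  # E(X^2)

# Alternative form: V(X) = E(X^2) - mu^2
var_shortcut = second_moment - mu ** 2

# Definition: V(X) = E((X - mu)^2)
var_definition = sum((x - mu) ** 2 * p for x, p in die.items())

print(second_moment, var_shortcut, var_definition)  # 91/6 35/12 35/12
```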

Properties of Variance

The variance has properties very different from those of the expectation. If [math]c[/math] is any constant, [math]E(cX) = cE(X)[/math] and [math]E(X + c) = E(X) + c[/math]. These two statements imply that the expectation is a linear function. However, the variance is not linear, as seen in the next theorem.

Theorem

If [math]X[/math] is any random variable and [math]c[/math] is any constant, then

[[math]] V(cX) = c^2 V(X) [[/math]]

and

[[math]] V(X + c) = V(X)\ . [[/math]]

Proof

Let [math]\mu = E(X)[/math]. Then [math]E(cX) = c\mu[/math], and

[[math]] \begin{eqnarray*} V(cX) &=& E((cX - c\mu)^2) = E(c^2(X - \mu)^2) \\ &=& c^2 E((X - \mu)^2) = c^2 V(X)\ . \end{eqnarray*} [[/math]]

To prove the second assertion, we note that, to compute [math]V(X + c)[/math], we would replace [math]x[/math] by [math]x + c[/math] and [math]\mu[/math] by [math]\mu + c[/math] in the formula [math]V(X) = \sum_x (x - \mu)^2 m(x)[/math]. Then the [math]c[/math]'s would cancel, leaving [math]V(X)[/math].

■
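Both assertions of the theorem can be checked on a concrete distribution. The sketch below uses a fair die and the constant [math]c = 3[/math], both chosen only for illustration; `transform` is a hypothetical helper that builds the distribution of [math]f(X)[/math] from that of [math]X[/math].

```python
from fractions import Fraction

def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, f):
    """Distribution of f(X); probabilities of equal images are merged."""
    out = {}
    for x, p in dist.items():
        y = f(x)
        out[y] = out.get(y, 0) + p
    return out

die = {x: Fraction(1, 6) for x in range(1, 7)}
c = 3

scaled = transform(die, lambda x: c * x)   # distribution of cX
shifted = transform(die, lambda x: x + c)  # distribution of X + c

print(variance(die), variance(scaled), variance(shifted))  # 35/12 105/4 35/12
```

Scaling by 3 multiplies the variance by 9, while shifting leaves it unchanged.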

We turn now to some general properties of the variance. Recall that if [math]X[/math] and [math]Y[/math] are any two random variables, [math]E(X + Y) = E(X) + E(Y)[/math]. This is not always true for the case of the variance. For example, let [math]X[/math] be a random variable with [math]V(X) \ne 0[/math], and define [math]Y = -X[/math]. Then [math]V(X) = V(Y)[/math], so that [math]V(X) + V(Y) = 2V(X)[/math]. But [math]X + Y[/math] is always 0 and hence has variance 0. Thus [math]V(X + Y) \ne V(X) + V(Y)[/math].

In the important case of mutually independent random variables, however, the variance of the sum is the sum of the variances.

Theorem

Let [math]X[/math] and [math]Y[/math] be two independent random variables. Then

[[math]] V(X + Y) = V(X) + V(Y)\ . [[/math]]

Proof

Let [math]E(X) = a[/math] and [math]E(Y) = b[/math]. Then

[[math]] \begin{eqnarray*} V(X + Y) & = & E((X + Y)^2) - (a + b)^2 \\ & = & E(X^2) + 2E(XY) + E(Y^2) - a^2 - 2ab - b^2\ . \end{eqnarray*} [[/math]]

Since [math]X[/math] and [math]Y[/math] are independent, [math]E(XY) = E(X)E(Y) = ab[/math]. Thus,

[[math]] V(X + Y) = E(X^2) - a^2 + E(Y^2) - b^2 = V(X) + V(Y)\ . [[/math]]

■
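The contrast between the independent case and the dependent case [math]Y = -X[/math] can be seen numerically. The following sketch assumes two fair dice for the independent case, so the joint probability of each pair of outcomes is the product of the individual probabilities; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Independent case: P(X = x, Y = y) = P(X = x) * P(Y = y).
sum_indep = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_indep[x + y] = sum_indep.get(x + y, 0) + px * py

# Dependent case Y = -X: the sum X + Y is always 0.
sum_dep = {}
for x, px in die.items():
    sum_dep[x + (-x)] = sum_dep.get(0, 0) + px

print(variance(sum_indep))  # 35/6, which equals V(X) + V(Y)
print(variance(sum_dep))    # 0, although V(X) + V(-X) = 35/6
```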

It is easy to extend this proof, by mathematical induction, to show that the variance of the sum of any number of mutually independent random variables is the sum of the individual variances. Thus we have the following theorem.

Theorem

Let [math]X_1[/math], [math]X_2[/math], ..., [math]X_n[/math] be an independent trials process with [math]E(X_j) = \mu[/math] and [math]V(X_j) = \sigma^2[/math]. Let

[[math]] S_n = X_1 + X_2 +\cdots+ X_n [[/math]]

be the sum, and

[[math]] A_n = \frac {S_n}n [[/math]]

be the average. Then

[[math]] \begin{eqnarray*} E(S_n) &=& n\mu\ , \\ V(S_n) &=& n\sigma^2\ , \\ \sigma(S_n) &=& \sigma \sqrt{n}\ , \\ E(A_n) &=& \mu\ , \\ V(A_n) &=& \frac {\sigma^2}n\ , \\ \sigma(A_n) &=& \frac{\sigma}{\sqrt n}\ . \end{eqnarray*} [[/math]]

Proof

Since all the random variables [math]X_j[/math] have the same expected value, we have

[[math]] E(S_n) = E(X_1) +\cdots+ E(X_n) = n\mu\ , [[/math]]

[[math]] V(S_n) = V(X_1) +\cdots+ V(X_n) = n\sigma^2\ , [[/math]]

and

[[math]] \sigma(S_n) = \sigma \sqrt{n}\ . [[/math]]

We have seen that, if we multiply a random variable [math]X[/math] with mean [math]\mu[/math] and variance [math]\sigma^2[/math] by a constant [math]c[/math], the new random variable has expected value [math]c\mu[/math] and variance [math]c^2\sigma^2[/math]. Thus,

[[math]] E(A_n) = E\left(\frac {S_n}n \right) = \frac {n\mu}n = \mu\ , [[/math]]

and

[[math]] V(A_n) = V\left( \frac {S_n}n \right) = \frac {V(S_n)}{n^2} = \frac {n\sigma^2}{n^2} = \frac {\sigma^2}n\ . [[/math]]

Finally, the standard deviation of [math]A_n[/math] is given by

[[math]] \sigma(A_n) = \frac {\sigma}{\sqrt n}\ . [[/math]]

■

The last equation in the above theorem implies that in an independent trials process, if the individual summands have finite variance, then the standard deviation of the average goes to 0 as [math]n \rightarrow \infty[/math]. Since the standard deviation tells us something about the spread of the distribution around the mean, we see that for large values of [math]n[/math], the value of [math]A_n[/math] is usually very close to the mean of [math]A_n[/math], which equals [math]\mu[/math], as shown above. This statement is made precise by the Law of Large Numbers, which is treated in its own chapter. For example, let [math]X[/math] represent the roll of a fair die. The figure below shows the distribution of a random variable [math]A_n[/math] corresponding to [math]X[/math], for [math]n = 10[/math] and [math]n = 100[/math].

Empirical distribution of [math]A_n[/math].

Example Consider [math]n[/math] rolls of a die. We have seen that, if [math]X_j[/math] is the outcome of the [math]j[/math]th roll, then [math]E(X_j) = 7/2[/math] and [math]V(X_j) = 35/12[/math]. Thus, if [math]S_n[/math] is the sum of the outcomes, and [math]A_n = S_n/n[/math] is the average of the outcomes, we have [math]E(A_n) = 7/2[/math] and [math]V(A_n) = (35/12)/n[/math]. Therefore, as [math]n[/math] increases, the expected value of the average remains constant, but the variance tends to 0. If the variance is a measure of the expected deviation from the mean, this would indicate that, for large [math]n[/math], we can expect the average to be very near the expected value. This is in fact the case, and we shall justify it in the chapter on the Law of Large Numbers.
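The values [math]E(A_n) = 7/2[/math] and [math]V(A_n) = (35/12)/n[/math] can be confirmed from the exact distribution of the average. The sketch below builds the distribution of [math]S_n[/math] by repeated convolution and then rescales by [math]n[/math]. This is one possible approach, chosen here because it avoids simulation noise; the function names are illustrative assumptions.

```python
from fractions import Fraction

def convolve(d1, d2):
    """Distribution of the sum of two independent random variables."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def mean_and_variance(dist):
    """Return (E(X), V(X)) for a finite distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

def average_distribution(die, n):
    """Exact distribution of A_n = S_n / n for n independent rolls."""
    s = {0: Fraction(1)}  # distribution of the empty sum
    for _ in range(n):
        s = convolve(s, die)
    return {Fraction(k, n): p for k, p in s.items()}

die = {x: Fraction(1, 6) for x in range(1, 7)}

for n in (1, 10, 100):
    mu, var = mean_and_variance(average_distribution(die, n))
    print(n, mu, var)  # the mean stays 7/2; the variance is (35/12)/n
```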

Bernoulli Trials

Consider next the general Bernoulli trials process. As usual, we let [math]X_j = 1[/math] if the [math]j[/math]th outcome is a success and 0 if it is a failure. If [math]p[/math] is the probability of a success, and [math]q = 1 - p[/math], then

[[math]] \begin{eqnarray*} E(X_j) & = & 0q + 1p = p\ , \\ E(X_j^2) & = & 0^2q + 1^2p = p\ , \end{eqnarray*} [[/math]]

and

[[math]] V(X_j) = E(X_j^2) - (E(X_j))^2 = p - p^2 = pq\ . [[/math]]

Thus, for Bernoulli trials, if [math]S_n = X_1 + X_2 +\cdots+ X_n[/math] is the number of successes, then [math]E(S_n) = np[/math], [math]V(S_n) = npq[/math], and [math]D(S_n) = \sqrt{npq}[/math]. If [math]A_n = S_n/n[/math] is the average number of successes, then [math]E(A_n) = p[/math], [math]V(A_n) = pq/n[/math], and [math]D(A_n) = \sqrt{pq/n}[/math]. We see that the expected proportion of successes remains [math]p[/math] and the variance tends to 0. This suggests that the frequency interpretation of probability is a correct one. We shall make this more precise in the chapter on the Law of Large Numbers.
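These formulas can be checked against the binomial distribution of [math]S_n[/math] computed term by term. The sketch below uses [math]n = 20[/math] and [math]p = 1/3[/math], values chosen purely for illustration, with exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Distribution of the number of successes in n Bernoulli trials."""
    q = 1 - p
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}

def mean_and_variance(dist):
    """Return (E(X), V(X)) using V(X) = E(X^2) - E(X)^2."""
    mu = sum(x * pr for x, pr in dist.items())
    second = sum(x ** 2 * pr for x, pr in dist.items())
    return mu, second - mu ** 2

n, p = 20, Fraction(1, 3)
q = 1 - p

mean_s, var_s = mean_and_variance(binomial_pmf(n, p))
var_a = var_s / n ** 2  # V(A_n) = V(S_n) / n^2

print(mean_s, var_s, var_a)  # 20/3 40/9 1/90
```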

Example Let [math]T[/math] denote the number of trials until the first success in a Bernoulli trials process. Then [math]T[/math] is geometrically distributed. What is the variance of [math]T[/math]? In Example, we saw that

[[math]] m_T = \pmatrix{1 & 2 & 3 & \cdots \cr p & qp & q^2p & \cdots \cr}. [[/math]]

In Example, we showed that

[[math]] E(T) = 1/p\ . [[/math]]

Thus,

[[math]] V(T) = E(T^2) - 1/p^2\ , [[/math]]

so we need only find

[[math]] \begin{eqnarray*} E(T^2) & = & 1p + 4qp + 9q^2p + \cdots \\ & = & p(1 + 4q + 9q^2 + \cdots )\ . \end{eqnarray*} [[/math]]

To evaluate this sum, we start again with

[[math]] 1 + x + x^2 +\cdots= \frac 1{1 - x}\ . [[/math]]

Differentiating, we obtain

[[math]] 1 + 2x + 3x^2 +\cdots= \frac 1{(1 - x)^2}\ . [[/math]]

Multiplying by [math]x[/math],

[[math]] x + 2x^2 + 3x^3 +\cdots= \frac x{(1 - x)^2}\ . [[/math]]

Differentiating again gives

[[math]] 1 + 4x + 9x^2 +\cdots= \frac {1 + x}{(1 - x)^3}\ . [[/math]]

Thus,

[[math]] E(T^2) = p\frac {1 + q}{(1 - q)^3} = \frac {1 + q}{p^2} [[/math]]

and

[[math]] \begin{eqnarray*} V(T) & = & E(T^2) - (E(T))^2 \\ & = & \frac {1 + q}{p^2} - \frac 1{p^2} = \frac q{p^2}\ . \end{eqnarray*} [[/math]]

For example, the variance for the number of tosses of a coin until the first head turns up is [math](1/2)/(1/2)^2 = 2[/math]. The variance for the number of rolls of a die until the first six turns up is [math](5/6)/(1/6)^2 = 30[/math]. Note that, as [math]p[/math] decreases, the variance increases rapidly. This corresponds to the increased spread of the geometric distribution as [math]p[/math] decreases (noted in Figure).
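The closed form [math]q/p^2[/math] can be compared with a direct numerical summation of the series for [math]E(T)[/math] and [math]E(T^2)[/math]. The sketch below truncates the infinite sums after a fixed number of terms; the cutoff of 10,000 terms is an assumption that is ample for the values of [math]p[/math] used here but would need to grow for very small [math]p[/math].

```python
def geometric_mean_and_variance(p, terms=10_000):
    """Approximate E(T) and V(T) for a geometric T by truncating the series."""
    q = 1.0 - p
    mean = 0.0
    second = 0.0
    prob = p  # P(T = 1)
    for k in range(1, terms + 1):
        mean += k * prob
        second += k * k * prob
        prob *= q  # P(T = k + 1) = q * P(T = k)
    return mean, second - mean ** 2

for p in (1 / 2, 1 / 6):
    mean, var = geometric_mean_and_variance(p)
    closed_form = (1 - p) / p ** 2
    print(p, mean, var, closed_form)
```

For the coin the two variance columns agree at about 2, and for the die at about 30, up to floating-point error.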

Poisson Distribution

Just as in the case of expected values, it is easy to guess the variance of the Poisson distribution with parameter [math]\lambda[/math]. We recall that the variance of a binomial distribution with parameters [math]n[/math] and [math]p[/math] equals [math]npq[/math]. We also recall that the Poisson distribution could be obtained as a limit of binomial distributions, if [math]n[/math] goes to [math]\infty[/math] and [math]p[/math] goes to 0 in such a way that their product is kept fixed at the value [math]\lambda[/math]. In this case, [math]npq = \lambda q[/math] approaches [math]\lambda[/math], since [math]q[/math] goes to 1. So, given a Poisson distribution with parameter [math]\lambda[/math], we should guess that its variance is [math]\lambda[/math]. The reader is asked to show this in Exercise.
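The guess can be made plausible numerically, although this is not a substitute for the proof requested in the exercise. The sketch below, with [math]\lambda = 4[/math] chosen only for illustration, shows [math]npq[/math] approaching [math]\lambda[/math] as [math]n[/math] grows with [math]np = \lambda[/math] held fixed, and also approximates the Poisson variance by a truncated sum. The 200-term cutoff is an assumption suited to small [math]\lambda[/math].

```python
import math

def poisson_variance(lam, terms=200):
    """Approximate the variance of a Poisson(lam) variable by a truncated sum."""
    prob = math.exp(-lam)  # P(X = 0)
    mean = 0.0
    second = 0.0
    for k in range(terms):
        mean += k * prob
        second += k * k * prob
        prob *= lam / (k + 1)  # P(X = k + 1) = P(X = k) * lam / (k + 1)
    return second - mean ** 2

lam = 4.0

# Binomial variance npq with p = lam / n, so that np stays equal to lam.
binomial_variances = [(n, n * (lam / n) * (1 - lam / n))
                      for n in (10, 100, 1000, 10000)]

for n, v in binomial_variances:
    print(n, v)  # increases toward lam as n grows

print(poisson_variance(lam))  # close to lam
```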

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability" (PDF). Retrieved June 6, 2024.