guide:E05b0a84f3: Difference between revisions

From Stochiki
No edit summary
mNo edit summary
Line 6: Line 6:
\newcommand{\NA}{{\rm NA}}
\newcommand{\NA}{{\rm NA}}
\newcommand{\mathds}{\mathbb}</math></div>
\newcommand{\mathds}{\mathbb}</math></div>
label{sec 4.2}
 
In situations where the sample space is continuous we will follow the same procedure as in the
In situations where the sample space is continuous we will follow the same procedure as in the
previous section.  Thus, for example, if <math>X</math> is a continuous random variable with density function
previous section.  Thus, for example, if <math>X</math> is a continuous random variable with density function
Line 36: Line 36:
size are ''equally likely,'' then the same will be true for the conditional
size are ''equally likely,'' then the same will be true for the conditional
density.
density.
<span id="exam 4.12"/>
<span id="exam 4.12"/>
'''Example'''  
'''Example'''  
In the spinner experiment (cf.\ [[guide:A070937c41#exam 2.1.1 |Example]]),
In the spinner experiment (cf. [[guide:A070937c41#exam 2.1.1 |Example]]),
suppose we know that the spinner has stopped with head in the upper half of the
suppose we know that the spinner has stopped with head in the upper half of the
circle, <math>0 \leq x \leq 1/2</math>.  What is the probability that <math>1/6 \leq x
circle, <math>0 \leq x \leq 1/2</math>.  What is the probability that <math>1/6 \leq x
Line 64: Line 65:
<span id="exam 4.13"/>
<span id="exam 4.13"/>
'''Example'''  
'''Example'''  
In the dart game (cf.\ [[guide:523e6267ef#exam 2.2.2 |Example]]),
In the dart game (cf. [[guide:523e6267ef#exam 2.2.2 |Example]]),
suppose we know that the dart lands in the upper half of the target.  What is
suppose we know that the dart lands in the upper half of the target.  What is
the probability that its distance from the center is less than 1/2?
the probability that its distance from the center is less than 1/2?
Here <math>E = \{\,(x,y) : y \geq 0\,\}</math>, and <math>F = \{\,(x,y) : x^2 + y^2  <  
Here <math>E = \{\,(x,y) : y \geq 0\,\}</math>, and <math>F = \{\,(x,y) : x^2 + y^2  < (1/2)^2\,\}</math>.  Hence,
(1/2)^2\,\}</math>.  Hence,


<math display="block">
<math display="block">
Line 88: Line 88:
<span id="exam 4.14"/>
<span id="exam 4.14"/>
'''Example'''  
'''Example'''  
We return to the exponential density (cf.\ [[guide:523e6267ef#exam 2.2.7.5 |Example]]).  We suppose that we are observing a lump
We return to the exponential density (cf. [[guide:523e6267ef#exam 2.2.7.5 |Example]]).  We suppose that we are observing a lump
of plutonium-239.  Our experiment consists of waiting for an emission, then starting a clock, and
of plutonium-239.  Our experiment consists of waiting for an emission, then starting a clock, and
recording the length of time <math>X</math> that passes until the next emission.  Experience has shown that
recording the length of time <math>X</math> that passes until the next emission.  Experience has shown that
Line 119: Line 119:
This tells us the rather surprising fact that the probability that we have to
This tells us the rather surprising fact that the probability that we have to
wait <math>s</math> seconds more for an emission, given that there has been no emission in <math>r</math> seconds,
wait <math>s</math> seconds more for an emission, given that there has been no emission in <math>r</math> seconds,
is ''independent'' of the time <math>r</math>.  This property (called the ''
is ''independent'' of the time <math>r</math>.  This property (called the ''memoryless'' property) was introduced in [[guide:523e6267ef#exam 2.2.7.5 |Example]].  
memoryless'' property) was introduced in [[guide:523e6267ef#exam 2.2.7.5 |Example]].  
When trying to model various phenomena, this property is helpful in deciding whether the
When trying to model various phenomena, this property is helpful in deciding whether the
exponential density is appropriate.   
exponential density is appropriate.   
Line 135: Line 134:
you will have to wait, on the average, for 30 minutes for a bus.
you will have to wait, on the average, for 30 minutes for a bus.


 
The reader can now see that in Exercises [[exercise:8b408c8df0 |Exercise]], [[exercise:B52419720c |Exercise]], and [[exercise:2f6a78a924|Exercise]], we were asking for simulations of conditional probabilities, under various assumptions on the distribution of the interarrival times.  If one makes a reasonable assumption about this
The reader can now see that in Exercises \ref{sec [[guide:523e6267ef#exer 2.2.8.5 |2.2}.]],  
distribution, such as the one in [[exer 2.2.8.6|Exercise]], then the average waiting time is more nearly one-half the average interarrival time.
\ref{sec [[guide:523e6267ef#exer 2.2.8.6 |2.2}.]], and \ref{sec 2.2}.\ref{exer
2.2.11}, we were asking for simulations of conditional probabilities, under various assumptions
on the distribution of the interarrival times.  If one makes a reasonable assumption about this
distribution, such as the one in Exercise \ref{sec [[guide:523e6267ef#exer 2.2.8.6 |2.2}.]], then the average  
waiting time is more nearly one-half the average interarrival time.
 


===Independent Events===
===Independent Events===
Line 151: Line 144:
other, so that to see whether two events are independent, only one of these equations must be
other, so that to see whether two events are independent, only one of these equations must be
checked.  It is also the case that, if <math>E</math> and <math>F</math> are independent, then <math>P(E \cap F) = P(E)P(F)</math>.
checked.  It is also the case that, if <math>E</math> and <math>F</math> are independent, then <math>P(E \cap F) = P(E)P(F)</math>.
<span id="exam 4.15"/>
<span id="exam 4.15"/>
'''Example'''  
'''Example'''  
Line 178: Line 172:
functions and cumulative distribution functions for multi-dimensional continuous random
functions and cumulative distribution functions for multi-dimensional continuous random
variables.   
variables.   
{{defncard|label=|id=def 4.5|
{{defncard|label=|id=def 4.5|Let <math>X_1,~X_2, \ldots,~X_n</math> be continuous random variables associated with an experiment, and
Let <math>X_1,~X_2, \ldots,~X_n</math> be continuous random variables associated with an experiment, and
let <math>{\bar X} = (X_1,~X_2, \ldots,~X_n)</math>.  Then the joint cumulative
let <math>{\bar X} = (X_1,~X_2, \ldots,~X_n)</math>.  Then the joint cumulative
distribution function of
distribution function of
Line 204: Line 197:
\end{equation}
\end{equation}
</math>
</math>


===Independent Random Variables===
===Independent Random Variables===
As with discrete random variables, we can define mutual independence of continuous
As with discrete random variables, we can define mutual independence of continuous
random variables.
random variables.
{{defncard|label=|id=def 4.6| Let <math>X_1</math>, <math>X_2</math>, \ldots, <math>X_n</math> be continuous random
{{defncard|label=|id=def 4.6|Let <math>X_1, X_2, \ldots, X_n</math> be continuous random
variables with cumulative distribution functions <math>F_1(x),~F_2(x), \ldots,~F_n(x)</math>.  Then
variables with cumulative distribution functions <math>F_1(x),~F_2(x), \ldots,~F_n(x)</math>.  Then
these random variables are ''mutually independent'' if
these random variables are ''mutually independent'' if
Line 224: Line 215:
variables are mutually independent, we shall say more briefly that they are ''
variables are mutually independent, we shall say more briefly that they are ''
independent.''}}
independent.''}}
Using [[#eq 4.4 |Equation]], the following theorem can easily be shown to hold for mutually
Using Equation \ref{eq 4.4}, the following theorem can easily be shown to hold for mutually
independent continuous random variables.
independent continuous random variables.
{{proofcard|Theorem|thm_4.2|Let <math>X_1</math>, <math>X_2</math>, \ldots, <math>X_n</math> be continuous random variables with density
{{proofcard|Theorem|thm_4.2|Let <math>X_1,X_2, \ldots,X_n</math> be continuous random variables with density
functions <math>f_1(x),~f_2(x), \ldots,~f_n(x)</math>.  Then these random variables are '' mutually
functions <math>f_1(x),~f_2(x), \ldots,~f_n(x)</math>.  Then these random variables are '' mutually
independent'' if and only if
independent'' if and only if
Line 233: Line 224:
f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots  f_n(x_n)
f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots  f_n(x_n)
</math>
</math>
  for any choice of <math>x_1,
for any choice of <math>x_1,x_2, \ldots, x_n</math>.|}}
x_2, \ldots, x_n</math>.|}}
 
Let's look at some examples.
Let's look at some examples.
'''Example'''  
'''Example'''  
In this example, we define three random variables, <math>X_1,\ X_2</math>, and <math>X_3</math>.  We will show that
In this example, we define three random variables, <math>X_1,\ X_2</math>, and <math>X_3</math>.  We will show that
Line 256: Line 248:
</math>
</math>
if <math>0 \leq r_2 \leq 1</math>.
if <math>0 \leq r_2 \leq 1</math>.
Now we have (see Figure \ref{fig 5.15})
Now we have (see [[#fig 5.15|Figure]])
<div id="PSfig5-15" class="d-flex justify-content-center">
<div id="fig 5.15" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig5-15.ps | 400px | thumb | ]]
[[File:guide_e6d15_PSfig5-15.png | 400px | thumb |<math>X_1</math> and <math>X_2</math> are independent. ]]
</div>
</div>


Line 272: Line 264:
</math>
</math>
In this case <math>F_{12}(r_1,r_2) = F_1(r_1)F_2(r_2)</math> so that <math>X_1</math> and <math>X_2</math> are
In this case <math>F_{12}(r_1,r_2) = F_1(r_1)F_2(r_2)</math> so that <math>X_1</math> and <math>X_2</math> are
independent.  On the other hand, if <math>r_1 = 1/4</math> and <math>r_3 = 1</math>, then (see Figure \ref{fig
independent.  On the other hand, if <math>r_1 = 1/4</math> and <math>r_3 = 1</math>, then (see [[#fig
5.16})
5.16|Figure]])
<div id="PSfig5-16" class="d-flex justify-content-center">
<div id="fig 5.16" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig5-16.ps | 400px | thumb | ]]
[[File:guide_e6d15_PSfig5-16.png | 400px | thumb | <math>X_1</math> and <math>X_3</math> are not independent. ]]
</div>
</div>


Line 304: Line 296:
Akadémiai Kiadò, 1970), p. 183.</ref>
Akadémiai Kiadò, 1970), p. 183.</ref>
{{proofcard|Theorem|thm_4.3|Let <math>X_1, X_2, \ldots, X_n</math> be mutually independent continuous random variables and let
{{proofcard|Theorem|thm_4.3|Let <math>X_1, X_2, \ldots, X_n</math> be mutually independent continuous random variables and let
<math>\phi_1(x), \phi_2(x), \ldots, \phi_n(x)</math> be continuous functions.  Then <math>\phi_1(X_1),</math>
<math>\phi_1(x), \phi_2(x), \ldots, \phi_n(x)</math> be continuous functions.  Then <math>\phi_1(X_1),\phi_2(X_2), \ldots, \phi_n(X_n)</math> are mutually independent.|}}
\newline
<math>\phi_2(X_2), \ldots, \phi_n(X_n)</math> are mutually independent.|}}


===Independent Trials===
===Independent Trials===
Using the notion of independence, we can now formulate for continuous sample spaces the notion
Using the notion of independence, we can now formulate for continuous sample spaces the notion
of independent trials (see [[guide:448d2aa013#def 5.5 |Definition]]).
of independent trials (see [[guide:448d2aa013#def 5.5 |Definition]]).
{{defncard|label=|id=def 5.12| A sequence <math>X_1</math>, <math>X_2</math>, \dots, <math>X_n</math> of random variables
{{defncard|label=|id=def 5.12|A sequence <math>X_1, X_2, \dots, X_n</math> of random variables
<math>X_i</math> that are mutually independent and have the same density is called an ''independent
<math>X_i</math> that are mutually independent and have the same density is called an ''independent
trials\linebreak[4] process.''}}
trials process.''}}




Line 342: Line 332:
When <math>\alpha</math> and <math>\beta</math> are greater than 1 the density is bell-shaped, but
When <math>\alpha</math> and <math>\beta</math> are greater than 1 the density is bell-shaped, but
when they are less than 1 it is U-shaped as suggested by the examples in
when they are less than 1 it is U-shaped as suggested by the examples in
Figure \ref{fig 4.6}.
[[#fig 4.6|Figure]].
<div id="PSfig4-6" class="d-flex justify-content-center">
<div id="fig 4.6" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig4-6.ps | 400px | thumb | ]]
[[File:guide_e6d15_PSfig4-6.png | 400px | thumb | Beta density for <math>\alpha = \beta = .5,1,2.</math> ]]
</div>
</div>
We shall need the values of the beta function only for integer values of
We shall need the values of the beta function only for integer values of
Line 442: Line 432:
Now we wish to calculate the probability that the drug is effective on the next subject.  For any
Now we wish to calculate the probability that the drug is effective on the next subject.  For any
particular real number <math>t</math> between 0 and 1, the probability that <math>x</math> has the value <math>t</math> is given by
particular real number <math>t</math> between 0 and 1, the probability that <math>x</math> has the value <math>t</math> is given by
the expression in [[#eq 4.5 |Equation]].  Given that <math>x</math> has the value <math>t</math>, the probability that the drug
the expression in Equation \ref{eq 4.5}.  Given that <math>x</math> has the value <math>t</math>, the probability that the drug
is effective on the next subject is just <math>t</math>.  Thus, to obtain the probability that the drug is effective
is effective on the next subject is just <math>t</math>.  Thus, to obtain the probability that the drug is effective
on the next subject, we integrate the product of the expression in [[#eq 4.5 |Equation]] and <math>t</math> over all
on the next subject, we integrate the product of the expression in Equation \ref{eq 4.5} and <math>t</math> over all
possible values of <math>t</math>.  We obtain:
possible values of <math>t</math>.  We obtain:


<math display="block">
<math display="block">
\begin{eqnarray*}
\begin{eqnarray*}
\lefteqn{\frac 1{B(\alpha + i,\beta + j)}  
\frac 1{B(\alpha + i,\beta + j)} \int_0^1 t\cdot t^{\alpha + i - 1}(1 - t)^{\beta + j - 1}\,dt
\int_0^1 t\cdot t^{\alpha + i - 1}(1 - t)^{\beta + j - 1}\,dt}
\\
  & = & \frac {B(\alpha + i + 1,\beta + j)}{B(\alpha + i,\beta + j)} \\
  & = & \frac {B(\alpha + i + 1,\beta + j)}{B(\alpha + i,\beta + j)} \\
  & = & \frac {(\alpha + i)!\,(\beta + j - 1)!}{(\alpha + \beta + i + j)!} \cdot  
  & = & \frac {(\alpha + i)!\,(\beta + j - 1)!}{(\alpha + \beta + i + j)!} \cdot  
Line 458: Line 446:
\end{eqnarray*}
\end{eqnarray*}
</math>
</math>
If <math>n</math> is large, then our estimate for the probability of success after the experiment is
If <math>n</math> is large, then our estimate for the probability of success after the experiment is
approximately the proportion of successes observed in the experiment, which is
approximately the proportion of successes observed in the experiment, which is
Line 464: Line 453:
The next example is another in which the true probabilities are unknown and must be
The next example is another in which the true probabilities are unknown and must be
estimated based upon experimental data.
estimated based upon experimental data.
<span id="exam 4.17"/>
<span id="exam 4.17"/>
'''Example'''  
'''Example'''  
Line 477: Line 467:
higher probability.  Let win(<math>i</math>), for <math>i = 1</math> or 2, be the number of times
higher probability.  Let win(<math>i</math>), for <math>i = 1</math> or 2, be the number of times
that you have won on the <math>i</math>th machine.  Similarly, let lose(<math>i</math>) be the number
that you have won on the <math>i</math>th machine.  Similarly, let lose(<math>i</math>) be the number
of times you have lost on the <math>i</math>th machine.  Then, from Example \ref{exam
of times you have lost on the <math>i</math>th machine.  Then, from [[#exam4.16|Example]], the probability <math>p(i)</math> that you win if you choose the <math>i</math>th machine is  
4.16}, the probability <math>p(i)</math> that you win if you choose the <math>i</math>th machine is  


<math display="block">
<math display="block">
Line 494: Line 483:
(dotted line), showing only the current densities.  We have run the program for
(dotted line), showing only the current densities.  We have run the program for
ten plays for the case <math>x = .6</math> and <math>y = .7</math>.  The result is shown in
ten plays for the case <math>x = .6</math> and <math>y = .7</math>.  The result is shown in
Figure \ref{fig 4.7}.
[[#fig 4.7|Figure]].
<div id="PSfig4-7" class="d-flex justify-content-center">
 
[[File:guide_e6d15_PSfig4-7.ps | 400px | thumb | ]]
<div id="fig 4.7" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig4-7.png | 600px | thumb |Play the best machine.]]
</div>
</div>
The run of the program shows the weakness of this strategy.  Our initial
The run of the program shows the weakness of this strategy.  Our initial
Line 508: Line 498:
well with this strategy, winning seven out of the ten trials, but ten trials
well with this strategy, winning seven out of the ten trials, but ten trials
are not enough to judge whether this is a good strategy in the long run.
are not enough to judge whether this is a good strategy in the long run.
<div id="PSfig4-8" class="d-flex justify-content-center">
 
[[File:guide_e6d15_PSfig4-8.ps | 400px | thumb | ]]
<div id="fig 4.8" class="d-flex justify-content-center">
[[File:guide_e6d15_PSfig4-8.png | 600px | thumb | Play the winner. ]]
</div>
</div>
Another popular strategy  is the ''play-the-winner strategy.''  As the name
Another popular strategy  is the ''play-the-winner strategy.''  As the name
suggests, for this strategy we choose the same machine when we win and switch
suggests, for this strategy we choose the same machine when we win and switch
machines when we lose.  The program ''' TwoArm''' will simulate this
machines when we lose.  The program ''' TwoArm''' will simulate this
strategy as well.  In Figure \ref{fig 4.8}, we show the results of running this program
strategy as well.  In [[#fig 4.8|Figure]], we show the results of running this program
with the play-the-winner strategy and the same true probabilities of .6 and .7
with the play-the-winner strategy and the same true probabilities of .6 and .7
for the two machines.  After ten plays our densities for the unknown
for the two machines.  After ten plays our densities for the unknown
Line 524: Line 515:
example have played an important role in the problem of clinical tests of drugs
example have played an important role in the problem of clinical tests of drugs
where experimenters face a similar situation.
where experimenters face a similar situation.
\exercises


==General references==
==General references==
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.pdf |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}
{{cite web |url=https://math.dartmouth.edu/~prob/prob/prob.png |title=Grinstead and Snell’s Introduction to Probability |last=Doyle |first=Peter G.|date=2006 |access-date=June 6, 2024}}
==Notes==
==Notes==
{{Reflist|group=Notes}}
{{Reflist|group=Notes}}

Revision as of 20:10, 10 June 2024

[math] \newcommand{\NA}{{\rm NA}} \newcommand{\mat}[1]{{\bf#1}} \newcommand{\exref}[1]{\ref{##1}} \newcommand{\secstoprocess}{\all} \newcommand{\NA}{{\rm NA}} \newcommand{\mathds}{\mathbb}[/math]

In situations where the sample space is continuous we will follow the same procedure as in the previous section. Thus, for example, if [math]X[/math] is a continuous random variable with density function [math]f(x)[/math], and if [math]E[/math] is an event with positive probability, we define a conditional density function by the formula

[[math]] f(x|E) = \left \{ \matrix{ f(x)/P(E), & \mbox{if} \,\,x \in E, \cr 0, & \mbox{if}\,\,x \not \in E. \cr}\right. [[/math]]

Then for any event [math]F[/math], we have

[[math]] P(F|E) = \int_F f(x|E)\,dx\ . [[/math]]

The expression [math]P(F|E)[/math] is called the conditional probability of [math]F[/math] given [math]E[/math]. As in the previous section, it is easy to obtain an alternative expression for this probability:

[[math]] P(F|E) = \int_F f(x|E)\,dx = \int_{E\cap F} \frac {f(x)}{P(E)}\,dx = \frac {P(E\cap F)}{P(E)}\ . [[/math]]


We can think of the conditional density function as being 0 except on [math]E[/math], and normalized to have integral 1 over [math]E[/math]. Note that if the original density is a uniform density corresponding to an experiment in which all events of equal size are equally likely, then the same will be true for the conditional density.

Example In the spinner experiment (cf. Example), suppose we know that the spinner has stopped with head in the upper half of the circle, [math]0 \leq x \leq 1/2[/math]. What is the probability that [math]1/6 \leq x \leq 1/3[/math]? Here [math]E = [0,1/2][/math], [math]F = [1/6,1/3][/math], and [math]F \cap E = F[/math]. Hence

[[math]] \begin{eqnarray*} P(F|E) &=& \frac {P(F \cap E)}{P(E)} \\ &=& \frac {1/6}{1/2} \\ &=& \frac 13\ , \end{eqnarray*} [[/math]]

which is reasonable, since [math]F[/math] is 1/3 the size of [math]E[/math]. The conditional density function here is given by

[[math]] f(x|E) = \left \{ \matrix{ 2, & \mbox{if}\,\,\, 0 \leq x \lt 1/2, \cr 0, & \mbox{if}\,\,\, 1/2 \leq x \lt 1.\cr}\right. [[/math]]

Thus the conditional density function is nonzero only on [math][0,1/2][/math], and is uniform there.

Example In the dart game (cf. Example), suppose we know that the dart lands in the upper half of the target. What is the probability that its distance from the center is less than 1/2? Here [math]E = \{\,(x,y) : y \geq 0\,\}[/math], and [math]F = \{\,(x,y) : x^2 + y^2 \lt (1/2)^2\,\}[/math]. Hence,

[[math]] \begin{eqnarray*} P(F|E) & = & \frac {P(F \cap E)}{P(E)} = \frac {(1/\pi)[(1/2)(\pi/4)]} {(1/\pi)(\pi/2)} \\ & = & 1/4\ . \end{eqnarray*} [[/math]]

Here again, the size of [math]F \cap E[/math] is 1/4 the size of [math]E[/math]. The conditional density function is

[[math]] f((x,y)|E) = \left \{ \matrix{ f(x,y)/P(E) = 2/\pi, &\mbox{if}\,\,\,(x,y) \in E, \cr 0, &\mbox{if}\,\,\,(x,y) \not \in E.\cr}\right. [[/math]]

Example We return to the exponential density (cf. Example). We suppose that we are observing a lump of plutonium-239. Our experiment consists of waiting for an emission, then starting a clock, and recording the length of time [math]X[/math] that passes until the next emission. Experience has shown that [math]X[/math] has an exponential density with some parameter [math]\lambda[/math], which depends upon the size of the lump. Suppose that when we perform this experiment, we notice that the clock reads [math]r[/math] seconds, and is still running. What is the probability that there is no emission in a further [math]s[/math] seconds? Let [math]G(t)[/math] be the probability that the next particle is emitted after time [math]t[/math]. Then

[[math]] \begin{eqnarray*} G(t) & = & \int_t^\infty \lambda e^{-\lambda x}\,dx \\ & = & \left.-e^{-\lambda x}\right|_t^\infty = e^{-\lambda t}\ . \end{eqnarray*} [[/math]]


Let [math]E[/math] be the event “the next particle is emitted after time [math]r[/math]” and [math]F[/math] the event “the next particle is emitted after time [math]r + s[/math].” Then

[[math]] \begin{eqnarray*} P(F|E) & = & \frac {P(F \cap E)}{P(E)} \\ & = & \frac {G(r + s)}{G(r)} \\ & = & \frac {e^{-\lambda(r + s)}}{e^{-\lambda r}} \\ & = & e^{-\lambda s}\ . \end{eqnarray*} [[/math]]


This tells us the rather surprising fact that the probability that we have to wait [math]s[/math] seconds more for an emission, given that there has been no emission in [math]r[/math] seconds, is independent of the time [math]r[/math]. This property (called the memoryless property) was introduced in Example. When trying to model various phenomena, this property is helpful in deciding whether the exponential density is appropriate.


The fact that the exponential density is memoryless means that it is reasonable to assume if one comes upon a lump of a radioactive isotope at some random time, then the amount of time until the next emission has an exponential density with the same parameter as the time between emissions. A well-known example, known as the “bus paradox,” replaces the emissions by buses. The apparent paradox arises from the following two facts: 1) If you know that, on the average, the buses come by every 30 minutes, then if you come to the bus stop at a random time, you should only have to wait, on the average, for 15 minutes for a bus, and 2) Since the buses arrival times are being modelled by the exponential density, then no matter when you arrive, you will have to wait, on the average, for 30 minutes for a bus.

The reader can now see that in Exercises Exercise, Exercise, and Exercise, we were asking for simulations of conditional probabilities, under various assumptions on the distribution of the interarrival times. If one makes a reasonable assumption about this distribution, such as the one in Exercise, then the average waiting time is more nearly one-half the average interarrival time.

Independent Events

If [math]E[/math] and [math]F[/math] are two events with positive probability in a continuous sample space, then, as in the case of discrete sample spaces, we define [math]E[/math] and [math]F[/math] to be independent if [math]P(E|F) = P(E)[/math] and [math]P(F|E) = P(F)[/math]. As before, each of the above equations imply the other, so that to see whether two events are independent, only one of these equations must be checked. It is also the case that, if [math]E[/math] and [math]F[/math] are independent, then [math]P(E \cap F) = P(E)P(F)[/math].

Example In the dart game (see Example), let [math]E[/math] be the event that the dart lands in the upper half of the target ([math]y \geq 0[/math]) and [math]F[/math] the event that the dart lands in the right half of the target ([math]x \geq 0[/math]). Then [math]P(E \cap F)[/math] is the probability that the dart lies in the first quadrant of the target, and

[[math]] \begin{eqnarray*} P(E \cap F) & = & \frac 1\pi \int_{E \cap F} 1\,dxdy \\ & = & \mbox{Area}\,(E\cap F) \\ & = & \mbox{Area}\,(E)\,\mbox{Area}\,(F) \\ & = & \left(\frac 1\pi \int_E 1\,dxdy\right) \left(\frac 1\pi \int_F 1\,dxdy\right) \\ & = & P(E)P(F) \end{eqnarray*} [[/math]]

so that [math]E[/math] and [math]F[/math] are independent. What makes this work is that the events [math]E[/math] and [math]F[/math] are described by restricting different coordinates. This idea is made more precise\linebreak[4] below.


Joint Density and Cumulative Distribution Functions

In a manner analogous with discrete random variables, we can define joint density functions and cumulative distribution functions for multi-dimensional continuous random variables.

Definition

Let [math]X_1,~X_2, \ldots,~X_n[/math] be continuous random variables associated with an experiment, and let [math]{\bar X} = (X_1,~X_2, \ldots,~X_n)[/math]. Then the joint cumulative distribution function of [math]{\bar X}[/math] is defined by

[[math]] F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)\ . [[/math]]
The joint density function of [math]{\bar X}[/math] satisfies the following equation:

[[math]] F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f(t_1, t_2, \ldots t_n)\,dt_ndt_{n-1}\ldots dt_1\ . [[/math]]

It is straightforward to show that, in the above notation,

[[math]] \begin{equation} f(x_1, x_2, \ldots, x_n) = {{\partial^n F(x_1, x_2, \ldots, x_n)}\over {\partial x_1 \partial x_2 \cdots \partial x_n}}\ .\label{eq 4.4} \end{equation} [[/math]]

Independent Random Variables

As with discrete random variables, we can define mutual independence of continuous random variables.

Definition

Let [math]X_1, X_2, \ldots, X_n[/math] be continuous random variables with cumulative distribution functions [math]F_1(x),~F_2(x), \ldots,~F_n(x)[/math]. Then these random variables are mutually independent if

[[math]] F(x_1, x_2, \ldots, x_n) = F_1(x_1)F_2(x_2) \cdots F_n(x_n) [[/math]]

for any choice of [math]x_1, x_2, \ldots, x_n[/math]. Thus, if [math]X_1,~X_2, \ldots,~X_n[/math] are mutually independent, then the joint cumulative distribution function of the random variable [math]{\bar X} = (X_1, X_2, \ldots, X_n)[/math] is just the product of the individual cumulative distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent.

Using Equation \ref{eq 4.4}, the following theorem can easily be shown to hold for mutually independent continuous random variables.

Theorem

Let [math]X_1,X_2, \ldots,X_n[/math] be continuous random variables with density functions [math]f_1(x),~f_2(x), \ldots,~f_n(x)[/math]. Then these random variables are mutually independent if and only if

[[math]] f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots f_n(x_n) [[/math]]
for any choice of [math]x_1,x_2, \ldots, x_n[/math].

Let's look at some examples.

Example In this example, we define three random variables, [math]X_1,\ X_2[/math], and [math]X_3[/math]. We will show that [math]X_1[/math] and [math]X_2[/math] are independent, and that [math]X_1[/math] and [math]X_3[/math] are not independent. Choose a point [math]\omega = (\omega_1,\omega_2)[/math] at random from the unit square. Set [math]X_1 = \omega_1^2[/math], [math]X_2 = \omega_2^2[/math], and [math]X_3 = \omega_1 + \omega_2[/math]. Find the joint distributions [math]F_{12}(r_1,r_2)[/math] and [math]F_{23}(r_2,r_3)[/math]. We have already seen (see Example) that

[[math]] \begin{eqnarray*} F_1(r_1) & = & P(-\infty \lt X_1 \leq r_1) \\ & = & \sqrt{r_1}, \qquad \mbox{if} \,\,0 \leq r_1 \leq 1\ , \end{eqnarray*} [[/math]]

and similarly,

[[math]] F_2(r_2) = \sqrt{r_2}\ , [[/math]]

if [math]0 \leq r_2 \leq 1[/math]. Now we have (see Figure)

[math]X_1[/math] and [math]X_2[/math] are independent.

[[math]] \begin{eqnarray*} F_{12}(r_1,r_2) & = & P(X_1 \leq r_1 \,\, \mbox{and}\,\, X_2 \leq r_2) \\ & = & P(\omega_1 \leq \sqrt{r_1} \,\,\mbox{and}\,\, \omega_2 \leq \sqrt{r_2}) \\ & = & \mbox{Area}\,(E_1)\\ & = & \sqrt{r_1} \sqrt{r_2} \\ & = &F_1(r_1)F_2(r_2)\ . \end{eqnarray*} [[/math]]

In this case [math]F_{12}(r_1,r_2) = F_1(r_1)F_2(r_2)[/math] so that [math]X_1[/math] and [math]X_2[/math] are independent. On the other hand, if [math]r_1 = 1/4[/math] and [math]r_3 = 1[/math], then (see [[#fig 5.16|Figure]])

[math]X_1[/math] and [math]X_3[/math] are not independent.

[[math]] \begin{eqnarray*} F_{13}(1/4,1) & = & P(X_1 \leq 1/4,\ X_3 \leq 1) \\ & = & P(\omega_1 \leq 1/2,\ \omega_1 + \omega_2 \leq 1) \\ & = & \mbox{Area}\,(E_2) \\ & = & \frac 12 - \frac 18 = \frac 38\ . \end{eqnarray*} [[/math]]

Now recalling that

[[math]] F_3(r_3) = \left \{ \matrix{ 0, & \mbox{if} \,\,r_3 \lt 0, \cr (1/2)r_3^2, & \mbox{if} \,\,0 \leq r_3 \leq 1, \cr 1-(1/2)(2-r_3)^2, & \mbox{if} \,\,1 \leq r_3 \leq 2, \cr 1, & \mbox{if} \,\,2 \lt r_3,\cr}\right. [[/math]]

(see Example), we have [math]F_1(1/4)F_3(1) = (1/2)(1/2) = 1/4[/math]. Hence, [math]X_1[/math] and [math]X_3[/math] are not independent random variables. A similar calculation shows that [math]X_2[/math] and [math]X_3[/math] are not independent either.

Although we shall not prove it here, the following theorem is a useful one. The statement also holds for mutually independent discrete random variables. A proof may be found in Rényi.[Notes 1]

Theorem

Let [math]X_1, X_2, \ldots, X_n[/math] be mutually independent continuous random variables and let [math]\phi_1(x), \phi_2(x), \ldots, \phi_n(x)[/math] be continuous functions. Then [math]\phi_1(X_1),\phi_2(X_2), \ldots, \phi_n(X_n)[/math] are mutually independent.

Independent Trials

Using the notion of independence, we can now formulate for continuous sample spaces the notion of independent trials (see Definition).

Definition

A sequence [math]X_1, X_2, \dots, X_n[/math] of random variables [math]X_i[/math] that are mutually independent and have the same density is called an independent trials process.


As in the case of discrete random variables, these independent trials processes arise naturally in situations where an experiment described by a single random variable is repeated [math]n[/math] times.

Beta Density

We consider next an example which involves a sample space with both discrete and continuous coordinates. For this example we shall need a new density function called the beta density. This density has two parameters [math]\alpha[/math], [math]\beta[/math] and is defined by

[[math]] B(\alpha,\beta,x) = \left \{ \matrix{ (1/B(\alpha,\beta))x^{\alpha - 1}(1 - x)^{\beta - 1}, & {\mbox{if}}\,\, 0 \leq x \leq 1, \cr 0, & {\mbox{otherwise}}.\cr}\right. [[/math]]

Here [math]\alpha[/math] and [math]\beta[/math] are any positive numbers, and the beta function [math]B(\alpha,\beta)[/math] is given by the area under the graph of [math]x^{\alpha - 1}(1 - x)^{\beta - 1}[/math] between 0 and 1:

[[math]] B(\alpha,\beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx\ . [[/math]]

Note that when [math]\alpha = \beta = 1[/math] the beta density if the uniform density. When [math]\alpha[/math] and [math]\beta[/math] are greater than 1 the density is bell-shaped, but when they are less than 1 it is U-shaped as suggested by the examples in Figure.

Beta density for [math]\alpha = \beta = .5,1,2.[/math]

We shall need the values of the beta function only for integer values of [math]\alpha[/math] and [math]\beta[/math], and in this case

[[math]] B(\alpha,\beta) = \frac{(\alpha - 1)!\,(\beta - 1)!}{(\alpha + \beta - 1)!}\ . [[/math]]

Example In medical problems it is often assumed that a drug is effective with a probability [math]x[/math] each time it is used and the various trials are independent, so that one is, in effect, tossing a biased coin with probability [math]x[/math] for heads. Before further experimentation, you do not know the value [math]x[/math] but past experience might give some information about its possible values. It is natural to represent this information by sketching a density function to determine a distribution for [math]x[/math]. Thus, we are considering [math]x[/math] to be a continuous random variable, which takes on values between 0 and 1. If you have no knowledge at all, you would sketch the uniform density. If past experience suggests that [math]x[/math] is very likely to be near 2/3 you would sketch a density with maximum at 2/3 and a spread reflecting your uncertainly in the estimate of 2/3. You would then want to find a density function that reasonably fits your sketch. The beta densities provide a class of densities that can be fit to most sketches you might make. For example, for [math]\alpha \gt 1[/math] and [math]\beta \gt 1[/math] it is bell-shaped with the parameters [math]\alpha[/math] and [math]\beta[/math] determining its peak and its spread.


Assume that the experimenter has chosen a beta density to describe the state of his knowledge about [math]x[/math] before the experiment. Then he gives the drug to [math]n[/math] subjects and records the number [math]i[/math] of successes. The number [math]i[/math] is a discrete random variable, so we may conveniently describe the set of possible outcomes of this experiment by referring to the ordered pair [math](x, i)[/math].


We let [math]m(i|x)[/math] denote the probability that we observe [math]i[/math] successes given the value of [math]x[/math]. By our assumptions, [math]m(i|x)[/math] is the binomial distribution with probability [math]x[/math] for success:

[[math]] m(i|x) = b(n,x,i) = {n \choose i} x^i(1 - x)^j\ , [[/math]]

where [math]j = n - i[/math].


If [math]x[/math] is chosen at random from [math][0,1][/math] with a beta density [math]B(\alpha,\beta,x)[/math], then the density function for the outcome of the pair [math](x,i)[/math] is

[[math]] \begin{eqnarray*} f(x,i) & = & m(i|x)B(\alpha,\beta,x) \\ & = & {n \choose i} x^i(1 - x)^j \frac 1{B(\alpha,\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1} \\ & = & {n \choose i} \frac 1{B(\alpha,\beta)} x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}\ . \end{eqnarray*} [[/math]]

Now let [math]m(i)[/math] be the probability that we observe [math]i[/math] successes not knowing the value of [math]x[/math]. Then

[[math]] \begin{eqnarray*} m(i) & = & \int_0^1 m(i|x) B(\alpha,\beta,x)\,dx \\ & = & {n \choose i} \frac 1{B(\alpha,\beta)} \int_0^1 x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}\,dx \\ & = & {n \choose i} \frac {B(\alpha + i,\beta + j)}{B(\alpha,\beta)}\ . \end{eqnarray*} [[/math]]

Hence, the probability density [math]f(x|i)[/math] for [math]x[/math], given that [math]i[/math] successes were observed, is

[[math]] f(x|i) = \frac {f(x,i)}{m(i)} [[/math]]

[[math]] \begin{equation} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = \frac {x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}}{B(\alpha + i,\beta + j)}\ ,\label{eq 4.5} \end{equation} [[/math]]

that is, [math]f(x|i)[/math] is another beta density. This says that if we observe [math]i[/math] successes and [math]j[/math] failures in [math]n[/math] subjects, then the new density for the probability that the drug is effective is again a beta density but with parameters [math]\alpha + i[/math], [math]\beta + j[/math].


Now we assume that before the experiment we choose a beta density with parameters [math]\alpha[/math] and [math]\beta[/math], and that in the experiment we obtain [math]i[/math] successes in [math]n[/math] trials. We have just seen that in this case, the new density for [math]x[/math] is a beta density with parameters [math]\alpha + i[/math] and [math]\beta + j[/math].


Now we wish to calculate the probability that the drug is effective on the next subject. For any particular real number [math]t[/math] between 0 and 1, the probability that [math]x[/math] has the value [math]t[/math] is given by the expression in Equation \ref{eq 4.5}. Given that [math]x[/math] has the value [math]t[/math], the probability that the drug is effective on the next subject is just [math]t[/math]. Thus, to obtain the probability that the drug is effective on the next subject, we integrate the product of the expression in Equation \ref{eq 4.5} and [math]t[/math] over all possible values of [math]t[/math]. We obtain:

[[math]] \begin{eqnarray*} \frac 1{B(\alpha + i,\beta + j)} \int_0^1 t\cdot t^{\alpha + i - 1}(1 - t)^{\beta + j - 1}\,dt & = & \frac {B(\alpha + i + 1,\beta + j)}{B(\alpha + i,\beta + j)} \\ & = & \frac {(\alpha + i)!\,(\beta + j - 1)!}{(\alpha + \beta + i + j)!} \cdot \frac {(\alpha + \beta + i + j - 1)!}{(\alpha + i - 1)!\,(\beta + j - 1)!} \\ & = & \frac {\alpha + i}{\alpha + \beta + n}\ . \end{eqnarray*} [[/math]]

If [math]n[/math] is large, then our estimate for the probability of success after the experiment is approximately the proportion of successes observed in the experiment, which is certainly a reasonable conclusion.

The next example is another in which the true probabilities are unknown and must be estimated based upon experimental data.

Example You are in a casino and confronted by two slot machines. Each machine pays off either 1 dollar or nothing. The probability that the first machine pays off a dollar is [math]x[/math] and that the second machine pays off a dollar is [math]y[/math]. We assume that [math]x[/math] and [math]y[/math] are random numbers chosen independently from the interval [math][0,1][/math] and unknown to you. You are permitted to make a series of ten plays, each time choosing one machine or the other. How should you choose to maximize the number of times that you win? One strategy that sounds reasonable is to calculate, at every stage, the probability that each machine will pay off and choose the machine with the higher probability. Let win([math]i[/math]), for [math]i = 1[/math] or 2, be the number of times that you have won on the [math]i[/math]th machine. Similarly, let lose([math]i[/math]) be the number of times you have lost on the [math]i[/math]th machine. Then, from Example, the probability [math]p(i)[/math] that you win if you choose the [math]i[/math]th machine is

[[math]] p(i) = \frac {{\mbox{win}}(i) + 1} {{\mbox{win}}(i) + {\mbox{lose}}(i) + 2}\ . [[/math]]

Thus, if [math]p(1) \gt p(2)[/math] you would play machine 1 and otherwise you would play machine 2. We have written a program TwoArm to simulate this experiment. In the program, the user specifies the initial values for [math]x[/math] and [math]y[/math] (but these are unknown to the experimenter). The program calculates at each stage the two conditional densities for [math]x[/math] and [math]y[/math], given the outcomes of the previous trials, and then computes [math]p(i)[/math], for [math]i = 1[/math], 2. It then chooses the machine with the highest value for the probability of winning for the next play. The program prints the machine chosen on each play and the outcome of this play. It also plots the new densities for [math]x[/math] (solid line) and [math]y[/math] (dotted line), showing only the current densities. We have run the program for ten plays for the case [math]x = .6[/math] and [math]y = .7[/math]. The result is shown in Figure.

Play the best machine.

The run of the program shows the weakness of this strategy. Our initial probability for winning on the better of the two machines is .7. We start with the poorer machine and our outcomes are such that we always have a probability greater than .6 of winning and so we just keep playing this machine even though the other machine is better. If we had lost on the first play we would have switched machines. Our final density for [math]y[/math] is the same as our initial density, namely, the uniform density. Our final density for [math]x[/math] is different and reflects a much more accurate knowledge about [math]x[/math]. The computer did pretty well with this strategy, winning seven out of the ten trials, but ten trials are not enough to judge whether this is a good strategy in the long run.

Play the winner.

Another popular strategy is the play-the-winner strategy. As the name suggests, for this strategy we choose the same machine when we win and switch machines when we lose. The program TwoArm will simulate this strategy as well. In Figure, we show the results of running this program with the play-the-winner strategy and the same true probabilities of .6 and .7 for the two machines. After ten plays our densities for the unknown probabilities of winning suggest to us that the second machine is indeed the better of the two. We again won seven out of the ten trials. Neither of the strategies that we simulated is the best one in terms of maximizing our average winnings. This best strategy is very complicated but is reasonably approximated by the play-the-winner strategy. Variations on this example have played an important role in the problem of clinical tests of drugs where experimenters face a similar situation.

General references

Doyle, Peter G. (2006). "Grinstead and Snell's Introduction to Probability". Retrieved June 6, 2024.

Notes

  1. A. Rényi, Probability Theory (Budapest: Akadémiai Kiadò, 1970), p. 183.