In [[probability theory|probability theory]], '''conditional probability''' is a measure of the probability of an [[Event (probability theory)|event]] given that (by assumption, presumption, assertion or evidence) another event has occurred.<ref name="Allan Gut 2013">{{cite book |last=Gut |first=Allan |title=Probability: A Graduate Course |year=2013 |publisher=Springer |location=New York, NY |isbn=978-1-4614-4707-8 |edition=Second }}</ref> If the event of interest is <math>A</math> and the event <math>B</math> is known or assumed to have occurred, "the conditional probability of <math>A</math> given <math>B</math>", or "the probability of <math>A</math> under the condition <math>B</math>", is usually written as <math>\operatorname{P}(A|B)</math>. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a [[Common cold|cold]], then they are much more likely to be coughing: the conditional probability of coughing given a cold might be a much higher 75%.
== Definition ==
Given two [[event (probability theory)|events]] <math>A</math> and <math>B</math> from the sigma-field of a probability space with <math>\operatorname{P}(B)>0</math>, the conditional probability of <math>A</math> given <math>B</math> is defined as the quotient of the probability of the intersection of events <math>A</math> and <math>B</math> (their joint occurrence), and the probability of <math>B</math>:<math display="block">\operatorname{P}(A|B) = \frac{\operatorname{P}(A \cap B)}{\operatorname{P}(B)}</math>
This may be visualized as restricting the sample space to <math>B</math>. The logic behind this equation is that if the outcomes are restricted to <math>B</math>, this set serves as the new sample space. | |||
Note that this is a definition, not a theoretical result: we simply denote the quantity <math>\operatorname{P}(A\cap B)/\operatorname{P}(B)</math> by <math>\operatorname{P}(A|B)</math> and call it the conditional probability of <math>A</math> given <math>B</math>.
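For a finite sample space with equally likely outcomes, the definition reduces to counting: <math>\operatorname{P}(A|B) = |A \cap B| / |B|</math>. The following is a minimal Python sketch of this counting view (the function name is illustrative, not from any particular library):

<syntaxhighlight lang="python">
from fractions import Fraction

def conditional_probability(A: set, B: set) -> Fraction:
    """P(A | B) on a finite sample space of equally likely outcomes.

    P(A | B) = P(A ∩ B) / P(B) = |A ∩ B| / |B|, defined only for P(B) > 0.
    """
    if not B:
        raise ValueError("P(B) = 0: conditional probability is undefined")
    return Fraction(len(A & B), len(B))
</syntaxhighlight>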
== Example == | |||
Suppose that somebody secretly rolls two fair six-sided dice, and we must predict the outcome. Let <math>A</math> be the value rolled on die 1 and let <math>B</math> be the value rolled on die 2.
=== What is the probability that <math>A=2</math>? ===
Table 1 shows the [[sample space|sample space]] of 36 outcomes. Clearly, <math>A =2 </math> in exactly 6 of the 36 outcomes, thus <math>\operatorname{P}(A=2)=1/6</math>. | |||
:{| class="table table-bordered" style="text-align:center; width:80%" | |||
|+ Table 1 | |||
! rowspan=2 colspan=2 | + | |||
! colspan=6 | B | |||
|- | |||
! scope="col" | 1 | |||
! scope="col" | 2 | |||
! scope="col" | 3 | |||
! scope="col" | 4 | |||
! scope="col" | 5 | |||
! scope="col" | 6 | |||
|- | |||
! rowspan=6 scope="row" | A | |||
! scope="row" | 1 | |||
| 2 || 3 || 4 || 5 || 6 || 7 | |||
|- style="background: #5bc0de;" | |||
! scope="row" | 2 | |||
| 3 || 4 || 5 || 6 || 7 || 8 | |||
|- | |||
! scope="row" | 3 | |||
| 4 || 5 || 6 || 7 || 8 || 9 | |||
|- | |||
! scope="row" | 4 | |||
| 5 || 6 || 7 || 8 || 9 || 10 | |||
|- | |||
! scope="row" | 5 | |||
| 6 || 7 || 8 || 9 || 10 || 11 | |||
|- | |||
! scope="row" | 6 | |||
| 7 || 8 || 9 || 10 || 11 || 12 | |||
|} | |||
=== What is the probability that <math>A+B \leq 5</math>? ===
Table 2 shows that <math>A+B \leq 5</math> for exactly 10 of the same 36 outcomes, thus <math>\operatorname{P}(A +B \leq 5) = 10/36</math>. | |||
{| class="table table-bordered" style="text-align:center; width:80%"
|+ Table 2
! rowspan=2 colspan=2 | + | |||
! colspan=6 | B | |||
|- | |||
! scope="col" | 1 | |||
! scope="col" | 2 | |||
! scope="col" | 3 | |||
! scope="col" | 4 | |||
! scope="col" | 5 | |||
! scope="col" | 6 | |||
|- | |||
! rowspan=6 scope="row" | A | |||
! scope="row" | 1
| style="background:#5bc0de;" | 2 || style="background:#5bc0de;" | 3 || style="background:#5bc0de;" | 4 || style="background:#5bc0de;" | 5 || 6 || 7 | |||
|- | |||
! scope="row" | 2 | |||
| style="background:#5bc0de;" | 3 || style="background:#5bc0de;" | 4 || style="background:#5bc0de;" | 5 || 6 || 7 || 8 | |||
|- | |||
! scope="row" | 3 | |||
| style="background:#5bc0de;" | 4 || style="background:#5bc0de;" | 5 || 6 || 7 || 8 || 9 | |||
|- | |||
! scope="row" | 4 | |||
| style="background:#5bc0de;" | 5 || 6 || 7 || 8 || 9 || 10 | |||
|- | |||
! scope="row" | 5 | |||
| 6 || 7 || 8 || 9 || 10 || 11
|-
! scope="row" | 6
| 7 || 8 || 9 || 10 || 11 || 12
|} | |||
=== What is the probability that <math>A = 2</math> given that <math>A + B \leq 5</math>? ===
Table 3 shows that for 3 of these 10 outcomes, <math>A = 2</math>, thus the conditional probability <math>\operatorname{P}(A = 2 | A + B \leq 5) = 3/10</math>.
{| class="table table-bordered" style="text-align:center; width:80%" | |||
|+ Table 3 | |||
! rowspan=2 colspan=2 | + | |||
! colspan=6 | B | |||
|- | |||
! scope="col" | 1 | |||
! scope="col" | 2 | |||
! scope="col" | 3 | |||
! scope="col" | 4 | |||
! scope="col" | 5 | |||
! scope="col" | 6 | |||
|- | |||
! rowspan=6 scope="row" | A | |||
! scope="row" | 1
| style="background:lightgrey;" | 2 || style="background:lightgrey;" | 3 || style="background:lightgrey;" | 4 || style="background:lightgrey;" | 5 || 6 || 7 | |||
|- | |||
! scope="row" | 2 | |||
| style="background:#5bc0de;" | 3 || style="background:#5bc0de;" | 4 || style="background:#5bc0de;" | 5 || 6 || 7 || 8 | |||
|- | |||
! scope="row" | 3 | |||
| style="background:lightgrey;" | 4 || style="background:lightgrey;" | 5 || 6 || 7 || 8 || 9 | |||
|- | |||
! scope="row" | 4 | |||
| style="background:lightgrey;" | 5 || 6 || 7 || 8 || 9 || 10 | |||
|- | |||
! scope="row" | 5 | |||
| 6 || 7 || 8 || 9 || 10 || 11 | |||
|- | |||
! scope="row" | 6 | |||
| 7 || 8 || 9 || 10 || 11 || 12 | |||
|} | |||
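All three answers can be checked by brute-force enumeration. The following Python sketch lists the 36 equally likely outcomes and reproduces the probabilities above:

<syntaxhighlight lang="python">
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (a, b) of two fair six-sided dice.
omega = set(product(range(1, 7), repeat=2))

A = {o for o in omega if o[0] == 2}      # die 1 shows 2
C = {o for o in omega if sum(o) <= 5}    # the sum is at most 5

print(Fraction(len(A), len(omega)))      # P(A = 2)             -> 1/6
print(Fraction(len(C), len(omega)))      # P(A + B <= 5)        -> 5/18 (= 10/36)
print(Fraction(len(A & C), len(C)))      # P(A = 2 | A+B <= 5)  -> 3/10
</syntaxhighlight>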
== Use in inference == | |||
In statistical inference, the conditional probability is an update of the probability of an [[Event (probability theory)|event]] based on new information.<ref name="Casella and Berger 2002">{{cite book|last1=Casella|first1=George|title=Statistical Inference|last2=Berger|first2=Roger L.|publisher=Duxbury Press|year=2002|isbn=0-534-24312-6}}</ref> Incorporating the new information can be done as follows:<ref name="Allan Gut 2013" />
* Let <math>A</math>, the event of interest, be in the [[sample space|sample space]].
* The occurrence of the event <math>A</math>, knowing that event <math>B</math> has or will have occurred, means the occurrence of <math>A</math> as it is restricted to <math>B</math>, i.e. <math>A \cap B</math>.
* Without knowledge of the occurrence of <math>B</math>, the information about the occurrence of <math>A</math> would simply be <math>\operatorname{P}(A)</math>.
* The probability of <math>A</math>, knowing that event <math>B</math> has or will have occurred, is the probability of <math>A \cap B</math> relative to <math>\operatorname{P}(B)</math>, the probability that <math>B</math> has occurred.
* This results in <math>\operatorname{P}(A|B) = \operatorname{P}(A \cap B)/\operatorname{P}(B)</math> whenever <math>\operatorname{P}(B)>0</math> and 0 otherwise (see the numerical sketch after the info box below).
{{alert-info|The terminology "evidence" or "information" is generally used in the [[Bayesian probability|Bayesian interpretation of probability]]. The conditioning event is interpreted as evidence for the conditioned event. That is, <math>\operatorname{P}(A)</math> is the probability of <math>A</math> before accounting for evidence <math>E</math>, and <math>\operatorname{P}(A|E)</math> is the probability of <math>A</math> after having accounted for evidence <math>E</math> or after having updated <math>\operatorname{P}(A)</math>.}}
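As a numerical sketch of this update, the following assumes a joint distribution for the cough/cold example from the lead; the four joint probabilities are assumptions chosen only so that <math>\operatorname{P}(\text{cough}) = 5\%</math> and <math>\operatorname{P}(\text{cough}|\text{cold}) = 75\%</math> match the figures quoted there:

<syntaxhighlight lang="python">
# Assumed joint probabilities for (cough, cold); only the resulting totals
# P(cough) = 0.05 and P(cough | cold) = 0.75 come from the article.
p = {("cough", "cold"): 0.030, ("cough", "no cold"): 0.020,
     ("no cough", "cold"): 0.010, ("no cough", "no cold"): 0.940}

p_cough = p[("cough", "cold")] + p[("cough", "no cold")]   # P(A)
p_cold  = p[("cough", "cold")] + p[("no cough", "cold")]   # P(B)
p_cough_given_cold = p[("cough", "cold")] / p_cold         # P(A | B)

# Learning B (the person has a cold) updates P(A) from 5% to 75%.
print(round(p_cough, 2), round(p_cough_given_cold, 2))     # 0.05 0.75
</syntaxhighlight>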
== Common fallacies == | |||
=== Assuming conditional probability is of similar size to its inverse === | |||
In general, it cannot be assumed that <math>\operatorname{P}(A|B) \approx \operatorname{P}(B|A)</math>. This can be an insidious error, even for those who are highly conversant with statistics.<ref>Paulos, J.A. (1988) ''Innumeracy: Mathematical Illiteracy and its Consequences'', Hill and Wang. ISBN 0-8090-7447-8 (p. 63 ''et seq.'')</ref> The relationship between <math>\operatorname{P}(A|B)</math> and <math>\operatorname{P}(B|A)</math> is given by [[Bayes' theorem|Bayes' theorem]]:<math display="block">
\operatorname{P}(B|A) = \frac{\operatorname{P}(A|B) \operatorname{P}(B)}{\operatorname{P}(A)} \Leftrightarrow \frac{\operatorname{P}(B|A)}{\operatorname{P}(A|B)} = \frac{\operatorname{P}(B)}{\operatorname{P}(A)}.
</math>
That is, <math>\operatorname{P}(A|B) \approx \operatorname{P}(B|A)</math> only if <math>\operatorname{P}(B)/\operatorname{P}(A) \approx 1</math>, or equivalently, <math>\operatorname{P}(A) \approx \operatorname{P}(B)</math>. Alternatively, noting that <math>A \cap B = B \cap A</math> and applying the definition of conditional probability:<math display="block">\operatorname{P}(A|B)\operatorname{P}(B) = \operatorname{P}(A \cap B) = \operatorname{P}(B \cap A) = \operatorname{P}(B|A)\operatorname{P}(A)</math>
Rearranging gives the result. | |||
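To see how far apart the two conditionals can be, consider the following sketch with assumed numbers: <math>B</math> is a rare disease and <math>A</math> a positive test result, so that <math>\operatorname{P}(B)/\operatorname{P}(A)</math> is far from 1:

<syntaxhighlight lang="python">
# All three inputs are assumptions chosen for illustration.
p_A_given_B = 0.99    # P(positive | disease): the test is very sensitive
p_B = 0.001           # P(disease): the disease is rare
p_A = 0.021           # P(positive): assumed overall rate of positive tests

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 3))   # ~0.047, nowhere near P(A | B) = 0.99,
                               # because P(B)/P(A) ~ 0.048 is far from 1
</syntaxhighlight>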
=== Assuming marginal and conditional probabilities are of similar size === | |||
In general, it cannot be assumed that <math>\operatorname{P}(A) \approx \operatorname{P}(A|B)</math>. These probabilities are linked through the [[law of total probability|law of total probability]]:<math display="block">\operatorname{P}(A) = \sum_n \operatorname{P}(A \cap B_n) = \sum_n \operatorname{P}(A|B_n)\operatorname{P}(B_n)</math>
where the events <math>(B_n)</math> form a countable [[Partition of a set|partition]] of the sample space.
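The law can be checked directly on the dice example above; in this sketch the partition is, by choice, given by the value of die 2:

<syntaxhighlight lang="python">
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
A = {o for o in omega if o[0] == 2}             # die 1 shows 2
partition = [{o for o in omega if o[1] == n}    # B_n: die 2 shows n
             for n in range(1, 7)]

# P(A) = sum over n of P(A | B_n) P(B_n)
total = sum(Fraction(len(A & Bn), len(Bn)) * Fraction(len(Bn), len(omega))
            for Bn in partition)
print(total == Fraction(len(A), len(omega)))    # True
</syntaxhighlight>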
This fallacy may arise through [[selection bias|selection bias]].<ref>Thomas Bruss, F; Der Wyatt Earp Effekt; Spektrum der Wissenschaft; March 2007</ref> For example, in the context of a medical claim, let <math>S_C</math> be the event that a [[sequelae|sequela]] (chronic disease) <math>S</math> occurs as a consequence of circumstance (acute condition) <math>C</math>. Let <math>H</math> be the event that an individual seeks medical help. Suppose that in most cases <math>C</math> does not cause <math>S</math>, so <math>\operatorname{P}(S_{C})</math> is low. Suppose also that medical attention is only sought if <math>S</math> has occurred due to <math>C</math>. From experience with patients, a doctor may therefore erroneously conclude that <math>\operatorname{P}(S_C)</math> is high. The actual probability observed by the doctor is <math>\operatorname{P}(S_C|H)</math>.
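A small simulation makes the bias visible; the incidence below is an assumed value for illustration:

<syntaxhighlight lang="python">
import random

random.seed(0)
N = 100_000
p_sequela = 0.02   # assumed: C rarely causes S, so P(S_C) is low

# For each individual with condition C, does the sequela S occur?
has_S = [random.random() < p_sequela for _ in range(N)]
# As in the text, medical help is sought only when S has occurred.
seen_by_doctor = [s for s in has_S if s]

print(sum(has_S) / N)                              # P(S_C)     ~ 0.02
print(sum(seen_by_doctor) / len(seen_by_doctor))   # P(S_C | H) = 1.0
</syntaxhighlight>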
== Notes == | |||
{{Reflist}} | |||
== References ==
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Conditional_probability&oldid=1068038253 |title= Conditional probability |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} |