Given at least two random variables <math>X</math>, <math>Y</math>, ..., the '''joint probability distribution''' for <math>X</math>, <math>Y</math>, ... is a [[guide:82d603b116#Continuous probability distribution|probability distribution]] that gives the probability that each of <math>X</math>, <math>Y</math>, ... falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a '''bivariate distribution''', but the concept generalizes to any number of random variables, giving a '''multivariate distribution'''.

The joint probability distribution can be expressed in terms of a joint cumulative distribution function, or in terms of a joint [[guide:82d603b116#Continuous probability distribution|probability density function]] (in the case of [[guide:82d603b116#Continuous probability distribution|continuous variable]]s) or a joint [[guide:82d603b116#Probability_Mass_Function|probability mass function]] (in the case of [[guide:B5ab48c211|discrete]] variables).
==Examples==
===Coin Flips===
Consider the flip of two [[fair coin|fair coin]]s; let <math>A</math> and <math>B</math> be discrete random variables associated with the outcomes of the first and second coin flips, respectively. If a coin displays "heads", the associated random variable takes the value 1, and 0 otherwise. The joint probability mass function of <math>A</math> and <math>B</math> defines probabilities for each pair of outcomes. All possible outcomes are<math display="block">
(A=0,B=0),
(A=0,B=1),
(A=1,B=0),
(A=1,B=1).
</math>
Since each outcome is equally likely, the joint probability mass function becomes
<math display="block">\operatorname{P}(A,B)=1/4</math>
when <math>A,B\in\{0,1\}</math>. Since the coin flips are independent, the joint probability mass function is the product
of the marginals:
<math display="block">\operatorname{P}(A,B)=\operatorname{P}(A)\operatorname{P}(B).</math>
In general, each coin flip is a [[Bernoulli trial|Bernoulli trial]] and the sequence of flips follows a [[guide:B5ab48c211#Bernoulli_distribution|Bernoulli distribution]].
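
The joint and marginal mass functions above can be checked by direct enumeration. The following Python sketch (the names are illustrative, not part of any particular library) tabulates the joint distribution of two fair coin flips, derives the marginals, and verifies the product relation <math>\operatorname{P}(A,B)=\operatorname{P}(A)\operatorname{P}(B)</math>.
<syntaxhighlight lang="python">
from itertools import product

# Each of the four equally likely outcomes (a, b) has probability 1/4.
joint = {(a, b): 1 / 4 for a, b in product([0, 1], repeat=2)}

# Marginals: sum the joint PMF over the other variable.
p_a = {a: sum(joint[(a, b)] for b in [0, 1]) for a in [0, 1]}
p_b = {b: sum(joint[(a, b)] for a in [0, 1]) for b in [0, 1]}

for (a, b), p in joint.items():
    # Independence: each joint probability equals the product of the marginals.
    assert abs(p - p_a[a] * p_b[b]) < 1e-12

print(joint)      # {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(p_a, p_b)   # {0: 0.5, 1: 0.5} {0: 0.5, 1: 0.5}
</syntaxhighlight>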
===Dice Rolls===
Consider the roll of a fair [[dice|die]] and let <math>A</math> = 1 if the number is even (i.e. 2, 4, or 6) and <math>A</math> = 0 otherwise. Furthermore, let <math>B</math> = 1 if the number is prime (i.e. 2, 3, or 5) and <math>B</math> = 0 otherwise.
{| class="table" | |||
|- | |||
! !! 1 !! 2 !! 3 !! 4 !! 5 !! 6 | |||
|- | |||
| A || 0 || 1 || 0 || 1 || 0 || 1 | |||
|- | |||
| B || 0 || 1 || 1 || 0 || 1 || 0 | |||
|} | |||
Then, the joint distribution of <math>A</math> and <math>B</math>, expressed as a probability mass function, is<math display="block">
\operatorname{P}(A=0,B=0)=\operatorname{P}(\{1\})=\frac{1}{6},\quad \operatorname{P}(A=1,B=0)=\operatorname{P}(\{4,6\})=\frac{2}{6},
</math>
<math display="block">
\operatorname{P}(A=0,B=1)=\operatorname{P}(\{3,5\})=\frac{2}{6},\quad \operatorname{P}(A=1,B=1)=\operatorname{P}(\{2\})=\frac{1}{6}.
</math>
These probabilities necessarily sum to 1, since the probability of ''some'' combination of <math>A</math> and <math>B</math> occurring is 1.
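
These values can be reproduced by enumerating the six equally likely faces of the die. The sketch below is a minimal illustration (the variable names are made up for this example), and it also confirms that the four probabilities sum to 1.
<syntaxhighlight lang="python">
from fractions import Fraction

faces = range(1, 7)
joint = {}
for face in faces:
    a = 1 if face % 2 == 0 else 0        # A = 1 when the roll is even
    b = 1 if face in (2, 3, 5) else 0    # B = 1 when the roll is prime
    joint[(a, b)] = joint.get((a, b), Fraction(0)) + Fraction(1, 6)

print(joint)
# {(0, 0): Fraction(1, 6), (1, 1): Fraction(1, 6),
#  (0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3)}
assert sum(joint.values()) == 1
</syntaxhighlight>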
==Cumulative Distribution Function==
When dealing simultaneously with more than one random variable, the ''joint'' cumulative distribution function can also be defined. For example, for a pair of random variables <math>X,Y</math>, the joint CDF <math>F</math> is given by<math display="block">F(x,y) = \operatorname{P}(X\leq x,Y\leq y),</math>
where the right-hand side represents the [[probability|probability]] that the random variable <math>X</math> takes on a value less than or
equal to <math>x</math> and that <math>Y</math> takes on a value less than or
equal to <math>y</math>.
Every multivariate CDF satisfies the following:
# It is monotonically non-decreasing in each of its variables.
# It is right-continuous in each of its variables.
# <math>0\leq F(x_{1},\ldots,x_{n})\leq 1</math>.
# <math>\lim_{x_{1},\ldots,x_{n}\rightarrow+\infty}F(x_{1},\ldots,x_{n})=1</math> and <math>\lim_{x_{i}\rightarrow-\infty}F(x_{1},\ldots,x_{n})=0\quad \mbox{for all }i</math>.
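
As a concrete illustration of the definition, the joint CDF of the dice example can be evaluated by summing the joint probability mass function over all pairs <math>(a,b)</math> with <math>a\leq x</math> and <math>b\leq y</math>; the sketch below (illustrative only) does exactly that.
<syntaxhighlight lang="python">
from fractions import Fraction

# Joint PMF of (A, B) from the dice example above.
pmf = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
       (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}

def joint_cdf(x, y):
    """F(x, y) = P(A <= x, B <= y), obtained by summing the joint PMF."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

print(joint_cdf(0, 0))   # 1/6
print(joint_cdf(1, 0))   # 1/6 + 1/3 = 1/2
print(joint_cdf(0, 1))   # 1/6 + 1/3 = 1/2
print(joint_cdf(1, 1))   # 1, as required by property 4
</syntaxhighlight>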
==Density function or mass function==
===Discrete case===
The joint [[guide:82d603b116#Probability_Mass_Function|probability mass function]] of a sequence of random variables <math>X_1,\ldots,X_n</math> is the multivariate function<math display="block">
\operatorname{P}(X_1=x_1,\dots,X_n=x_n).
</math>
Since these are probabilities, we must have<math display="block">\sum_{i} \sum_{j} \dots \sum_{k} \operatorname{P}(X_1=x_{1i},X_2=x_{2j}, \dots, X_n=x_{nk}) = 1.</math>
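
For instance, for three independent fair coin flips <math>X_1,X_2,X_3</math>, the joint probability mass function assigns <math>1/8</math> to each of the eight outcomes, and the triple sum above equals 1. A minimal sketch of this check (illustrative only):
<syntaxhighlight lang="python">
from itertools import product

# Joint PMF of three independent fair coin flips: probability 1/8 per outcome.
joint = {outcome: 1 / 8 for outcome in product([0, 1], repeat=3)}

print(len(joint))            # 8 outcomes
print(sum(joint.values()))   # 1.0 -- the probabilities sum to 1
</syntaxhighlight>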
===Continuous case===
If <math>X_1,\ldots,X_n</math> are continuous random variables with
<math display="block">
F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int_{-\infty}^{x_1}\cdots \int_{-\infty}^{x_n} f_{X_1,\ldots,X_n}(z_1,\ldots,z_n) \, dz_1 \cdots dz_n,
</math>
then <math>f_{X_1,\ldots,X_n}</math> is said to be a joint density function for the sequence of random variables.
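
For example, if <math>X_1</math> and <math>X_2</math> are independent and each uniform on <math>[0,1]</math>, the joint density is <math>f(z_1,z_2)=1</math> on the unit square and the joint CDF is <math>F(x_1,x_2)=x_1x_2</math> there. The sketch below (illustrative only, assuming NumPy is available) approximates the double integral on a grid.
<syntaxhighlight lang="python">
import numpy as np

def joint_density(z1, z2):
    """Joint density of two independent Uniform(0, 1) random variables."""
    return np.where((0 <= z1) & (z1 <= 1) & (0 <= z2) & (z2 <= 1), 1.0, 0.0)

def joint_cdf(x1, x2, n=1000):
    """Approximate F(x1, x2) by a midpoint Riemann sum of the joint density."""
    dz1, dz2 = x1 / n, x2 / n
    z1 = (np.arange(n) + 0.5) * dz1   # midpoints in [0, x1]
    z2 = (np.arange(n) + 0.5) * dz2   # midpoints in [0, x2]
    Z1, Z2 = np.meshgrid(z1, z2)
    return joint_density(Z1, Z2).sum() * dz1 * dz2

print(joint_cdf(0.5, 0.8))  # close to 0.5 * 0.8 = 0.4
print(joint_cdf(1.0, 1.0))  # close to 1
</syntaxhighlight>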
==Marginal Distribution Functions==
In [[probability theory|probability theory]] and [[statistics|statistics]], the '''marginal distribution''' of a [[subset|subset]] of a collection of [[guide:1b8642f694|random variable]]s is the [[guide:82d603b116#Continuous probability distribution|probability distribution]] of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a [[conditional distribution|conditional distribution]], which gives the probabilities contingent upon the values of the other variables.

The term '''marginal variable''' is used to refer to those variables in the subset of variables being retained. These terms are dubbed "marginal" because they used to be found by summing values in a table along rows or columns, and writing the sum in the margins of the table.<ref>Trumpler and Weaver (1962), pp. 32–33.</ref> The distribution of the marginal variables (the marginal distribution) is obtained by '''marginalizing''' over the distribution of the variables being discarded, and the discarded variables are said to have been '''marginalized out'''.

The context here is that the theoretical studies being undertaken, or the [[data analysis|data analysis]] being done, involves a wider set of random variables, but attention is being limited to a reduced number of those variables. In many applications an analysis may start with a given collection of random variables, then first extend the set by defining new ones (such as the sum of the original random variables) and finally reduce the number by placing interest in the marginal distribution of a subset (such as the sum). Several different analyses may be done, each treating a different subset of variables as the marginal variables.
===Two-variable case===
{| class="table table-bordered" | |||
! {{diagonal split header|Y|X}} ||x<sub>1</sub>||x<sub>2</sub>||x<sub>3</sub>||x<sub>4</sub>||p<sub>y</sub>(Y)↓ | |||
|- | |||
!|y<sub>1</sub> | |||
||<sup>4</sup>⁄<sub>32</sub> || <sup>2</sup>⁄<sub>32</sub> || <sup>1</sup>⁄<sub>32</sub> || <sup>1</sup>⁄<sub>32</sub> | |||
!| <sup>8</sup>⁄<sub>32</sub> | |||
|- | |||
!|y<sub>2</sub> | |||
|| <sup>2</sup>⁄<sub>32</sub> || <sup>4</sup>⁄<sub>32</sub> || <sup>1</sup>⁄<sub>32</sub> || <sup>1</sup>⁄<sub>32</sub> | |||
!| <sup>8</sup>⁄<sub>32</sub> | |||
|- | |||
!|y<sub>3</sub> | |||
|| <sup>2</sup>⁄<sub>32</sub> || <sup>2</sup>⁄<sub>32</sub> || <sup>2</sup>⁄<sub>32</sub> || <sup>2</sup>⁄<sub>32</sub> | |||
!| <sup>8</sup>⁄<sub>32</sub> | |||
|- | |||
!|y<sub>4</sub> | |||
|| <sup>8</sup>⁄<sub>32</sub> || 0 || 0 || 0 | |||
!| <sup>8</sup>⁄<sub>32</sub> | |||
|- | |||
!p<sub>x</sub>(X) → | |||
!| <sup>16</sup>⁄<sub>32</sub> || <sup>8</sup>⁄<sub>32</sub> || <sup>4</sup>⁄<sub>32</sub> || <sup>4</sup>⁄<sub>32</sub> | |||
!| <sup>32</sup><sup></sup>⁄<sub>32</sub> | |||
|- | |||
|colspan=6|Joint and marginal distributions of a pair of discrete, random variables X,Y having nonzero [[mutual information|mutual information]] I(X; Y). The values of the joint distribution are in the 4×4 square, and the values of the marginal distributions are along the right and bottom margins. | |||
|} | |||
Given two [[guide:1b8642f694|random variable]]s <math>X</math> and <math>Y</math> whose [[joint distribution|joint distribution]] is known, the marginal distribution of <math>X</math> is simply the [[guide:82d603b116#Continuous probability distribution|probability distribution]] of <math>X</math>, averaging over information about <math>Y</math>. It is the probability distribution of <math>X</math> when the value of <math>Y</math> is not known. This is typically calculated by summing or integrating the [[joint probability|joint probability]] distribution over <math>Y</math>.

For [[guide:B5ab48c211|discrete random variable]]s, the marginal [[guide:82d603b116#Probability_Mass_Function|probability mass function]] can be written as <math>\operatorname{P}(X=x)</math>. This is
<math display="block">\operatorname{P}(X=x) = \sum_{y} \operatorname{P}(X=x,Y=y),</math>
where <math>\operatorname{P}(X=x,Y=y)</math> is the [[joint distribution|joint distribution]] of <math>X</math> and <math>Y</math>. In this case, the variable <math>Y</math> has been marginalized out.
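
Applying this formula to the joint distribution tabulated above recovers the column totals <math>p_X(x)</math> in the bottom margin. A minimal sketch (illustrative only, with the table entries hard-coded as fractions of 32):
<syntaxhighlight lang="python">
from fractions import Fraction

# Joint PMF P(X = x_i, Y = y_j) from the table above, in 32nds.
counts = {
    ('x1', 'y1'): 4, ('x2', 'y1'): 2, ('x3', 'y1'): 1, ('x4', 'y1'): 1,
    ('x1', 'y2'): 2, ('x2', 'y2'): 4, ('x3', 'y2'): 1, ('x4', 'y2'): 1,
    ('x1', 'y3'): 2, ('x2', 'y3'): 2, ('x3', 'y3'): 2, ('x4', 'y3'): 2,
    ('x1', 'y4'): 8, ('x2', 'y4'): 0, ('x3', 'y4'): 0, ('x4', 'y4'): 0,
}
joint = {key: Fraction(value, 32) for key, value in counts.items()}

# Marginalize Y out: P(X = x) = sum over y of P(X = x, Y = y).
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, Fraction(0)) + p

print(p_x)  # x1: 1/2, x2: 1/4, x3: 1/8, x4: 1/8 -- i.e. 16/32, 8/32, 4/32, 4/32
</syntaxhighlight>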
Similarly for [[guide:269af6cf67|continuous random variable]]s, the marginal [[guide:82d603b116#Continuous probability distribution|probability density function]] can be written as <math>f_X(x)</math>. This is
<math display="block">f_{X}(x) = \int_y f_{X,Y}(x,y) \, \operatorname{d}\!y,</math>
where <math>f_{X,Y}(x,y)</math> gives the joint density function of <math>X</math> and <math>Y</math>. Again, the variable <math>Y</math> has been marginalized out.
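
As a worked continuous example (not taken from the table above): for the joint density <math>f_{X,Y}(x,y)=x+y</math> on the unit square, the marginal density is <math>f_X(x)=\int_0^1 (x+y)\,dy = x+\tfrac{1}{2}</math>. The sketch below (illustrative only, assuming NumPy is available) checks this numerically.
<syntaxhighlight lang="python">
import numpy as np

def f_xy(x, y):
    """Joint density f(x, y) = x + y on the unit square."""
    return x + y

def f_x(x, n=100000):
    """Marginal density of X: integrate the joint density over y on [0, 1]."""
    dy = 1.0 / n
    y = (np.arange(n) + 0.5) * dy   # midpoints of n equal subintervals
    return np.sum(f_xy(x, y)) * dy

for x in (0.0, 0.25, 0.5, 1.0):
    print(x, f_x(x))  # matches x + 0.5 up to numerical error
</syntaxhighlight>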
===More than two variables===
For <math>i=1,\ldots,n</math>, let <math>f_{X_i}(x_i)</math> be the probability density function associated with variable <math>X_i</math> alone. This is called the “marginal” density function, and can be deduced from the probability density function <math>f</math> associated with the random variables <math>X_1,\ldots,X_n</math> by integrating over all values of the <math>n-1</math> other variables:
<math display="block">f_{X_i}(x_i) = \int f(x_1,\ldots,x_n)\, dx_1 \cdots dx_{i-1}\,dx_{i+1}\cdots dx_n .</math>
== Independence ==
A set of random variables is '''pairwise independent''' if and only if every pair of random variables is independent.

A set of random variables is '''mutually independent''' if and only if for any finite subset <math>X_1, \ldots, X_n</math> and any finite sequence of numbers <math>a_1, \ldots, a_n</math>, the events
<math display="block">\{X_1 \le a_1\}, \ldots, \{X_n \le a_n\}</math>
are mutually independent events.
If the joint probability density function of a vector of <math>n</math> random variables can be factored into a product of <math>n</math> functions of one variable,<math display="block">f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n),</math>
(where each <math>f_i</math> is not necessarily a density), then the <math>n</math> variables in the set are all independent of each other, and the marginal probability density function of each of them is given by<math display="block">f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}.</math>
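
In the coin-flip example above the joint mass function factors into the product of its marginals, whereas in the dice example it does not; for instance <math>\operatorname{P}(A=1,B=1)=\tfrac{1}{6}\neq\tfrac{1}{4}=\operatorname{P}(A=1)\operatorname{P}(B=1)</math>, so <math>A</math> and <math>B</math> are not independent there. A minimal factorization check (illustrative only):
<syntaxhighlight lang="python">
from fractions import Fraction

def is_independent(joint):
    """Check whether a bivariate joint PMF factors into its marginals."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, Fraction(0)) + p
        p_b[b] = p_b.get(b, Fraction(0)) + p
    return all(p == p_a[a] * p_b[b] for (a, b), p in joint.items())

coins = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
dice = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
        (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}

print(is_independent(coins))  # True
print(is_independent(dice))   # False
</syntaxhighlight>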
=== i.i.d. Sequences ===
In [[probability theory|probability theory]] and [[statistics|statistics]], a [[sequence|sequence]] or other collection of [[guide:1b8642f694|random variable]]s is '''independent and identically distributed''' ('''i.i.d.''') if each random variable has the same [[guide:82d603b116#Continuous probability distribution|probability distribution]] as the others and all are mutually [[guide:Af39987afc|independent]].<ref>{{cite web | url = http://tuvalu.santafe.edu/~aaronc/courses/7000/csci7000-001_2011_L0.pdf | title = A brief primer on probability distributions | author = Aaron Clauset | publisher = [[Santa Fe Institute|Santa Fe Institute]]}}</ref>

The [[abbreviation|abbreviation]] ''i.i.d.'' is particularly common in [[statistics|statistics]] (often as ''iid'', sometimes written ''IID''), where observations in a [[statistical sample|sample]] are often assumed to be effectively i.i.d. for the purposes of [[statistical inference|statistical inference]]. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods (see [[mathematical statistics|mathematical statistics]] and [[statistical theory|statistical theory]]). However, in practical applications of [[statistical modeling|statistical modeling]] the assumption may or may not be realistic. To test how realistic the assumption is on a given data set, the [[autocorrelation|autocorrelation]] can be computed, [[lag plot|lag plot]]s drawn, or a [[turning point test|turning point test]] performed.<ref>{{cite book | title = Performance Evaluation Of Computer And Communication Systems | first = Jean-Yves | last = Le Boudec | isbn = 978-2-940222-40-7 | year=2010 | publisher =[[EPFL Press|EPFL Press]]| url = http://infoscience.epfl.ch/record/146812/files/perfPublisherVersion.pdf |pages=46–47}}</ref>
The generalization of [[exchangeable random variables|exchangeable random variables]] is often sufficient and more easily met.
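
As a rough numerical illustration of such a diagnostic (assuming NumPy is available, and not a substitute for a formal test), the lag-1 sample autocorrelation of an i.i.d. sequence should be close to zero, while a strongly dependent sequence such as a random walk gives a value near one:
<syntaxhighlight lang="python">
import numpy as np

def lag1_autocorrelation(x):
    """Sample autocorrelation of a sequence at lag 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)

rng = np.random.default_rng(0)
iid = rng.normal(size=10000)                   # i.i.d. draws
dependent = np.cumsum(rng.normal(size=10000))  # a random walk, strongly dependent

print(lag1_autocorrelation(iid))        # near 0
print(lag1_autocorrelation(dependent))  # near 1
</syntaxhighlight>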
The assumption is important in the classical form of the [[guide:4b840b5280|central limit theorem]], which states that the probability distribution of the sum (or average) of i.i.d. variables with finite [[guide:E4d753a3b5|variance]] approaches a [[guide:269af6cf67#Normal Distribution|normal distribution]].
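
A quick simulation of this statement (illustrative only, assuming NumPy): standardized means of i.i.d. Uniform(0, 1) draws behave approximately like a standard normal, with about 95% of them falling within ±1.96.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 20000

# Means of n i.i.d. Uniform(0, 1) draws; the CLT predicts approx N(1/2, 1/(12 n)).
means = rng.uniform(size=(trials, n)).mean(axis=1)
z = (means - 0.5) / np.sqrt(1 / (12 * n))   # standardize

print(z.mean(), z.std())          # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))  # close to 0.95, as for a standard normal
</syntaxhighlight>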
==Notes==
{{reflist}}
==References==
*{{cite web |url= https://en.wikipedia.org/w/index.php?title=Joint_probability_distribution&oldid=1062976600 |title= Joint probability distribution |author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 28 January 2022 }} |