Given at least two random variables [math]X[/math], [math]Y[/math], ..., the joint probability distribution for [math]X[/math], [math]Y[/math], ... is a probability distribution that gives the probability that each of [math]X[/math], [math]Y[/math], ... falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or a joint probability mass function (in the case of discrete variables).

Examples

Coin Flips

Consider the flip of two fair coins; let [math]A[/math] and [math]B[/math] be discrete random variables associated with the outcomes of the first and second coin flips, respectively. If a coin displays "heads", the associated random variable takes the value 1, and 0 otherwise. The joint probability mass function of [math]A[/math] and [math]B[/math] defines probabilities for each pair of outcomes. All possible outcomes are

[[math]] (A=0,B=0), (A=0,B=1), (A=1,B=0), (A=1,B=1) [[/math]]

Since each outcome is equally likely, the joint probability mass function becomes

[[math]]\operatorname{P}(A,B)=1/4[[/math]]

when [math]A,B\in\{0,1\}[/math]. Since the coin flips are independent, the joint probability mass function is the product of the marginals:

[[math]]\operatorname{P}(A,B)=\operatorname{P}(A)\operatorname{P}(B).[[/math]]
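
As a quick check of this factorization, the marginal probability mass functions are obtained by summing the joint probabilities over the other variable:

[[math]]\operatorname{P}(A=0)=\operatorname{P}(A=0,B=0)+\operatorname{P}(A=0,B=1)=\frac{1}{4}+\frac{1}{4}=\frac{1}{2},[[/math]]

and likewise [math]\operatorname{P}(A=1)=\operatorname{P}(B=0)=\operatorname{P}(B=1)=1/2[/math], so [math]\operatorname{P}(A=a)\operatorname{P}(B=b)=1/4[/math] for every pair [math](a,b)[/math], matching the joint probabilities above.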

In general, each coin flip is a Bernoulli trial, and the outcome of each flip follows a Bernoulli distribution.

Dice Rolls

Consider the roll of a fair die and let [math]A[/math] = 1 if the number is even (i.e. 2, 4, or 6) and [math]A[/math] = 0 otherwise. Furthermore, let [math]B[/math] = 1 if the number is prime (i.e. 2, 3, or 5) and [math]B[/math] = 0 otherwise.

Outcome   1   2   3   4   5   6
A         0   1   0   1   0   1
B         0   1   1   0   1   0

Then, the joint distribution of [math]A[/math] and [math]B[/math], expressed as a probability mass function, is

[[math]] \operatorname{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\; \operatorname{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6}, [[/math]]
[[math]] \operatorname{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\; \operatorname{P}(A=1,B=1)=P\{2\}=\frac{1}{6}. [[/math]]

These probabilities necessarily sum to 1, since the probability of some combination of [math]A[/math] and [math]B[/math] occurring is 1.
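
In this example, unlike the coin flips, [math]A[/math] and [math]B[/math] are not independent. Summing the joint probabilities over the value of the other variable gives the marginals

[[math]]\operatorname{P}(A=1)=\frac{2}{6}+\frac{1}{6}=\frac{1}{2},\qquad \operatorname{P}(B=1)=\frac{2}{6}+\frac{1}{6}=\frac{1}{2},[[/math]]

but [math]\operatorname{P}(A=1,B=1)=\frac{1}{6}\neq\frac{1}{2}\cdot\frac{1}{2}[/math], so the joint probability mass function does not factor into the product of its marginals.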

Cumulative Distribution Function

When dealing simultaneously with more than one random variable, the joint cumulative distribution function can also be defined. For example, for a pair of random variables X,Y, the joint CDF [math]F[/math] is given by

[[math]]F(x,y) = \operatorname{P}(X\leq x,Y\leq y),[[/math]]

where the right-hand side represents the probability that the random variable [math]X[/math] takes on a value less than or equal to [math]x[/math] and that [math]Y[/math] takes on a value less than or equal to [math]y[/math].

Every multivariate CDF is:

  1. Monotonically non-decreasing for each of its variables
  2. Right-continuous for each of its variables.
  3. [math]0\leq F(x_{1},...,x_{n})\leq 1[/math]
  4. [math]\lim_{x_{1},...,x_{n}\rightarrow+\infty}F(x_{1},...,x_{n})=1[/math] and [math]\lim_{x_{i}\rightarrow-\infty}F(x_{1},...,x_{n})=0,\quad \mbox{for all }i[/math]
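
As a concrete illustration, the joint CDF of the two coin flips above satisfies

[[math]]F(0,0)=\operatorname{P}(A\leq 0,B\leq 0)=\frac{1}{4},\qquad F(1,0)=F(0,1)=\frac{1}{2},\qquad F(1,1)=1,[[/math]]

which is consistent with the four properties listed above.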

Density function or mass function

Discrete case

The joint probability mass function of a sequence of random variables [math]X_1,\ldots,X_n[/math] is the multivariate function

[[math]] \operatorname{P}(X_1=x_1,\dots,X_n=x_n). [[/math]]

Since these are probabilities, we must have

[[math]]\sum_{i} \sum_{j} \dots \sum_{k} \mathrm{P}(X_1=x_{1i},X_2=x_{2j}, \dots, X_n=x_{nk}) = 1.\;[[/math]]
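
For the dice-roll example above, this condition reads

[[math]]\frac{1}{6}+\frac{2}{6}+\frac{2}{6}+\frac{1}{6}=1.[[/math]]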

Continuous case

If [math]X_1,\ldots,X_n[/math] are continuous random variables with

[[math]] F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int_{-\infty}^{x_1}\cdots \int_{-\infty}^{x_n} f_{X_1,\ldots,X_n}(z_1,\ldots,z_n) \,\, dz_1 \cdots dz_n [[/math]]

then [math]f_{X_1,\ldots,X_n}[/math] is said to be a joint density function for the sequence of random variables.
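
A minimal example is the uniform distribution on the unit square, with joint density [math]f_{X,Y}(x,y)=1[/math] for [math]0\leq x\leq 1[/math], [math]0\leq y\leq 1[/math] and 0 otherwise, for which

[[math]]F_{X,Y}(x,y)=\int_{0}^{x}\int_{0}^{y} 1 \, dz_2\, dz_1 = xy, \qquad 0\leq x,y \leq 1.[[/math]]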

Marginal Distribution Functions

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables.

The term marginal variable is used to refer to those variables in the subset of variables being retained. These terms are dubbed "marginal" because they used to be found by summing values in a table along rows or columns, and writing the sum in the margins of the table.[1] The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.

The context here is that the theoretical studies being undertaken, or the data analysis being done, involves a wider set of random variables but that attention is being limited to a reduced number of those variables. In many applications an analysis may start with a given collection of random variables, then first extend the set by defining new ones (such as the sum of the original random variables) and finally reduce the number by placing interest in the marginal distribution of a subset (such as the sum). Several different analyses may be done, each treating a different subset of variables as the marginal variables.

Two-variable case

Y \ X      x1      x2      x3      x4    pY(Y) ↓
  y1      4/32    2/32    1/32    1/32    8/32
  y2      2/32    4/32    1/32    1/32    8/32
  y3      2/32    2/32    2/32    2/32    8/32
  y4      8/32      0       0       0     8/32
pX(X) →  16/32    8/32    4/32    4/32   32/32
Joint and marginal distributions of a pair of discrete random variables X, Y having nonzero mutual information I(X; Y). The values of the joint distribution are in the 4×4 square, and the values of the marginal distributions are along the right and bottom margins.

Given two random variables [math]X[/math] and [math]Y[/math] whose joint distribution is known, the marginal distribution of [math]X[/math] is simply the probability distribution of [math]X[/math] averaging over information about [math]Y[/math]. It is the probability distribution of [math]X[/math] when the value of [math]Y[/math] is not known. This is typically calculated by summing or integrating the joint probability distribution over [math]Y[/math].

For discrete random variables, the marginal probability mass function can be written as [math]\operatorname{P}(X=x)[/math]. This is

[[math]]\operatorname{P}(X=x) = \sum_{y} \operatorname{P}(X=x,Y=y)[[/math]]

where [math]\operatorname{P}(X=x,Y=y)[/math] is the joint distribution of [math]X[/math] and [math]Y[/math]. In this case, the variable [math]Y[/math] has been marginalized out.
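
In the table above, for example, the first entry of the bottom margin is obtained in exactly this way:

[[math]]p_X(x_1)=\frac{4}{32}+\frac{2}{32}+\frac{2}{32}+\frac{8}{32}=\frac{16}{32}.[[/math]]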

Similarly for continuous random variables, the marginal probability density function can be written as [math]f_X(x)[/math]. This is

[[math]]f_{X}(x) = \int_y f_{X,Y}(x,y) \, \operatorname{d}\!y[[/math]]

where [math]f_{X,Y}(x,y)[/math] gives the joint density function of [math]X[/math] and [math]Y[/math]. Again, the variable [math]Y[/math] has been marginalized out.
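
As a short worked example, take the joint density [math]f_{X,Y}(x,y)=x+y[/math] on the unit square [math]0\leq x,y\leq 1[/math], which integrates to 1. Marginalizing out [math]Y[/math] gives

[[math]]f_X(x)=\int_0^1 (x+y)\, \operatorname{d}\!y = x+\frac{1}{2}, \qquad 0\leq x\leq 1.[[/math]]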

More than two variables

For [math]i=1,\ldots,n[/math], let [math]f_{X_i}(x_i)[/math] be the probability density function associated with variable [math]X_i[/math] alone. This is called the “marginal” density function, and can be deduced from the probability density associated with the random variables [math]X_1,\ldots,X_n[/math] by integrating over all values of the [math]n-1[/math] other variables:

[[math]]f_{X_i}(x_i) = \int f(x_1,\ldots,x_n)\, dx_1 \cdots dx_{i-1}\,dx_{i+1}\cdots dx_n .[[/math]]
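
For instance, with [math]n=3[/math] the marginal density of [math]X_2[/math] is obtained by integrating out the other two variables:

[[math]]f_{X_2}(x_2)=\int\!\!\int f(x_1,x_2,x_3)\, dx_1\, dx_3.[[/math]]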

Independence

A set of random variables is pairwise independent if and only if every pair of random variables is independent.

A set of random variables is mutually independent if and only if for any finite subset [math]X_1, \ldots, X_n[/math] and any finite sequence of numbers [math]a_1, \ldots, a_n[/math], the events

[[math]]\{X_1 \le a_1\}, \ldots, \{X_n \le a_n\}[[/math]]

are mutually independent events.

If the joint probability density function of a vector of [math]n[/math] random variables can be factored into a product of [math]n[/math] functions of one variable

[[math]]f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n),[[/math]]

(where each [math]f_i[/math] is not necessarily a density), then the [math]n[/math] variables in the set are all independent of each other, and the marginal probability density function of each of them is given by

[[math]]f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}.[[/math]]
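
As a worked illustration, take [math]f_{X,Y}(x,y)=6e^{-2x-3y}[/math] for [math]x,y>0[/math]. This factors as [math]f_1(x)f_2(y)[/math] with [math]f_1(x)=6e^{-2x}[/math] and [math]f_2(y)=e^{-3y}[/math], neither of which is itself a density, so [math]X[/math] and [math]Y[/math] are independent with marginal densities

[[math]]f_X(x)=\frac{6e^{-2x}}{\int_0^\infty 6e^{-2u}\, du}=2e^{-2x}, \qquad f_Y(y)=\frac{e^{-3y}}{\int_0^\infty e^{-3u}\, du}=3e^{-3y}.[[/math]]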

i.i.d. Sequences

In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.[2]

The abbreviation i.i.d. is particularly common in statistics (often as iid, sometimes written IID), where observations in a sample are often assumed to be effectively i.i.d. for the purposes of statistical inference. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods (see mathematical statistics and statistical theory). However, in practical applications of statistical modeling the assumption may or may not be realistic. To test how realistic the assumption is on a given data set, the autocorrelation can be computed, lag plots drawn, or a turning point test performed.[3] The generalization of exchangeable random variables is often sufficient and more easily met.

The assumption is important in the classical form of the central limit theorem, which states that the probability distribution of the sum (or average) of i.i.d. variables with finite variance approaches a normal distribution.
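
In its classical form, the statement is that if [math]X_1,X_2,\ldots[/math] are i.i.d. with mean [math]\mu[/math] and finite variance [math]\sigma^2[/math], then

[[math]]\frac{X_1+\cdots+X_n-n\mu}{\sigma\sqrt{n}}[[/math]]

converges in distribution to the standard normal distribution [math]N(0,1)[/math] as [math]n\to\infty[/math].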

Notes

  1. Trumpler and Weaver (1962), pp. 32–33.
  2. Clauset, Aaron. "A brief primer on probability distributions" (PDF). Santa Fe Institute. http://tuvalu.santafe.edu/~aaronc/courses/7000/csci7000-001_2011_L0.pdf
  3. Le Boudec, Jean-Yves (2010). Performance Evaluation of Computer and Communication Systems (PDF). EPFL Press. pp. 46–47. ISBN 978-2-940222-40-7. http://infoscience.epfl.ch/record/146812/files/perfPublisherVersion.pdf
