A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference. Examples are found in experiments whose sample space is non-numerical, where the distribution would be a categorical distribution; experiments whose sample space is encoded by a discrete random variable, where the distribution can be specified by a probability mass function; and experiments with sample spaces encoded by continuous random variables, where the distribution can be specified by a probability density function.

A univariate distribution gives the probabilities of a single random variable taking on various alternative values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution.

Introduction

To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. In the discrete case, one can easily assign a probability to each possible value: for example, when throwing a fair die, each of the six values 1 to 6 has probability 1/6. In contrast, when a random variable takes values from a continuum, probabilities are typically nonzero only when they refer to intervals: in quality control one might demand that the probability of a "500 g" package containing between 490 g and 510 g should be no less than 98%.

If the random variable is real-valued (or more generally, if a total order is defined for its possible values), the cumulative distribution function (CDF) gives the probability that the random variable is no larger than a given value; in the real-valued case, the CDF is the integral of the probability density function (pdf) provided that this function exists.

Cumulative distribution function

Because a probability distribution P on the real line is determined by the probability of a scalar random variable [math]X[/math] being in a half-open interval (-∞, [math]x[/math]], the probability distribution is completely characterized by its cumulative distribution function:

[[math]] F(x) = \operatorname{P} \left[ X \le x \right] \qquad \text{ for all } x \in \mathbb{R}.[[/math]]
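
For intuition, the CDF can be estimated from simulated draws as the fraction of outcomes at or below x. The following minimal Python sketch (an added illustration, not part of the original article; the standard normal is an arbitrary choice) does exactly that:

import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(size=100_000)  # simulated realizations of X (standard normal, as an example)

def ecdf(sample, x):
    # Empirical estimate of F(x) = P[X <= x]: the fraction of draws at or below x
    return np.mean(sample <= x)

print(ecdf(draws, 0.0))   # close to 0.5
print(ecdf(draws, 1.96))  # close to 0.975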

Discrete probability distribution

A discrete probability distribution is a probability distribution characterized by a probability mass function. Thus, the distribution of a random variable [math]X[/math] is discrete, and [math]X[/math] is called a discrete random variable, if

[[math]]\sum_u \operatorname{P}(X=u) = 1[[/math]]

as [math]u[/math] runs through the set of all possible values of [math]X[/math]. Hence, a random variable can assume only a finite or countably infinite number of values—the random variable is a discrete variable. For the number of potential values to be countably infinite, even though their probabilities sum to 1, the probabilities have to decline to zero fast enough. For example, if [math]\operatorname{P}(X=n) = \tfrac{1}{2^n}[/math] for [math]n = 1, 2, \ldots,[/math] we have the sum of probabilities 1/2 + 1/4 + 1/8 + ... = 1.
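
This tail behavior is easy to verify numerically; a one-line Python check (an added illustration):

# Partial sums of P(X = n) = 1/2**n: already within 1e-15 of 1 after 50 terms
print(sum(0.5**n for n in range(1, 51)))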

Well-known discrete probability distributions used in statistical modeling include the Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution. Additionally, the discrete uniform distribution is commonly used in computer programs that make equal-probability random selections between a number of choices.

Probability Mass Function

Suppose that [math]X[/math] is a discrete random variable defined on a sample space [math]S[/math]. Then the probability mass function [math]f_X[/math] is defined as[1][2]

[[math]]f_X(x) = \operatorname{P}(X = x) = \operatorname{P}(\{s \in S: X(s) = x\}).[[/math]]

Thinking of probability as mass helps to avoid mistakes: just as physical mass is conserved, so is the total probability over all possible outcomes of [math]X[/math]:

[[math]]\sum_{x\in X(S)} f_X(x) = 1[[/math]]

When there is a natural order among the hypotheses [math]x[/math], it may be convenient to assign numerical values to them (or n-tuples in case of a discrete multivariate random variable) and to consider also values not in the image of [math]X[/math]. That is, [math]f_X[/math] may be defined for all real numbers, with [math]f_X(x) = 0[/math] for all [math]x[/math] not in [math]X(S)[/math].

Since the image of [math]X[/math] is countable, the probability mass function [math]f_X(x)[/math] is zero for all but a countable number of values of [math]x[/math]. The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points.
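
As a concrete sketch (an added Python illustration; the fair die is an assumed example), a probability mass function can be represented as a mapping from values to probabilities, with every other value receiving mass zero:

pmf = {face: 1/6 for face in range(1, 7)}  # fair six-sided die
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # total mass is conserved

def f_X(x):
    # Zero for every x outside the image X(S), as in the definition above
    return pmf.get(x, 0.0)

print(f_X(3), f_X(3.5))  # 0.1666..., 0.0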

Cumulative Distribution

Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative distribution function (cdf) increases only by jump discontinuities—that is, its cdf increases only where it "jumps" to a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which the random variable may take.

Indicator-function representation

For a discrete random variable [math]X[/math], let [math]u_0, u_1, \ldots [/math] be the values it can take with non-zero probability. Denote

[[math]]\Omega_i=X^{-1}(u_i)= \{\omega: X(\omega)=u_i\},\, i=0, 1, 2, \dots[[/math]]

These are disjoint sets, and by countable additivity together with the normalization of the probability mass function,

[[math]]\operatorname{P}\left(\bigcup_i \Omega_i\right)=\sum_i \operatorname{P}(\Omega_i)=\sum_i\operatorname{P}(X=u_i)=1.[[/math]]

It follows that the probability that [math]X[/math] takes any value except for [math]u_0, u_1, \ldots [/math] is zero, and thus one can write [math]X[/math] as

[[math]]X=\sum_i u_i 1_{\Omega_i}[[/math]]

except on a set of probability zero, where [math]1_A[/math] is the indicator function of [math]A[/math]. This may serve as an alternative definition of discrete random variables.
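
The sketch below (an added Python illustration; the two-valued variable and the 0.3 threshold are arbitrary assumptions) reconstructs such an X from the indicator functions of the preimages Ω_i:

import numpy as np

rng = np.random.default_rng(1)
omega = rng.uniform(size=8)                 # sample points ω from Ω = [0, 1)
X = np.where(omega < 0.3, -1.0, 2.0)        # X takes u_0 = -1 on Ω_0 and u_1 = 2 on Ω_1
indicator_0 = (omega < 0.3).astype(float)   # 1_{Ω_0}
indicator_1 = (omega >= 0.3).astype(float)  # 1_{Ω_1}
rebuilt = -1.0 * indicator_0 + 2.0 * indicator_1
assert np.array_equal(X, rebuilt)           # X = Σ_i u_i 1_{Ω_i}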

Continuous probability distribution

A continuous probability distribution is a probability distribution whose cumulative distribution function is continuous. Most often such distributions arise from a probability density function. Mathematicians call distributions with probability density functions absolutely continuous, since their cumulative distribution function is absolutely continuous with respect to the Lebesgue measure. If the distribution of [math]X[/math] is continuous, then [math]X[/math] is called a continuous random variable.

Intuitively, a continuous random variable is one that can take a continuous range of values—as opposed to a discrete distribution, where the set of possible values for the random variable is at most countable. While for a discrete distribution an event with probability zero is impossible (e.g., rolling 3.5 on a standard die is impossible, and has probability zero), this is not so in the case of a continuous random variable.

Formally, if [math]X[/math] is a continuous random variable, then it has a probability density function [math]f_X[/math], and therefore its probability of falling into a given interval, say [math][a,b][/math], is given by the integral

[[math]] \operatorname{P}[a\le X\le b] = \int_a^b f_X(x) \, dx [[/math]]

In particular, the probability for [math]X[/math] to take any single value is zero, because an integral with coinciding upper and lower limits is always equal to zero.
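
A minimal numerical check in Python (an added illustration; SciPy and the standard normal density are assumed for concreteness):

from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0
prob, _ = quad(norm.pdf, a, b)        # P[a <= X <= b] as the integral of the density
print(prob)                           # ≈ 0.6827
single, _ = quad(norm.pdf, 0.5, 0.5)  # coinciding limits: a single value has probability 0
print(single)                         # 0.0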

The definition states that a continuous probability distribution must possess a density, or equivalently, that its cumulative distribution function be absolutely continuous. This requirement is stronger than simple continuity of the cumulative distribution function, and there is a special class of distributions, singular distributions, which are neither absolutely continuous nor discrete nor a mixture of those. An example is given by the Cantor distribution. Such singular distributions, however, are never encountered in practice.

Note on terminology: some authors use the term "continuous distribution" to denote the distribution with continuous cumulative distribution function. Thus, their definition includes both the (absolutely) continuous and singular distributions.

By one convention, a probability distribution [math]\,\mu[/math] is called continuous if its cumulative distribution function [math]F(x)=\mu(-\infty,x][/math] is continuous and, therefore, the probability measure of singletons [math]\mu\{x\}\,=\,0[/math] for all [math]\,x[/math].

Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable function [math]\,f[/math] defined on the real numbers such that

[[math]] F(x) = \mu(-\infty,x] = \int_{-\infty}^x f(t)\,dt. [[/math]]

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Further details

Not every probability distribution has a density function: the distributions of discrete random variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to any individual point.

A distribution has a density function if and only if its cumulative distribution function [math]F(x)[/math] is absolutely continuous. In this case, [math]F[/math] is differentiable almost everywhere, and its derivative can be used as the probability density:

[[math]] \frac{d}{dx}F(x) = f(x). [[/math]]
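
Numerically, a symmetric difference quotient of the CDF recovers the density wherever F is differentiable; a short Python check (an added illustration, standard normal assumed):

from scipy.stats import norm

x, h = 0.7, 1e-6
approx = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)  # dF/dx at x
print(approx, norm.pdf(x))  # both ≈ 0.3123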

If a probability distribution admits a density, then the probability of every one-point set {a} is zero; the same holds for finite and countable sets.

Two probability densities [math]f[/math] and [math]g[/math] represent the same probability distribution precisely if they differ only on a set of Lebesgue measure zero.

Families of densities

It is common for probability density functions (and probability mass functions) to be parametrized—that is, to be characterized by unspecified parameters. For example, the normal distribution is parametrized in terms of the mean and the variance, denoted by [math]\mu[/math] and [math]\sigma^2[/math] respectively, giving the family of densities

[[math]] f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }. [[/math]]
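
This family is straightforward to code; the sketch below (an added Python illustration) makes explicit that μ and σ² select a member of the family while x ranges over the domain:

import math

def normal_density(x, mu, sigma2):
    # One member of the family for each (mu, sigma2); x ranges over the reals
    sigma = math.sqrt(sigma2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(normal_density(0.0, 0.0, 1.0))  # ≈ 0.3989, standard normal at its mode
print(normal_density(0.0, 1.0, 4.0))  # a different member of the family at the same x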

It is important to keep in mind the difference between the domain of a family of densities and the parameters of the family. Different values of the parameters describe different distributions of different random variables on the same sample space (the same set of all possible values of the variable); this sample space is the domain of the family of random variables that this family of distributions describes. A given set of parameters describes a single distribution within the family sharing the functional form of the density. From the perspective of a given distribution, the parameters are constants, and terms in a density function that contain only parameters, but not variables, are part of the normalization factor of a distribution (the multiplicative factor that ensures that the area under the density—the probability of something in the domain occurring—equals 1). This normalization factor is outside the kernel of the distribution.

Expected Value

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. For example, within an insurance context, the expected loss can be thought of as the average loss incurred by an insurer on a very large portfolio of policies sharing a common loss distribution (similar risk profile). Less roughly, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity.

The expected value does not exist for random variables having some distributions with heavy "tails", such as the Cauchy distribution.[3] For random variables such as these, the heavy tails of the distribution prevent the sum or integral from converging. That being said, most loss models encountered in insurance implicitly assume finite expected losses.

The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, mean, or first moment.

Univariate discrete random variable

Let [math]X[/math] be a discrete random variable taking values [math]x_1,x_2,\ldots[/math] with probabilities [math]p_1,p_2,\ldots[/math] respectively. Then the expected value of this random variable is the infinite sum

[[math]] \operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,[[/math]]

provided that this series converges absolutely (that is, the sum must remain finite if we were to replace all [math]x_i[/math]s with their absolute values). If this series does not converge absolutely, we say that the expected value of [math]X[/math] does not exist.
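
For a variable with finitely many values the sum is finite; a minimal Python example (an added illustration, fair die assumed):

values = range(1, 7)  # faces of a fair die
probs = [1/6] * 6
expected = sum(x * p for x, p in zip(values, probs))  # Σ x_i p_i
print(expected)  # 3.5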

Univariate continuous random variable

If the probability distribution of [math]X[/math] admits a probability density function [math]f(x)[/math], then the expected value can be computed as

[[math]] \operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, \mathrm{d}x . [[/math]]
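
The integral can be evaluated numerically; a short Python sketch (an added illustration; SciPy and a normal density with mean 2 are assumptions):

from scipy.integrate import quad
from scipy.stats import norm

mean, _ = quad(lambda x: x * norm.pdf(x, loc=2.0, scale=1.5),
               float('-inf'), float('inf'))  # E[X] = ∫ x f(x) dx
print(mean)  # ≈ 2.0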

Properties

  • The expected value of a constant is equal to the constant itself; i.e., if [math]c[/math] is a constant, then [math]\operatorname{E}[c]=c[/math].
  • If [math]X[/math] and [math]Y[/math] are random variables such that [math]X \le Y[/math] almost surely, then [math]\operatorname{E}[X] \le \operatorname{E}[Y][/math].
  • The expected value operator (or expectation operator) [math]\operatorname{E}[\cdot][/math] is linear in the sense that

[[math]]\begin{align*} \operatorname{E}[X + c] &= \operatorname{E}[X] + c \\ \operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y] \\ \operatorname{E}[aX] &= a \operatorname{E}[X] \end{align*}[[/math]]

Combining the results of the previous three equations, we can see that

[[math]]\operatorname{E}[a X + b Y + c] = a \operatorname{E}[X] + b \operatorname{E}[Y] + c\,[[/math]]

for any two random variables [math]X[/math] and [math]Y[/math] and any real numbers [math]a[/math], [math]b[/math], and [math]c[/math].
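
Linearity is easy to confirm by simulation, as in the following Python sketch (an added illustration; the exponential and normal inputs and the constants are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=2.0, size=1_000_000)      # E[X] = 2
Y = rng.normal(loc=1.0, scale=3.0, size=1_000_000)  # E[Y] = 1
a, b, c = 2.0, -1.0, 5.0
print(np.mean(a * X + b * Y + c))           # ≈ a*2 + b*1 + c = 8
print(a * np.mean(X) + b * np.mean(Y) + c)  # same value, by linearity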

Layer Cake Representation

When a continuous random variable [math]X[/math] takes only non-negative values, we can use the following formula for computing its expectation (even when the expectation is infinite):

[[math]] \operatorname{E}[X]=\int_0^\infty \operatorname{P}(X \ge x)\; \mathrm{d}x[[/math]]

Similarly, when a random variable takes only values in {0, 1, 2, 3, ...} we can use the following formula for computing its expectation:

[[math]] \operatorname{E}[X]=\sum\limits_{i=1}^\infty \operatorname{P}(X\geq i).[[/math]]
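
Both identities can be verified numerically; the Python sketch below (an added illustration; the Exponential(1) and Poisson(3) choices are assumptions) checks each against the known mean:

from scipy.integrate import quad
from scipy.stats import expon, poisson

# Continuous case: E[X] = ∫_0^∞ P(X >= x) dx, here for an Exponential with mean 1
tail_integral, _ = quad(expon.sf, 0, float('inf'))
print(tail_integral, expon.mean())  # both 1.0

# Discrete case: E[X] = Σ_{i>=1} P(X >= i), here for a Poisson with mean 3
tail_sum = sum(poisson.sf(i - 1, 3.0) for i in range(1, 200))  # P(X >= i) = P(X > i-1)
print(tail_sum, poisson.mean(3.0))  # both ≈ 3.0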

Residual Life Distribution

Suppose [math]X[/math] is a non-negative random variable which can be thought of as representing the lifetime of some entity of interest. A family of residual life distributions can be constructed by considering the conditional distribution of [math]X[/math] given that [math]X[/math] is beyond some level [math]d[/math], i.e., the distribution of the lifetime given that death (failure) hasn't yet occurred at time [math]d[/math]:

[[math]] \begin{align} R_d(t) &= \operatorname{P}(X \leq d + t \mid X \gt d) \\ &= \frac{S(d) - S(t+d)}{S(d)} \end{align} [[/math]]

with [math]S(t)[/math] denoting the survival function of [math]X[/math], i.e., the probability that the lifetime [math]X[/math] exceeds [math]t[/math].

Residual life distributions are relevant for insurance policies with deductibles. Since a claim is made when the loss to the insured is beyond the deductible, the loss to the insurer given that a claim was made is precisely the residual life distribution [math]R_d(t)[/math].
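
For the exponential distribution the residual life distribution does not depend on d (the memoryless property); a quick Python check (an added illustration; the levels are arbitrary):

from scipy.stats import expon

d, t = 2.0, 1.5
S = expon.sf                  # survival function S(t) = P(X > t)
R = (S(d) - S(d + t)) / S(d)  # residual life CDF R_d(t)
print(R, expon.cdf(t))        # equal: the exponential is memoryless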

Mean Excess Loss Function

If [math]X[/math] represents loss to the insured with an insurance policy with a deductible [math]d[/math], then the expected loss to the insurer given that a claim was made is the mean excess loss function evaluated at [math]d[/math]:

[[math]] m(d) = \operatorname{E}[X-d \mid X \gt d] = \int_{0}^{\infty}\frac{S(t + d)}{S(d)} \,dt \,. [[/math]]

This function is also called the mean residual life function when [math]X[/math] is a general non-negative random variable. When the distribution of [math]X[/math] has a density say [math]f(x)[/math], then the mean excess loss function equals

[[math]] m(d) = \frac{\int_{d}^{\infty} (x-d) f(x) \, dx}{S(d)} \,. [[/math]]
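
The two expressions agree, as the following Python sketch confirms for an exponential loss with mean 1 (an added, assumed example; the exponential's mean excess loss is constant in d):

from scipy.integrate import quad
from scipy.stats import expon

d = 2.0
S = expon.sf
m1, _ = quad(lambda t: S(t + d) / S(d), 0, float('inf'))          # survival-function form
num, _ = quad(lambda x: (x - d) * expon.pdf(x), d, float('inf'))  # density form, numerator
m2 = num / S(d)
print(m1, m2)  # both 1.0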

Notes

  1. Kumar, Dinesh (2006). Reliability & Six Sigma. Birkhäuser. p. 22. ISBN 978-0-387-30255-3.
  2. Rao, S.S. (1996). Engineering optimization: theory and practice. John Wiley & Sons. p. 717. ISBN 978-0-471-55034-1.
  3. Richard W Hamming (1991). "Example 8.7–1 The Cauchy distribution". The art of probability for scientists and engineers. Addison-Wesley. p. 290 ff. ISBN 0-201-40686-1. Sampling from the Cauchy distribution and averaging gets you nowhere — one sample has the same distribution as the average of 1000 samples!

References

  • Pierre Simon de Laplace (1812). Analytical Theory of Probability. The first major treatise blending calculus with probability theory, originally in French: Théorie Analytique des Probabilités.
  • Andrei Nikolajevich Kolmogorov (1950). Foundations of the Theory of Probability. The modern measure-theoretic foundation of probability theory; the original German version (Grundbegriffe der Wahrscheinlichkeitsrechnung) appeared in 1933.
  • Patrick Billingsley. Probability and Measure. John Wiley and Sons.
  • Cambridge University Press textbook on probability, ISBN 0-521-42028-8. Chapters 7 to 9 are about continuous variables.
  • B. S. Everitt: The Cambridge Dictionary of Statistics, Cambridge University Press, Cambridge (3rd edition, 2006). ISBN 0-521-69027-7
  • Bishop: Pattern Recognition and Machine Learning, Springer. ISBN 0-387-31073-8
  • den Dekker, A. J., Sijbers, J. (2014). "Data distributions in magnetic resonance images: a review". Physica Medica. doi:10.1016/j.ejmp.2014.05.002
  • Wikipedia contributors. "Probability density function". Wikipedia. Retrieved 28 January 2022.
  • Wikipedia contributors. "Probability distribution". Wikipedia. Retrieved 28 January 2022.