Change of Variables
Previously we studied transformations of a single random variable and how to compute the distribution of the transformed variable. Here we are primarily concerned with situations in which a function of several random variables, sometimes called a change of variables, has a (joint) cumulative distribution function or (joint) density function that can be computed simply in terms of the corresponding cumulative distribution function or density function of the random variables it depends on.
The Setup
We let [math]X_1,\ldots,X_n[/math] denote a sequence of random variables, write [math]\mathbf{X} = [X_1,\ldots,X_n][/math], and let
[math]\mathbf{Y} = T(\mathbf{X}) = T(X_1,\ldots,X_n)[/math]
denote the change of variables.
Monotone One-to-One Transformations
A transformation [math]T[/math] is one-to-one when [math]T(\mathbf{x}) = T(\mathbf{y})[/math] implies that [math]\mathbf{x} = \mathbf{y}[/math]: [math]T[/math] maps distinct points to distinct points. If [math]T[/math] is one-to-one then [math]T[/math] has a left-inverse [math]T^{-1}[/math]:
[math]T^{-1}(T(\mathbf{x})) = \mathbf{x}.[/math]
Strictly Increasing
If [math]T[/math] is strictly increasing, that is, [math]\mathbf{x} < \mathbf{y}[/math] implies [math]T(\mathbf{x}) < T(\mathbf{y})[/math], then
[math]\{\textbf{Y} \leq \mathbf{y}\} = \{T(\textbf{X}) \leq \mathbf{y}\} = \{\textbf{X} \leq T^{-1}(\mathbf{y})\}[/math]
and we obtain the simple relation
[math]F_{\textbf{Y}} = F_{\textbf{X}} \circ T^{-1}. \label{transform-rel-up}[/math]
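As a quick numerical sanity check, the relation [math]F_{\textbf{Y}} = F_{\textbf{X}} \circ T^{-1}[/math] can be verified by simulation for, say, the strictly increasing map [math]T(x) = e^x[/math] applied to a standard normal variable (an illustrative choice, not from the text; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)      # X ~ Normal(0, 1)
y = np.exp(x)                         # Y = T(X) with T strictly increasing

t = 1.7                               # evaluate both CDFs at an arbitrary point
empirical = np.mean(y <= t)           # F_Y(t) estimated from the sample
via_relation = norm.cdf(np.log(t))    # F_X(T^{-1}(t)) = F_X(log t)
print(empirical, via_relation)        # the two values should nearly agree
```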
Example
Suppose that [math]\mathbf{X} = [X_1,X_2][/math] and let
[math]\textbf{Y} = T(\mathbf{X}) = \exp(A\mathbf{X})[/math]
(the exponential applied componentwise), with [math]A[/math] an invertible, order-preserving linear transformation, so that [math]T[/math] is strictly increasing. The inverse of the transformation, [math]T^{-1}[/math], equals
[math]T^{-1} = A^{-1} \circ \log.[/math]
By \ref{transform-rel-up}, the cumulative distribution function of [math]\textbf{Y}[/math] equals [math]F_{\textbf{Y}} = F_{\textbf{X}} \circ A^{-1} \circ \log.[/math]
Strictly Decreasing
If the transformation [math]T[/math] is strictly decreasing and the distribution function of [math]X[/math], [math]F_{X}[/math], is continuous, then
[math]\operatorname{P}[Y \leq y] = \operatorname{P}[T(X) \leq y] = \operatorname{P}[X \geq T^{-1}(y)] = 1 - F_{X}(T^{-1}(y))[/math]
and the distribution function of [math]Y[/math] equals [math]1 - F_{X} \circ T^{-1}[/math].
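The decreasing case can be checked the same way, for instance with the strictly decreasing map [math]T(x) = -x[/math] applied to a standard normal variable (again an illustrative choice; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)      # X ~ Normal(0, 1), continuous CDF
y = -x                                # Y = T(X) with T strictly decreasing

t = 0.3
empirical = np.mean(y <= t)           # F_Y(t) estimated from the sample
via_relation = 1 - norm.cdf(-t)       # 1 - F_X(T^{-1}(t)) = 1 - F_X(-t)
print(empirical, via_relation)        # the two values should nearly agree
```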
Smooth One-to-One Transformations
We assume the following:
- [math]X_1,\ldots,X_n[/math] have a joint density.
- The transformation [math]T[/math] is one-to-one.
- The transformation [math]T[/math] is differentiable almost surely: the set of points where [math]T[/math] is not differentiable has probability 0.
To illustrate condition 3, suppose that [math]X_1[/math] and [math]X_2[/math] are uniformly distributed on [0,1] and [math]T(X_1,X_2) = [X_2,X_1][/math]. Then [math]T[/math] is defined on the square region [0,1] x [0,1] and the set of points where it is differentiable has probability 1.
If the conditions above are met, we can actually compute the joint density of [math]\textbf{Y}[/math]:
[math]f_{\textbf{Y}}(\mathbf{y}) = \frac{f_{\textbf{X}}(T^{-1}(\mathbf{y}))}{\left|\textrm{det}[J_T(T^{-1}(\mathbf{y}))]\right|}. \label{transform-density}[/math]
Here [math]J_T[/math] denotes the Jacobian matrix of [math]T[/math] and [math]\textrm{det}[J_T][/math] its determinant. We can express the density of [math]\textbf{Y}[/math] more succinctly (recall that [math]\circ[/math] denotes function composition):
[math]f_{\textbf{Y}} = \frac{f_{\textbf{X}} \circ T^{-1}}{\left|\textrm{det}[J_T]\right| \circ T^{-1}}.[/math]
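As an illustration of the formula, the sketch below applies it to a linear map [math]T(\mathbf{x}) = A\mathbf{x}[/math] of a standard bivariate normal (an assumed example, not from the text) and compares the result with the directly known [math]\textrm{Normal}(0, AA^\top)[/math] density; numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import multivariate_normal

# X = (X1, X2) standard bivariate normal, Y = T(X) = A X with A invertible.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
A_inv = np.linalg.inv(A)
jac_det = np.linalg.det(A)            # the Jacobian of a linear map is A itself

y = np.array([1.0, -0.5])             # point at which to evaluate f_Y

x = A_inv @ y                         # T^{-1}(y)
f_X = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf(x)
f_Y_formula = f_X / abs(jac_det)      # f_Y(y) = f_X(T^{-1}(y)) / |det J_T|

# Independent check: Y = A X is Normal(0, A A^T).
f_Y_direct = multivariate_normal(mean=[0, 0], cov=A @ A.T).pdf(y)
print(f_Y_formula, f_Y_direct)        # the two values agree
```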
Example
Let [math]\textbf{U}[/math] denote a uniform on the square [0,1]x[0,1] and let
[math]\textbf{Y} = T(\textbf{U}) = \exp(A\textbf{U})[/math]
(the exponential applied componentwise) with [math]A[/math] the linear transformation [math]A(x_1,x_2) = [x_1 + x_2,x_1 - x_2][/math]. The inverse of the transformation, [math]T^{-1}[/math], equals
[math]T^{-1} = A^{-1} \circ \log, \qquad A^{-1}(y_1,y_2) = \left[\frac{y_1 + y_2}{2},\frac{y_1 - y_2}{2}\right].[/math]
We compute the Jacobian of the transformation
[math]J_T(x_1,x_2) = \begin{bmatrix} e^{x_1 + x_2} & e^{x_1 + x_2} \\ e^{x_1 - x_2} & -e^{x_1 - x_2} \end{bmatrix}[/math]
and its corresponding determinant equals [math]-2e^{2x_1}[/math]. Recalling \ref{transform-density}, the density function of [math]\textbf{Y}[/math] equals
[math]f_{\textbf{Y}}(y_1,y_2) = \frac{1}{2e^{2x_1}} = \frac{1}{2y_1y_2}, \qquad (y_1,y_2) \in D,[/math]
with [math]D[/math] equal to the region [math]T([0,1] \times [0,1])[/math] (the image of the unit square under the transformation). Expressing [math]x_1[/math] and [math]x_2[/math] in terms of [math]y_1[/math] and [math]y_2[/math], the region [math]D[/math] can be described as follows:
[math]D = \left\{(y_1,y_2) : 1 \leq y_1 y_2 \leq e^2,\; 1 \leq \frac{y_1}{y_2} \leq e^2\right\}.[/math]
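A Monte Carlo sketch can be used to double-check the density derived above at a point of [math]D[/math], by counting samples falling in a small box (numpy assumed; the point [math](2, 1.5)[/math] is an arbitrary choice inside [math]D[/math]):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=(1_000_000, 2))            # U uniform on the unit square
y1 = np.exp(u[:, 0] + u[:, 1])                  # Y = T(U) = exp(A U)
y2 = np.exp(u[:, 0] - u[:, 1])

p1, p2, h = 2.0, 1.5, 0.05                      # point in D and box half-width
in_box = (np.abs(y1 - p1) < h) & (np.abs(y2 - p2) < h)
density_mc = in_box.mean() / (2 * h) ** 2       # fraction in box / box area
density_formula = 1 / (2 * p1 * p2)             # f_Y(y1, y2) = 1 / (2 y1 y2)
print(density_mc, density_formula)              # roughly 0.167 in both cases
```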
Other Methods
We present other ways of calculating the cumulative distribution function of transformed random variables or the probability of events associated with said variables.
Direct Evaluation
The most direct way to compute probabilities such as [math]\operatorname{P}[\textbf{Y} \in A][/math] is to express them in terms of events associated with the initial random variable [math]\textbf{X}[/math]:
[math]\operatorname{P}[\textbf{Y} \in A] = \operatorname{P}[T(\textbf{X}) \in A] = \operatorname{P}[\textbf{X} \in T^{-1}(A)].[/math]
Example 1
If [math]Y = X_1 + X_2 = T(\mathbf{X})[/math], then [math]\operatorname{P}[Y \leq y] = \operatorname{P}[\textbf{X} \in B][/math] with [math]B[/math] the set of points
[math]B = \{(x_1,x_2) : x_1 + x_2 \leq y\}.[/math]
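For a concrete check (an assumed example), take [math]X_1[/math] and [math]X_2[/math] to be independent standard normals; [math]\operatorname{P}[\textbf{X} \in B][/math] can then be estimated by simulation and compared with the known [math]\textrm{Normal}(0,2)[/math] distribution of the sum:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x1 = rng.standard_normal(500_000)
x2 = rng.standard_normal(500_000)

y = 1.0
prob_direct = np.mean(x1 + x2 <= y)          # P[X in B] with B = {x1 + x2 <= y}
prob_exact = norm.cdf(y, scale=np.sqrt(2))   # X1 + X2 ~ Normal(0, 2)
print(prob_direct, prob_exact)               # should nearly agree
```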
Example 2
If [math]\textbf{Y} = [X_1 + X_2,X_1 - X_2] = T(\mathbf{X})[/math], then [math]\operatorname{P}[Y_1 \leq y_1,Y_2 \leq y_2] = \operatorname{P}[\textbf{X} \in B][/math] with [math]B[/math] the set of points
[math]B = \{(x_1,x_2) : x_1 + x_2 \leq y_1,\; x_1 - x_2 \leq y_2\}.[/math]
Conditioning
We limit ourselves to the case of two variables: [math]\mathbf{X} = [X_1,X_2][/math]. The conditioning method for computing probabilities associated with [math]\textbf{Y}[/math] is to condition on [math]X_1[/math] or [math]X_2[/math]:
[math]\operatorname{P}[\textbf{Y} \in A] = \operatorname{E}\left[\operatorname{P}[\textbf{Y} \in A \mid X_i]\right]. \label{cond-method-eqn}[/math]
[math]\mathbf{X}[/math] typically has a joint density function and then \ref{cond-method-eqn} also equals
[math]\int_{-\infty}^{\infty} \operatorname{P}[\textbf{Y} \in A \mid X_i = x]\, f_{X_i}(x)\, dx[/math]
with [math]f_{X_i}[/math] the marginal density of [math]X_i[/math].
Example 1
Suppose [math]X_1[/math] and [math]X_2[/math] are independent. Suppose further that [math]X_1[/math] has density function [math]f_{X_1}[/math] and [math]X_2[/math] is uniformly distributed on [0,1]. We consider the random variable [math]Y = X_1 X_2[/math]. Conditioning on [math]X_2[/math], we have
[math]\operatorname{P}[Y \leq y \mid X_2 = x] = \operatorname{P}[X_1 \leq y/x], \qquad 0 < x \leq 1,[/math]
and thus the density function of [math]Y[/math] equals
[math]f_Y(y) = \int_0^1 \frac{1}{x}\, f_{X_1}\!\left(\frac{y}{x}\right) dx.[/math]
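Since the density of [math]X_1[/math] is left unspecified above, the sketch below uses [math]\textrm{Exponential}(1)[/math] as a stand-in and evaluates the integral numerically, comparing it against a simulation-based density estimate (numpy and scipy assumed):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

rng = np.random.default_rng(4)

def f_Y(y):
    # f_Y(y) = integral over (0, 1] of f_{X1}(y / x) * (1 / x) dx,
    # with X1 ~ Exponential(1) as a stand-in for the unspecified density;
    # the lower limit is nudged off 0 to avoid dividing by zero.
    return quad(lambda x: expon.pdf(y / x) / x, 1e-12, 1)[0]

# Monte Carlo estimate of the density of Y = X1 * X2 near y0
y0, h = 0.5, 0.02
x1 = expon.rvs(size=1_000_000, random_state=rng)
x2 = rng.uniform(size=1_000_000)
density_mc = np.mean(np.abs(x1 * x2 - y0) < h) / (2 * h)
print(f_Y(y0), density_mc)            # both close to about 0.56
```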
Example 2
Suppose [math]\textbf{X}[/math] is uniform on the square [0,1]x[0,1] and let [math]Y = X_1 X_2[/math]. Conditional on [math]X_2[/math], [math]Y[/math] is uniformly distributed on the interval [math][0,X_2][/math]. Furthermore, [math]X_2[/math] is uniformly distributed on [0,1]. By \ref{cond-method-eqn}, we have
[math]\operatorname{P}[Y \leq y] = \int_0^1 \operatorname{P}[Y \leq y \mid X_2 = x]\,dx = \int_0^y 1\,dx + \int_y^1 \frac{y}{x}\,dx = y - y\log y, \qquad 0 < y < 1.[/math]
Sums of Independent Random Variables
The probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. The term is motivated by the fact that the probability mass function or probability density function of a sum of random variables is the convolution of their corresponding probability mass functions or probability density functions respectively.
Convolutions
Continuous
The convolution of [math]f[/math] and [math]g[/math] is written [math]f*g[/math], using an asterisk or star. It is defined as the integral of the product of the two functions after one is reversed and shifted. As such, it is a particular kind of integral transform:
[math](f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau.[/math]
While the symbol [math]t[/math] is used above, it need not represent the time domain. But in that context, the convolution formula can be described as a weighted average of the function [math]f(\tau)[/math] at the moment [math]t[/math] where the weighting is given by [math]g(-\tau)[/math] simply shifted by amount [math]t[/math]. As [math]t[/math] changes, the weighting function emphasizes different parts of the input function.
For functions [math]f[/math], [math]g[/math] supported on only [math][0, \infty)[/math] (i.e., zero for negative arguments), the integration limits can be truncated, resulting in
[math](f * g)(t) = \int_{0}^{t} f(\tau)\, g(t - \tau)\, d\tau, \qquad t \geq 0.[/math]
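The truncated convolution can be approximated on a grid; the sketch below convolves two [math]\textrm{Exponential}(1)[/math] densities (an assumed example that anticipates the table further down) and compares the result with the [math]\textrm{Gamma}(2,1)[/math] density:

```python
import numpy as np
from scipy.stats import expon, gamma

dx = 0.001
t = np.arange(0, 20, dx)                  # grid on [0, 20); both densities vanish for t < 0
f = expon.pdf(t)                          # density of Exponential(1)
g = expon.pdf(t)                          # density of Exponential(1)

conv = np.convolve(f, g)[: len(t)] * dx   # Riemann-sum approximation of (f * g)(t)
exact = gamma.pdf(t, a=2)                 # Exponential(1) + Exponential(1) ~ Gamma(2, 1)
print(np.max(np.abs(conv - exact)))       # small discretization error
```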
Discrete
For functions [math]f[/math], [math]g[/math] defined on the set [math]\mathbb{Z}[/math] of integers, the discrete convolution of [math]f[/math] and [math]g[/math] is given by:[1]
[math](f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m].[/math]
When [math]g[/math] has finite support in the set [math]\{-M,-M+1,\dots,M-1,M\}[/math] (representing, for instance, a finite impulse response), a finite summation may be used:[2]
[math](f * g)[n] = \sum_{m=-M}^{M} f[n - m]\, g[m].[/math]
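A small illustration of the discrete convolution (an assumed example): convolving the probability mass function of a fair die with itself yields the distribution of the sum of two dice.

```python
import numpy as np

die = np.full(6, 1 / 6)              # pmf of a fair six-sided die on {1, ..., 6}
two_dice = np.convolve(die, die)     # discrete convolution: pmf of the sum on {2, ..., 12}

for total, prob in enumerate(two_dice, start=2):
    print(total, round(prob, 4))     # e.g. a total of 7 has probability 6/36 ~ 0.1667
```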
Continuous Distributions
If [math]X_1,\ldots,X_n[/math] denotes a sequence of mutually independent continuous random variables each having a probability density function, then the sum [math]Y = \sum_{i=1}^n X_i[/math] has probability density function [math]f_Y = f_{X_1} * \cdots * f_{X_n}[/math]. Here is a table giving the distribution for sums of well-known continuous distributions:
[math]X_i[/math] | [math]Y[/math]
---|---
[math]\textrm{Normal}(\mu_i, \sigma^2_i)[/math] | [math]\textrm{Normal}(\mu_1 + \cdots + \mu_n, \sigma_1^2 + \cdots + \sigma^2_n)[/math]
[math]\textrm{Cauchy}(a_i,\gamma_i)[/math] | [math]\textrm{Cauchy}(a_1 + \cdots + a_n,\gamma_1 + \cdots + \gamma_n)[/math]
[math]\textrm{Exponential}(\theta)[/math] | [math]\textrm{Gamma}(n, \theta)[/math]
[math]\textrm{Chi-Square}(r_i)[/math] | [math]\textrm{Chi-Square}(r_1 + \cdots + r_n)[/math]
To demonstrate the convolution technique, we will give a derivation of the result described in the third row in the table above (sum of exponentials). The proof is by induction on [math]n[/math]. For [math]n = 1[/math], there is nothing to show. If
[math]Y_{n+1} = \sum_{i=1}^{n+1} X_i = Y_n + X_{n+1},[/math]
then we need to show that [math]Y_{n+1}[/math] is [math]\textrm{Gamma}(n+1, \theta)[/math] given that [math]Y_n[/math] is [math]\textrm{Gamma}(n,\theta)[/math]. It suffices to show the result for [math]\theta = 1[/math] since the general case then follows by scaling. Using the convolution technique, the density of [math]Y_{n+1}[/math] equals:
[math]f_{Y_{n+1}}(y) = \int_0^y f_{Y_n}(\tau)\, f_{X_{n+1}}(y - \tau)\, d\tau = \int_0^y \frac{\tau^{n-1} e^{-\tau}}{(n-1)!}\, e^{-(y-\tau)}\, d\tau = \frac{e^{-y}}{(n-1)!} \int_0^y \tau^{n-1}\, d\tau = \frac{y^{n} e^{-y}}{n!}, \qquad y \geq 0.[/math]
This is the density function of a [math]\textrm{Gamma}(n+1,1)[/math] and thus, by induction, we have completed the derivation.
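The result can also be checked by simulation; the sketch below treats [math]\theta[/math] as the scale parameter of the exponential, matching the [math]\textrm{Gamma}(n,\theta)[/math] scale parameterization (an assumption about the table's convention), with numpy and scipy assumed:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(5)
n, theta = 4, 2.0
y = rng.exponential(scale=theta, size=(200_000, n)).sum(axis=1)   # sum of n Exponential(theta)

t = 10.0
print(np.mean(y <= t), gamma.cdf(t, a=n, scale=theta))   # empirical CDF vs Gamma(n, theta) CDF
```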
Discrete Distributions
Similarly, if [math]X_1, \ldots, X_n[/math] denotes a sequence of mutually independent integer-valued random variables, then the sum
[math]Y = \sum_{i=1}^n X_i[/math]
has probability mass function [math]p_{Y} = p_{X_1} * \cdots * p_{X_n}[/math]. Here is a table giving the distribution for sums of well-known discrete distributions:
[math]X_i[/math] | [math]Y[/math]
---|---
[math]\textrm{Bernoulli}(p)[/math] | [math]\textrm{Binomial}(n,p)[/math]
[math]\textrm{Binomial}(n_i,p)[/math] | [math]\textrm{Binomial}(n_1 + \cdots + n_n,p)[/math]
[math]\textrm{NegativeBinomial}(n_i,p)[/math] | [math]\textrm{NegativeBinomial}(n_1 + \cdots + n_n, p)[/math]
[math]\textrm{Geometric}(p)[/math] | [math]\textrm{NegativeBinomial}(n,p)[/math]
[math]\textrm{Poisson}(\lambda_i)[/math] | [math]\textrm{Poisson}(\lambda_1 + \cdots + \lambda_n)[/math]
To demonstrate the convolution technique, we will give a derivation of the result described in the second row in the table above (sum of binomials). The proof is by induction on [math]n[/math]. For [math]n = 1[/math] there is nothing to show. If [math]Y_n = \sum_{i=1}^n X_i[/math], [math]s_n = \sum_{i=1}^n n_i[/math], and we set [math]q = 1-p[/math] and [math]\operatorname{P}[Y_{n+1} = y] = p(y)[/math], then
[math]p(y) = \sum_{k} \operatorname{P}[Y_n = k]\operatorname{P}[X_{n+1} = y - k] = \sum_{k} \binom{s_n}{k} p^{k} q^{s_n - k} \binom{n_{n+1}}{y-k} p^{y-k} q^{n_{n+1} - y + k} = p^{y} q^{s_{n+1} - y} \sum_{k} \binom{s_n}{k}\binom{n_{n+1}}{y-k} \label{conv-bin-1}[/math]
[math]= \binom{s_{n+1}}{y} p^{y} q^{s_{n+1} - y}. \label{conv-bin-2}[/math]
To go from \ref{conv-bin-1} to \ref{conv-bin-2}, we have used the combinatorial identity
[math]\sum_{k} \binom{m}{k}\binom{r}{y-k} = \binom{m+r}{y}.[/math]
\ref{conv-bin-2} indicates that [math]Y_{n+1}[/math] is a binomial with parameters [math]s_{n+1}[/math] and [math]p[/math]. By induction, we have completed the derivation.
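The same result can be confirmed numerically by convolving two binomial probability mass functions and comparing against the binomial with the summed size parameter (an illustrative check; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import binom

n1, n2, p = 5, 7, 0.3
pmf1 = binom.pmf(np.arange(n1 + 1), n1, p)       # pmf of Binomial(n1, p)
pmf2 = binom.pmf(np.arange(n2 + 1), n2, p)       # pmf of Binomial(n2, p)

conv = np.convolve(pmf1, pmf2)                   # pmf of the independent sum
direct = binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p)
print(np.max(np.abs(conv - direct)))             # agreement up to floating-point error
```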
Notes
- Damelin & Miller 2011, p. 232
- Press, William H.; Flannery, Brian P.; Teukolsky, Saul A.; Vetterling, William T. (1989). Numerical Recipes in Pascal. Cambridge University Press. p. 450. ISBN 0-521-37516-9.