Moments of Random Variables
Moments and Variance
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X[/math] be a r.v. and let [math]p\geq 1[/math] be an integer (or even a real number). The [math]p[/math]-th moment of [math]X[/math] is by definition [math]\E[X^p][/math], which is well defined when [math]X\geq 0[/math] or [math]\E[\vert X\vert^p] \lt \infty[/math], where by definition
[[math]] \E[\vert X\vert^p]=\int_\Omega\vert X(\omega)\vert^p\p(d\omega). [[/math]]
When [math]p=1[/math], we get the expected value. We say that [math]X[/math] is ''centered'' if [math]\E[X]=0[/math]. The spaces [math]L^p(\Omega,\A,\p)[/math] for [math]p\in[1,\infty)[/math] are defined as in the course ''Measure and Integral''. From Hölder's inequality we obtain
[[math]] \E[\vert XY\vert]\leq \E[\vert X\vert^p]^{1/p}\E[\vert Y\vert^q]^{1/q}, [[/math]]
whenever [math]p,q\geq 1[/math] and [math]\frac{1}{p}+\frac{1}{q}=1[/math]. If we take [math]Y=1[/math] above, we obtain
[[math]] \E[\vert X\vert]\leq \E[\vert X\vert^p]^{1/p}, [[/math]]
which means [math]\|X\|_1\leq \|X\|_p[/math]. This extends to [math]\|X\|_r\leq \|X\|_p[/math] whenever [math]1\leq r\leq p[/math], and therefore [math]L^p(\Omega,\A,\p)\subset L^r(\Omega,\A,\p)[/math]. For [math]p=q=2[/math] we get the Cauchy-Schwarz inequality
[[math]] \E[\vert XY\vert]\leq \sqrt{\E[X^2]}\sqrt{\E[Y^2]}. [[/math]]
With [math]Y=1[/math] we have [math]\E[\vert X\vert]^2\leq \E[X^2][/math].
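To see the inclusion [math]\|X\|_r\leq \|X\|_p[/math] in action, here is a minimal numerical sketch (not a proof), assuming NumPy; the exponential sample and the exponents are illustrative choices. The norms are evaluated under the empirical distribution of the sample, where the monotonicity and Cauchy-Schwarz hold exactly.

```python
# A minimal check of ||X||_r <= ||X||_p (r <= p) and Cauchy-Schwarz under
# the empirical distribution of a simulated sample; NumPy is assumed.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)          # a nonnegative r.v.

def lp_norm(x, p):
    """E[|X|^p]^(1/p) under the empirical (uniform) measure."""
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

norms = [lp_norm(x, p) for p in (1, 1.5, 2, 3, 4)]
print(norms)                                          # nondecreasing in p
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))

# Cauchy-Schwarz: E[|XY|] <= sqrt(E[X^2]) * sqrt(E[Y^2]).
y = rng.normal(size=x.size)
assert np.mean(np.abs(x * y)) <= np.sqrt(np.mean(x**2) * np.mean(y**2))
```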
Let [math](\Omega,\A,\p)[/math] be a probability space. Consider a r.v. [math]X\in L^2(\Omega,\A,\p)[/math]. The variance of [math]X[/math] is defined as
[[math]] Var(X):=\E\left[(X-\E[X])^2\right]. [[/math]]
Informally, the variance measures the spread of [math]X[/math] around its mean [math]\E[X][/math]. Note that [math]Var(X)=0[/math] if and only if [math]X[/math] is a.s. constant.
Expanding the square, we get
[[math]] Var(X)=\E[X^2]-\E[X]^2. [[/math]]
It follows that if [math]X[/math] is centered (i.e. [math]\E[X]=0[/math]), we get [math]Var(X)=\E[X^2][/math]. Moreover, the following two simple inequalities are very often used.
- (Markov inequality) If [math]X\geq 0[/math] and [math]a \gt 0[/math], then
[[math]] \boxed{\p[X\geq a]\leq \frac{1}{a}\E[X]}. [[/math]]
- (Chebyshev inequality) If [math]X\in L^2(\Omega,\A,\p)[/math] and [math]a \gt 0[/math], then
[[math]] \boxed{\p[\vert X-\E[X]\vert \geq a]\leq \frac{1}{a^2}Var(X)}. [[/math]]
Let us prove both inequalities; a numerical illustration follows the proof.
- Note that
[[math]] \p[X\geq a]=\E[\one_{\{X\geq a\}}]\leq \E\left[\frac{X}{a}\underbrace{\one_{\{X\geq a\}}}_{\leq 1}\right]\leq \E\left[\frac{X}{a}\right]. [[/math]]
- This follows from the Markov inequality, because [math]\vert X-\E[X]\vert^2[/math] is a nonnegative r.v. and hence
[[math]] \p[\vert X-\E[X]\vert\geq a]=\p[\vert X-\E[X]\vert^2\geq a^2]\leq \frac{1}{a^2}\underbrace{\E[\vert X-\E[X]\vert^2]}_{Var(X)}. [[/math]]
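As announced, here is a small simulation illustrating (not proving) both bounds, together with the identity [math]Var(X)=\E[X^2]-\E[X]^2[/math]; NumPy, the exponential distribution, and the thresholds [math]a[/math] are assumptions made purely for the illustration.

```python
# Empirical tail probabilities vs. the Markov and Chebyshev bounds for an
# exponential r.v. with E[X] = Var(X) = 1; NumPy is assumed.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)        # X >= 0

mean, var = x.mean(), x.var()
assert np.isclose(var, np.mean(x**2) - mean**2)       # Var(X) = E[X^2] - E[X]^2

for a in (2.0, 3.0, 5.0):
    markov = np.mean(x >= a)                          # P[X >= a]
    chebyshev = np.mean(np.abs(x - mean) >= a)        # P[|X - E[X]| >= a]
    print(f"a={a}: P[X>=a]={markov:.4f} <= {mean/a:.4f}; "
          f"P[|X-E[X]|>=a]={chebyshev:.4f} <= {var/a**2:.4f}")
```

Both empirical tail probabilities stay below their bounds, typically by a wide margin: the inequalities are crude but hold with no assumption on the distribution beyond the stated moments.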
Let [math](\Omega,\A,\p)[/math] be a probability space. Consider two r.v.'s [math]X,Y\in L^2(\Omega,\A,\p)[/math]. The covariance of [math]X[/math] and [math]Y[/math] is defined as
[[math]] Cov(X,Y):=\E\left[(X-\E[X])(Y-\E[Y])\right]=\E[XY]-\E[X]\E[Y]. [[/math]]
If [math]X=(X_1,...,X_d)\in\R^d[/math] is a r.v. such that [math]X_i\in L^2(\Omega,\A,\p)[/math] for all [math]i\in\{1,...,d\}[/math], then the covariance matrix of [math]X[/math] is defined as
[[math]] K_X:=\left(Cov(X_i,X_j)\right)_{1\leq i,j\leq d}. [[/math]]
Informally speaking, the covariance of [math]X[/math] and [math]Y[/math] measures the correlation between [math]X[/math] and [math]Y[/math]. Note that [math]Cov(X,X)=Var(X)[/math] and from Cauchy-Schwarz we get
[[math]] \vert Cov(X,Y)\vert\leq \sqrt{Var(X)}\sqrt{Var(Y)}. [[/math]]
The map [math](X,Y)\mapsto Cov(X,Y)[/math] is a bilinear form on [math]L^2(\Omega,\A,\p)[/math]. We also note that [math]K_X[/math] is symmetric and positive semi-definite, i.e. if [math]\lambda_1,...,\lambda_d\in\R[/math] and [math]\lambda=(\lambda_1,...,\lambda_d)^T[/math], then
[[math]] \lambda^TK_X\lambda=\sum_{i,j=1}^d\lambda_i\lambda_jCov(X_i,X_j)=Var\left(\sum_{i=1}^d\lambda_iX_i\right). [[/math]]
So we get
[[math]] \lambda^TK_X\lambda\geq 0. [[/math]]
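The following sketch estimates a covariance matrix from simulated data and checks the properties just stated: symmetry, the identity [math]\lambda^TK_X\lambda=Var(\sum_i\lambda_iX_i)[/math], and positive semi-definiteness. NumPy, the mixing matrix, and the vector [math]\lambda[/math] are arbitrary illustrative choices.

```python
# Checking, on simulated data, that the empirical covariance matrix K_X is
# symmetric and positive semi-definite; NumPy is assumed.
import numpy as np

rng = np.random.default_rng(2)
# Three correlated coordinates built from common Gaussian noise.
z = rng.normal(size=(100_000, 3))
x = z @ np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.0]])

K = np.cov(x, rowvar=False)          # K[i, j] estimates Cov(X_i, X_j)
assert np.allclose(K, K.T)           # symmetry
assert np.all(np.linalg.eigvalsh(K) >= -1e-12)   # positive semi-definiteness

# lambda^T K lambda equals the sample variance of sum_i lambda_i X_i
# (ddof=1 matches np.cov's normalization convention).
lam = np.array([0.7, -1.2, 0.4])
assert np.isclose(lam @ K @ lam, np.var(x @ lam, ddof=1), rtol=1e-6)
```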
Linear Regression
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X,Y_1,...,Y_n[/math] be r.v.'s in [math]L^2(\Omega,\A,\p)[/math]. We want the best approximation of [math]X[/math] as an affine function of [math]Y_1,...,Y_n[/math]. More precisely, we want to minimize
[[math]] \E\left[\left(X-\beta_0-\sum_{i=1}^n\beta_iY_i\right)^2\right] [[/math]]
over all possible choices of [math](\beta_0,\beta_1,...,\beta_n)[/math].
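For general [math]n[/math], the empirical version of this minimization is an ordinary least-squares problem. Here is a minimal sketch, assuming NumPy; the simulated model and its coefficients are purely illustrative.

```python
# Minimizing the empirical version of E[(X - beta0 - sum_i beta_i Y_i)^2]
# over (beta_0, ..., beta_n) via least squares; NumPy is assumed.
import numpy as np

rng = np.random.default_rng(4)
n_samples, n = 100_000, 3
Y = rng.normal(size=(n_samples, n))                  # columns play Y_1,...,Y_n
true_beta = np.array([1.0, 0.5, -2.0, 0.8])          # beta_0, beta_1, ..., beta_n
X = true_beta[0] + Y @ true_beta[1:] + rng.normal(scale=0.3, size=n_samples)

design = np.column_stack([np.ones(n_samples), Y])    # prepend the constant 1
beta_hat = np.linalg.lstsq(design, X, rcond=None)[0]
print(beta_hat)                                      # close to true_beta
```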
Let [math](\Omega,\A,\p)[/math] be a probability space. Let [math]X,Y\in L^1(\Omega,\A,\p)[/math] be two r.v.'s. Then
Let [math]H[/math] be the linear subspace of [math]L^2(\Omega,\A,\p)[/math] spanned by [math]\{1,Y_1,...,Y_n\}[/math]. Then we know that the r.v. [math]Z[/math] which minimizes
[[math]] \E\left[(X-W)^2\right],\qquad W\in H, [[/math]]
is the orthogonal projection of [math]X[/math] onto [math]H[/math].
When [math]n=1[/math] and [math]Var(Y_1) \gt 0[/math], we have
[[math]] Z=\E[X]+\frac{Cov(X,Y_1)}{Var(Y_1)}\left(Y_1-\E[Y_1]\right). [[/math]]
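Finally, a hedged numerical check of the [math]n=1[/math] formula, with sample moments in place of the true ones; NumPy and the simulated affine model are assumptions of the illustration.

```python
# The n = 1 case: Z = E[X] + Cov(X, Y1)/Var(Y1) * (Y1 - E[Y1]), estimated
# from data and cross-checked against least squares; NumPy is assumed.
import numpy as np

rng = np.random.default_rng(3)
y1 = rng.normal(size=200_000)
x = 2.0 + 1.5 * y1 + rng.normal(scale=0.5, size=y1.size)   # noisy affine link

beta1 = np.cov(x, y1)[0, 1] / np.var(y1, ddof=1)     # Cov(X, Y1) / Var(Y1)
beta0 = x.mean() - beta1 * y1.mean()                 # E[X] - beta1 * E[Y1]
print(beta0, beta1)                                  # close to (2.0, 1.5)

# Cross-check against ordinary least squares on the same data.
ols = np.linalg.lstsq(np.column_stack([np.ones_like(y1), y1]), x, rcond=None)[0]
assert np.allclose([beta0, beta1], ols, atol=1e-6)
```

The moment-based coefficients coincide with the least-squares solution, which is exactly the projection-onto-[math]H[/math] statement above specialized to one regressor.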
General references
Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].