$L^2(\Omega,\F,\p)$ as a Hilbert space and orthogonal projections

In this section we will always work on a probability space [math](\Omega,\F,\p)[/math]. Consider the space [math]L^2(\Omega,\F,\p)[/math], which is given by

[[math]] L^2(\Omega,\F,\p):=\{X:\Omega\to \R \text{a r.v.}\mid \E[X^2] \lt \infty\}. [[/math]]

More precisely, [math]L^2(\Omega,\F,\p)[/math] consists of equivalence classes of random variables, i.e. [math]X[/math] and [math]Y[/math] are identified if [math]X=Y[/math] a.s. This space carries the norm

[[math]] \|X\|_2=\E[X^2]^{1/2}. [[/math]]

Recall that on [math]\mathcal{L}^2(\Omega,\F,\p)[/math] this would only be a semi-norm rather than a norm.

We can also define an inner product on [math]L^2(\Omega,\F,\p)[/math] by

[[math]] \langle X,Y\rangle=\E[XY]. [[/math]]

One can easily check that this satisfies the conditions of an inner product. It remains to show that [math]\vert\langle X,Y\rangle\vert \lt \infty[/math]. By Cauchy-Schwarz we get

[[math]] \vert\langle X,Y\rangle\vert\leq \|X\|_2\|Y\|_2=\E[X^2]^{1/2}\E[Y^2]^{1/2}, [[/math]]

and since we assume that the second moment of our r.v.'s exists, this is finite. Moreover one can see that

[[math]] \sqrt{\langle X,X\rangle}=\E[X^2]^{1/2}=\|X\|_2. [[/math]]
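To make the inner product concrete, here is a minimal sketch on a finite probability space, where a random variable is just the list of its values on the outcomes. The probabilities `probs` and the random variables `X`, `Y` are hypothetical choices made only for illustration.

```python
import math

# Hypothetical finite probability space: four outcomes with these probabilities.
probs = [0.1, 0.2, 0.3, 0.4]

def expectation(values):
    """E[X] for a random variable given by its value on each outcome."""
    return sum(p * v for p, v in zip(probs, values))

def inner(x, y):
    """<X, Y> = E[XY]."""
    return expectation([a * b for a, b in zip(x, y)])

def norm(x):
    """||X||_2 = E[X^2]^(1/2) = sqrt(<X, X>)."""
    return math.sqrt(inner(x, x))

# Two illustrative random variables (values chosen arbitrarily).
X = [1.0, -2.0, 0.5, 3.0]
Y = [2.0, 1.0, -1.0, 0.0]

# Cauchy-Schwarz: |<X, Y>| <= ||X||_2 ||Y||_2.
cauchy_schwarz_holds = abs(inner(X, Y)) <= norm(X) * norm(Y)
```

On a finite space every random variable has a finite second moment, so the integrability condition is automatic here; the sketch only illustrates the algebra of the inner product and the norm.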

We know that [math]L^2(\p)[/math] is a Banach space, i.e. a complete normed vector space: every Cauchy sequence has a limit inside the space with respect to the norm. Since this norm is induced by an inner product on [math]L^2(\p)[/math], it is also a Hilbert space. We recall what a Hilbert space is.

Definition (Hilbert space)

An inner product space [math](\mathcal{H},\langle\cdot,\cdot\rangle)[/math] is called a Hilbert space if it is complete with respect to the norm [math]\|X\|=\sqrt{\langle X,X\rangle}[/math] derived from the inner product. If it is not complete, it is called a pre-Hilbert space.

It is important to keep track of the [math]\sigma[/math]-algebra in the notation [math]L^2(\Omega,\F,\p)[/math]. For instance, if [math]\mathcal{G}[/math] is a [math]\sigma[/math]-algebra with [math]\mathcal{G}\subset\F[/math], then [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math]. When several [math]\sigma[/math]-algebras are involved, we make the dependence explicit by writing for instance [math]L^2(\Omega,\mathcal{G},\p),L^2(\Omega,\F,\p)[/math], etc.

Example

Take [math]\mathcal{H}=\R^n[/math]. This is indeed a Hilbert space with the euclidean inner product, i.e. if [math]X,Y\in\R^n[/math] then

[[math]] \langle X,Y\rangle=\sum_{i=1}^nX_iY_i. [[/math]]

Be aware that this is an example of a finite dimensional Hilbert space, whereas [math]L^2(\p)[/math] is in general an infinite dimensional Hilbert space. Another example is [math]\mathcal{H}=l^2(\N)[/math], the space of square summable sequences, which is a subspace of [math]l^\infty(\N)[/math], the space of bounded sequences, i.e.

[[math]] l^2(\N):=\left\{x=(x_n)_{n\geq1}\in l^\infty(\N)\mid \sum_{n=1}^\infty \vert x_n\vert^2 \lt \infty\right\}. [[/math]]

This is also an example of an infinite dimensional Hilbert space. It is clearly related to the [math]L^p(\mu)[/math] spaces, where we use the counting measure. The theory of Hilbert spaces is discussed in more detail in a course on functional analysis.

We want to give here more, maybe a bit harder, examples of Hilbert spaces used in functional analysis.

Sobolev spaces: Sobolev spaces, denoted by [math]H^s[/math] or [math]W^{s,2}[/math], are Hilbert spaces. They are a special kind of function space in which differentiation may be performed and which still supports the structure of an inner product. Because differentiation is permitted, Sobolev spaces are a convenient setting for the theory of partial differential equations. They also form the basis of the theory of direct methods in the calculus of variations. For [math]s[/math] a nonnegative integer and [math]\Omega\subset\R^n[/math], the Sobolev space [math]H^s(\Omega)[/math] contains [math]L^2[/math]-functions whose weak derivatives of order up to [math]s[/math] are also in [math]L^2[/math]. The inner product in [math]H^s(\Omega)[/math] is
[[math]] \langle f,g\rangle_{H^s(\Omega)}=\int_\Omega f(x)\bar g(x)d\mu(x)+\int_\Omega Df(x)\cdot D\bar g(x)d\mu(x)+...+\int_\Omega D^sf(x)\cdot D^s\bar g(x)d\mu(x), [[/math]]
where the dot indicates the dot product in the euclidean space of partial derivatives of each order. Sobolev spaces can also be defined when [math]s[/math] is not an integer.
Hardy spaces: The Hardy spaces are function spaces, arising in complex analysis and harmonic analysis, whose elements are certain holomorphic functions in a complex domain. Let [math]U[/math] denote the unit disc in the complex plane. Then the Hardy space [math]\mathcal{H}^2(U)[/math] is defined as the space of holomorphic functions [math]f[/math] on [math]U[/math] such that the means
[[math]] M_r(f)=\frac{1}{2\pi}\int_0^{2\pi}\vert f(re^{i\theta})\vert^2d\theta [[/math]]
remain bounded for [math]r \lt 1[/math]. The norm on this Hardy space is defined by
[[math]] \|f\|_{\mathcal{H}^2(U)}=\lim_{r\to 1}\sqrt{M_r(f)}. [[/math]]
Hardy spaces in the disc are related to Fourier series. A function [math]f[/math] is in [math]\mathcal{H}^2(U)[/math] if and only if [math]f(z)=\sum_{n=0}^\infty a_nz^n[/math], where [math]\sum_{n=0}^\infty\vert a_n\vert^2 \lt \infty[/math]. Thus [math]\mathcal{H}^2(U)[/math] consists of those functions that are [math]L^2[/math] on the circle and whose negative frequency Fourier coefficients vanish.
Bergman spaces: The Bergman spaces are another family of Hilbert spaces of holomorphic functions. Let [math]D[/math] be a bounded open set in the complex plane (or a higher-dimensional complex space) and let [math]L^{2,h}(D)[/math] be the space of holomorphic functions [math]f[/math] in [math]D[/math] that are also in [math]L^2(D)[/math] in the sense that
[[math]] \|f\|^2=\int_D\vert f(z)\vert^2 d\mu(z) \lt \infty, [[/math]]
where the integral is taken with respect to the Lebesgue measure in [math]D[/math]. Clearly [math]L^{2,h}(D)[/math] is a subspace of [math]L^2(D)[/math]; in fact, it is a closed subspace and so a Hilbert space in its own right. This is a consequence of the estimate, valid on compact subsets [math]K[/math] of [math]D[/math], that
[[math]] \sup_{z\in K}\vert f(z)\vert\leq C_K\|f\|_2, [[/math]]
which in turn follows from Cauchy's integral formula. Thus convergence of a sequence of holomorphic functions in [math]L^2(D)[/math] implies also compact convergence and so the limit function is also holomorphic. Another consequence of this inequality is that the linear functional that evaluates a function [math]f[/math] at a point of [math]D[/math] is actually continuous on [math]L^{2,h}(D)[/math]. The Riesz representation theorem (see notes on measure and integral) implies that the evaluation functional can be represented as an element of [math]L^{2,h}(D)[/math]. Thus, for every [math]z\in D[/math], there is a function [math]\eta_z\in L^{2,h}(D)[/math] such that
[[math]] f(z)=\int_Df(\zeta)\overline{\eta_z(\zeta)}d\mu(\zeta) [[/math]]
for all [math]f\in L^{2,h}(D)[/math]. The integrand [math]K(\zeta,z)=\overline{\eta_z(\zeta)}[/math] is known as the Bergman kernel of [math]D[/math]. This integral kernel satisfies a reproducing property
[[math]] f(z)=\int_D f(\zeta) K(\zeta,z)d\mu(\zeta). [[/math]]
A Bergman space is an example of a reproducing kernel Hilbert space, which is a Hilbert space of functions along with a kernel [math]K(\zeta,z)[/math] that verifies a reproducing property analogous to this one. The Hardy space [math]\mathcal{H}^2(D)[/math] also admits a reproducing kernel, known as the Szegő kernel. Reproducing kernels are common in other areas of mathematics as well. For instance, in harmonic analysis the Poisson kernel is a reproducing kernel for the Hilbert space of square integrable harmonic functions in the unit ball. That the latter is a Hilbert space at all is a consequence of the mean value theorem for harmonic functions.

The notion of a Hilbert space allows us to do basic geometry. Even if our space is an infinite dimensional vector space, we can still make sense of geometric notions, for example orthogonality, using only the inner product on the space.

Definition (Orthogonal)

Two elements [math]X,Y[/math] in a Hilbert space [math](\mathcal{H},\langle\cdot,\cdot\rangle)[/math] are said to be orthogonal if

[[math]] \langle X,Y\rangle=0. [[/math]]

For a real Hilbert space [math]\mathcal{H}[/math] we get the following identity. For every [math]X,Y\in\mathcal{H}[/math]

[[math]] \|X+Y\|^2=\langle X+Y,X+Y\rangle=\langle X,X\rangle+\langle X,Y\rangle+\langle Y,X\rangle+\langle Y,Y\rangle=\|X\|^2+\|Y\|^2+2\langle X,Y\rangle, [[/math]]

and if [math]X[/math] and [math]Y[/math] are orthogonal, i.e. [math]\langle X,Y\rangle=0[/math], we get the usual pythagorean relation

[[math]] \|X+Y\|^2=\|X\|^2+\|Y\|^2. [[/math]]
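As a small worked illustration (a hypothetical example, not part of the original notes), take [math]\Omega=\{1,2,3,4\}[/math] with the uniform probability measure and the random variables [math]X=(1,1,-1,-1)[/math] and [math]Y=(1,-1,1,-1)[/math], listed by their values on the four outcomes. Then [math]X+Y=(2,0,0,-2)[/math] and

[[math]] \langle X,Y\rangle=\E[XY]=\tfrac{1}{4}(1-1-1+1)=0,\qquad \|X\|_2^2=\|Y\|_2^2=1,\qquad \|X+Y\|_2^2=\tfrac{1}{4}(4+0+0+4)=2=\|X\|_2^2+\|Y\|_2^2. [[/math]]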

Theorem

Let [math](X_n)_{n\geq1}[/math] and [math](Y_n)_{n\geq 1}[/math] be two converging sequences in a Hilbert space [math]\mathcal{H}[/math] such that [math]X_n\xrightarrow{n\to\infty}X[/math] and [math]Y_n\xrightarrow{n\to\infty}Y[/math]. Then

[[math]] \langle X_n,Y_n\rangle\xrightarrow{n\to\infty}\langle X,Y\rangle. [[/math]]

(In particular, [math]X_n=Y_n[/math] gives us that [math]\|X_n\|\xrightarrow{n\to\infty}\|X\|[/math])

Proof

We can look at the difference, which is given by

[[math]] \begin{align*} \vert \langle X,Y\rangle-\langle X_n,Y_n\rangle\vert&=\vert \langle X-X_n,Y\rangle+\langle X_n,Y\rangle-\langle X_n,Y_n\rangle\vert\\ &=\vert\langle X-X_n,Y\rangle+\langle X_n,Y-Y_n\rangle\vert\\ &\leq \vert \langle X-X_n,Y\rangle\vert +\vert \langle X_n,Y-Y_n\rangle\vert\\ &\leq \|X-X_n\|\|Y\|+\|X_n\|\|Y-Y_n\|, \end{align*} [[/math]]

where we have used the Cauchy-Schwarz inequality. By assumption, for every [math]\varepsilon \gt 0[/math] we have [math]\|X-X_n\| \lt \varepsilon[/math] and [math]\|Y-Y_n\| \lt \varepsilon[/math] for all [math]n[/math] large enough. Moreover, [math]\|X_n\|\leq \|X\|+\|X_n-X\|[/math] is bounded independently of [math]n[/math], since a convergent sequence is bounded. Hence the right-hand side tends to [math]0[/math] and we get the claim.

■

Lemma (Parallelogram identity)

Let [math]\mathcal{H}[/math] be a Hilbert space. For all [math]X,Y\in\mathcal{H}[/math] we get

[[math]] \|X+Y\|^2+\|X-Y\|^2=2(\|X\|^2+\|Y\|^2). [[/math]]

Moreover if a norm satisfies the parallelogram identity, it can be derived from an inner product.

Proof

Exercise^[a].

■
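The parallelogram identity can be checked numerically. The following sketch, with hypothetical helper names, compares the euclidean norm on [math]\R^2[/math], which comes from an inner product, with the maximum norm, which does not.

```python
import math

def euclidean_norm(v):
    """Norm induced by the euclidean inner product."""
    return math.sqrt(sum(c * c for c in v))

def max_norm(v):
    """Maximum norm; used here as a norm that is not induced by an inner product."""
    return max(abs(c) for c in v)

def parallelogram_gap(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero when the identity holds for x, y."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
gap_euclidean = parallelogram_gap(euclidean_norm, x, y)  # zero up to rounding
gap_max = parallelogram_gap(max_norm, x, y)              # nonzero: identity fails
```

A single pair of vectors with a nonzero gap is enough to show that the maximum norm cannot be derived from an inner product.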

Definition (Closed linear subset)

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\mathcal{L}\subset\mathcal{H}[/math] be a linear subset. [math]\mathcal{L}[/math] is called closed if for every sequence [math](X_n)_{n\geq1}[/math] in [math]\mathcal{L}[/math] with [math]X_n\xrightarrow{n\to\infty}X[/math] we get that [math]X\in\mathcal{L}[/math].

Theorem

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\Gamma\subset\mathcal{H}[/math] be a subset. Let [math]\Gamma^\perp[/math] denote the set of all elements of [math]\mathcal{H}[/math] which are orthogonal to [math]\Gamma[/math], i.e.

[[math]] \Gamma^\perp=\{X\in\mathcal{H}\mid \langle X,\gamma\rangle=0,\forall\gamma\in\Gamma\}. [[/math]]

Then [math]\Gamma^\perp[/math] is a closed subspace of [math]\mathcal{H}[/math]. We call [math]\Gamma^\perp[/math] the orthogonal complement of [math]\Gamma[/math].

Proof

Let [math]\alpha,\beta\in \R[/math] and [math]X,X'\in\Gamma^\perp[/math]. It is clear that for all [math]Y\in\Gamma[/math]

[[math]] \langle\alpha X + \beta X',Y\rangle=0. [[/math]]

Hence [math]\Gamma^\perp[/math] is a linear subspace of [math]\mathcal{H}[/math]. Next we want to check whether it's closed. Take a sequence [math](X_n)_{n\geq1}[/math] in [math]\Gamma^\perp[/math] such that [math]X_n\xrightarrow{n\to\infty}X[/math] with [math]X\in\mathcal{H}[/math]. Now for all [math]Y\in\Gamma[/math] we get [math]\langle X_n,Y\rangle=0[/math] and [math]\langle X_n,Y\rangle\xrightarrow{n\to\infty}\langle X,Y\rangle[/math] because of the previous theorem. Hence [math]\langle X,Y\rangle=0[/math] and therefore [math]X\in \Gamma^\perp[/math] and the claim follows.

■

Definition (Distance to a closed subspace)

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]X\in\mathcal{H}[/math]. Moreover let [math]\mathcal{L}\subset\mathcal{H}[/math] be a closed subspace. The distance of [math]X[/math] to [math]\mathcal{L}[/math] is given by

[[math]] d(X,\mathcal{L})=\inf_{Y\in\mathcal{L}}\|X-Y\|=\inf\{\|X-Y\|\mid Y\in\mathcal{L}\}. [[/math]]

Since [math]\mathcal{L}[/math] is closed, [math]X\in\mathcal{L}[/math] if and only if [math]d(X,\mathcal{L})=0[/math].

Convex sets in uniformly convex spaces

While the emphasis in this section is on Hilbert spaces, it is useful to isolate a more abstract property which is precisely what is needed for several proofs.

Definition (Uniformly convex vector space)

A normed vector space [math](V,\|\cdot\|)[/math] is called uniformly convex if for [math]X,Y\in V[/math]

[[math]] \|X\|,\|Y\|\leq 1\Longrightarrow\left\|\frac{X+Y}{2}\right\|\leq 1-\psi(\|X-Y\|), [[/math]]

where [math]\psi:[0,2]\to[0,1][/math] is a monotonically increasing function with [math]\psi(r) \gt 0[/math] for all [math]r \gt 0[/math].

Lemma

A Hilbert space [math](\mathcal{H},\langle\cdot,\cdot\rangle)[/math] is uniformly convex.

Proof

For [math]X,Y\in \mathcal{H}[/math] with [math]\|X\|,\|Y\|\leq 1[/math], the parallelogram identity gives

[[math]] \left\|\frac{X+Y}{2}\right\|=\sqrt{\frac{1}{2}\|X\|^2+\frac{1}{2}\|Y\|^2-\frac{1}{4}\|X-Y\|^2}\leq \sqrt{1-\frac{1}{4}\|X-Y\|^2}=1-\psi(\|X-Y\|) [[/math]]

as required, with [math]\psi(r)=1-\sqrt{1-\frac{1}{4}r^2}[/math].

■

Heuristically, we can think of the definition of uniform convexity as having the following geometrical meaning. If vectors [math]X[/math] and [math]Y[/math] have norm (length) one, then their mid-point [math]\frac{X+Y}{2}[/math] has much smaller norm unless [math]X[/math] and [math]Y[/math] are very close together. This accords closely with the geometrical intuition from finite-dimensional spaces with euclidean distance. The following theorem, whose conclusion is illustrated in the figure, will have many important consequences for the study of Hilbert spaces.
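This heuristic can be seen numerically in [math]\R^2[/math] with the euclidean norm. The sketch below uses a hypothetical helper that fixes the unit vector [math]X=(1,0)[/math], rotates a second unit vector [math]Y[/math] by an angle `theta`, and reports the norm of the mid-point together with the distance [math]\|X-Y\|[/math].

```python
import math

def midpoint_norm_and_distance(theta):
    """For unit vectors X = (1, 0) and Y = (cos theta, sin theta) in R^2,
    return the pair (||(X + Y)/2||, ||X - Y||)."""
    x = (1.0, 0.0)
    y = (math.cos(theta), math.sin(theta))
    mid = ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)
    return math.hypot(mid[0], mid[1]), math.hypot(x[0] - y[0], x[1] - y[1])

# As the angle grows, X and Y move apart and the mid-point norm drops below 1.
for theta in (0.1, 1.0, 2.0, 3.0):
    mid_norm, dist = midpoint_norm_and_distance(theta)
    print(f"angle={theta:.1f}  ||X-Y||={dist:.3f}  ||(X+Y)/2||={mid_norm:.3f}")
```

For angles between [math]0[/math] and [math]\pi[/math] the mid-point norm equals [math]\cos(\theta/2)[/math] and the distance equals [math]2\sin(\theta/2)[/math], so the mid-point norm is close to one only when the two vectors are close together.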

Theorem (Unique approximation of a closed convex set)

Let [math](V,\|\cdot\|)[/math] be a Banach space with a uniformly convex norm, let [math]K\subset V[/math] be a closed convex subset and assume that [math]v_0\in V[/math]. Then there exists a unique element [math]w\in K[/math] that is closest to [math]v_0[/math] in the sense that [math]w[/math] is the only element of [math]K[/math] with

[[math]] \|w-v_0\|=d(v_0,K)=\inf_{k\in K}\|k-v_0\|. [[/math]]

Proof

By translating both the set [math]K[/math] and the point [math]v_0[/math] by [math]-v_0[/math] we may assume without loss of generality that [math]v_0=0[/math]. We define

[[math]] s=\inf_{k\in K}\|k-v_0\|=\inf_{k\in K}\|k\|. [[/math]]

If [math]s=0[/math], then we must have [math]0\in K[/math] since [math]K[/math] is closed, and the only choice is then [math]w=v_0=0[/math] (the uniqueness of [math]w[/math] is a consequence of the strict positivity of the norm). So assume that [math]s \gt 0[/math]. By multiplying by the scalar [math]\frac{1}{s}[/math] we may also assume that [math]s=1[/math]. Once we have found a point [math]w\in K[/math] with norm 1, its uniqueness is an immediate consequence of the uniform convexity: if [math]w_1,w_2\in K[/math] have [math]\|w_1\|=\|w_2\|=1[/math], then [math]\frac{w_1+w_2}{2}\in K[/math] because [math]K[/math] is convex. Also, [math]\left\|\frac{w_1+w_2}{2}\right\|=1[/math] by the triangle inequality and since [math]s=1[/math]. By uniform convexity this implies that [math]w_1=w_2[/math].

Turning to the existence, let us first sketch the argument. Choose a sequence [math](k_n)[/math] in [math]K[/math] with [math]\| k_n\|\to 1[/math] as [math]n\to\infty[/math]. Then the mid-points [math]\frac{k_n+k_m}{2}[/math] also lie in [math]K[/math], since [math]K[/math] is convex, and hence must have norm greater than or equal to 1, since [math]s=1[/math]. Therefore [math]k_n[/math] and [math]k_m[/math] must be close together by uniform convexity. Making this precise, we will see that [math](k_n)[/math] is a Cauchy sequence. Since [math]V[/math] is complete and [math]K[/math] is closed, this will give a point [math]w\in K[/math] with [math]\|w\|=1=s[/math] as required. To make this precise, we apply uniform convexity to the normalized vectors

[[math]] x_n=\frac{1}{s_n}k_n, [[/math]]

where [math]s_n=\|k_n\|[/math]. The mid-point of [math]x_n[/math] and [math]x_m[/math] can now be expressed as

[[math]] \frac{x_m+x_n}{2}=\frac{1}{2s_m}k_m+\frac{1}{2s_n}k_n=\left(\frac{1}{2s_m}+\frac{1}{2s_n}\right)(ak_m+bk_n) [[/math]]

with

[[math]] a=\frac{\frac{1}{2s_m}}{\frac{1}{2s_m}+\frac{1}{2s_n}}\geq0 [[/math]]

[[math]] b=\frac{\frac{1}{2s_n}}{\frac{1}{2s_m}+\frac{1}{2s_n}}\geq0 [[/math]]

and [math]a+b=1[/math]. Therefore [math]ak_m+bk_n\in K[/math] by convexity and so

[[math]] \left\|\frac{x_m+x_n}{2}\right\|=\left(\frac{1}{2s_m}+\frac{1}{2s_n}\right)\|ak_m+bk_n\|\geq \frac{1}{2s_m}+\frac{1}{2s_n}. [[/math]]

Let [math]\psi[/math] be as in the definition of uniform convexity and fix [math]\varepsilon \gt 0[/math]. Since [math]s_m\geq 1[/math] and [math]s_m\to 1[/math], we can choose [math]N=N(\varepsilon)[/math] large enough to ensure that [math]m\geq N[/math] implies that

[[math]] \frac{1}{s_m}\geq 1-\psi(\varepsilon). [[/math]]

Then [math]m,n\geq N[/math] implies that

[[math]] \frac{1}{2s_m}+\frac{1}{2s_n}\geq 1-\psi(\varepsilon), [[/math]]

which together with the definition of uniform convexity gives

[[math]] 1-\psi(\|x_m-x_n\|)\geq \left\|\frac{x_m+x_n}{2}\right\|\geq 1-\psi(\varepsilon). [[/math]]

By monotonicity of the function [math]\psi[/math] this implies for all [math]m,n\geq N[/math] that [math]\|x_m-x_n\|\leq \varepsilon[/math], showing that [math](x_n)[/math] is a Cauchy sequence. As [math]V[/math] is assumed to be complete, we deduce that [math](x_n)[/math] converges to some [math]x\in V[/math]. Since [math]s_n\to 1[/math] and [math]k_n=s_nx_n[/math], it follows that [math]\lim_{n\to\infty}k_n=x[/math]. As [math]K[/math] is closed, the limit [math]x[/math] belongs to [math]K[/math]. Moreover [math]\|x\|=\lim_{n\to\infty}\|k_n\|=1=s[/math], so [math]x[/math] is an (and hence is the unique) element of [math]K[/math] closest to [math]v_0=0[/math].

■

This unique approximation is clearly true for Hilbert spaces, since they are uniformly convex spaces.
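As a finite-dimensional illustration, consider [math]\R^3[/math] with the euclidean norm and let [math]K[/math] be a closed box, which is closed and convex. In this special case the closest point is obtained by clipping each coordinate to its interval. The sketch below uses hypothetical names and values, and compares the clipped point against randomly sampled points of the box as a sanity check, not as a proof.

```python
import math
import random

def closest_point_in_box(v0, lower, upper):
    """Closest point to v0 in the box [lower_1, upper_1] x ... x [lower_n, upper_n]
    for the euclidean norm: clip each coordinate into its interval."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(v0, lower, upper)]

def dist(u, v):
    """Euclidean distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

v0 = [2.0, -0.5, 0.3]
lower, upper = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
w = closest_point_in_box(v0, lower, upper)

# Sanity check: no randomly sampled point of the box is closer to v0 than w.
rng = random.Random(0)
samples = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(1000)]
no_sample_is_closer = all(dist(v0, k) >= dist(v0, w) for k in samples)
```

The clipping rule relies on the box being a product of intervals and on the squared euclidean distance being a sum over coordinates; for a general closed convex set no such closed formula is available.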

Figure: The unique closest element of [math]K[/math] to [math]v_0[/math].

Corollary (Orthogonal decomposition)

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\mathcal{L}\subset\mathcal{H}[/math] be a closed subspace. Then [math]\mathcal{L}^\perp[/math] is a closed subspace with

[[math]] \mathcal{H}=\mathcal{L}\oplus\mathcal{L}^\perp, [[/math]]

meaning that every element [math]H\in \mathcal{H}[/math] can be written in the form

[[math]] H=Y+Z [[/math]]

with [math]Y\in \mathcal{L}[/math] and [math]Z\in\mathcal{L}^\perp[/math], and [math]Y[/math] and [math]Z[/math] are unique with these properties. Moreover, [math]\mathcal{L}=(\mathcal{L}^\perp)^\perp[/math] and

[[math]] \|H\|^2=\|Y\|^2+\|Z\|^2 [[/math]]

if [math]H=Y+Z[/math] with [math]Y\in\mathcal{L}[/math] and [math]Z\in\mathcal{L}^\perp[/math].

Proof

As [math]H\mapsto \langle H,Y\rangle[/math] is a (continuous linear) functional for each [math]Y\in\mathcal{L}[/math], the set [math]\mathcal{L}^\perp[/math] is an intersection of closed subspaces and hence is a closed subspace. Using positivity of the inner product, it is easy to see that [math]\mathcal{L}\cap\mathcal{L}^\perp=\{0\}[/math] and from this the uniqueness of the decomposition

[[math]] H=Y+Z [[/math]]

with [math]Y\in\mathcal{L}[/math] and [math]Z\in\mathcal{L}^\perp[/math] follows at once. So it remains to show the existence of this decomposition. Fix [math]H\in\mathcal{H}[/math] and apply the theorem of unique approximation with [math]K=\mathcal{L}[/math] to find a point [math]Y\in\mathcal{L}[/math] that is closest to [math]H[/math]. Let [math]Z=H-Y[/math], so that for any [math]v\in \mathcal{L}[/math] and any scalar [math]t[/math] we have

[[math]] \|Z\|^2\leq \|H-\underbrace{(tv+Y)}_{\in\mathcal{L}}\|^2=\|Z-tv\|^2=\|Z\|^2-2t\langle v,Z\rangle +\vert t\vert^2\|v\|^2. [[/math]]

Hence [math]2t\langle v,Z\rangle\leq t^2\|v\|^2[/math] for all scalars [math]t[/math]. Dividing by [math]t \gt 0[/math] and letting [math]t\to 0[/math] gives [math]\langle v,Z\rangle\leq 0[/math], and the same argument with [math]t \lt 0[/math] gives [math]\langle v,Z\rangle\geq 0[/math], so [math]\langle v,Z\rangle=0[/math] for all [math]v\in\mathcal{L}[/math]. Thus [math]Z\in\mathcal{L}^\perp[/math] and hence

[[math]] \|H\|^2=\langle H,H\rangle=\langle Y+Z,Y+Z\rangle=\|Y\|^2+\|Z\|^2. [[/math]]

It is clear from the definitions that [math]\mathcal{L}\subset (\mathcal{L}^\perp)^\perp[/math]. If [math]v\in(\mathcal{L}^\perp)^\perp[/math] then

[[math]] v=Y+Z [[/math]]

for some [math]Y\in\mathcal{L}[/math] and [math]Z\in\mathcal{L}^\perp[/math] by the first part of the proof. However,

[[math]] 0=\langle v,Z\rangle=\|Z\|^2 [[/math]]

implies that [math]Z=0[/math], hence [math]v=Y\in\mathcal{L}[/math], and so [math]\mathcal{L}=(\mathcal{L}^\perp)^\perp[/math].

■
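In [math]\R^n[/math] with the euclidean inner product the decomposition [math]H=Y+Z[/math] can be computed explicitly. The following is a minimal sketch, assuming [math]\mathcal{L}[/math] is given as the span of finitely many vectors and using the Gram-Schmidt procedure (a standard technique that is not discussed in these notes) to obtain an orthonormal basis of [math]\mathcal{L}[/math]. All names and values are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def decompose(h, spanning):
    """Split h as y + z with y in L = span(spanning) and z orthogonal to L,
    using Gram-Schmidt to build an orthonormal basis of L."""
    basis = []
    for v in spanning:
        # Remove the components along the basis vectors found so far.
        for e in basis:
            c = dot(v, e)
            v = [a - c * b for a, b in zip(v, e)]
        n = math.sqrt(dot(v, v))
        if n > 1e-12:  # skip vectors that are (numerically) linearly dependent
            basis.append([a / n for a in v])
    y = [0.0] * len(h)
    for e in basis:
        c = dot(h, e)
        y = [a + c * b for a, b in zip(y, e)]
    z = [a - b for a, b in zip(h, y)]
    return y, z

H = [1.0, 2.0, 3.0]
L_spanning = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
Y, Z = decompose(H, L_spanning)

# Z is orthogonal to the spanning vectors of L, and ||H||^2 = ||Y||^2 + ||Z||^2.
orthogonality_defects = [dot(Z, v) for v in L_spanning]
pythagoras_defect = dot(H, H) - (dot(Y, Y) + dot(Z, Z))
```

Both `orthogonality_defects` and `pythagoras_defect` vanish up to rounding errors, in line with the statement of the corollary.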

Orthogonal projection

Let again [math]\mathcal{H}[/math] be a Hilbert space. The projection of an element [math]X\in\mathcal{H}[/math] onto a closed subspace [math]\mathcal{L}\subset\mathcal{H}[/math] is the unique point [math]Y\in\mathcal{L}[/math] such that

[[math]] d(X,\mathcal{L})=\|X-Y\|. [[/math]]

We denote this projection by

[[math]] \Pi:\mathcal{H}\to\mathcal{L},X\mapsto \Pi X=Y [[/math]]

Theorem

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\mathcal{L}\subset\mathcal{H}[/math] be a closed subspace. Then the projection operator [math]\Pi[/math] of [math]\mathcal{H}[/math] onto [math]\mathcal{L}[/math] satisfies

[math](i)[/math] [math]\Pi^2=\Pi[/math]
[math](ii)[/math] [math]\begin{cases}\Pi X=X,& X\in\mathcal{L}\\ \Pi X=0,& X\in\mathcal{L}^\perp\end{cases}[/math]
[math](iii)[/math] [math](X-\Pi X)\perp \mathcal{L}[/math] for all [math]X\in\mathcal{H}[/math]

Proof

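[math](i)[/math] and the first statement of [math](ii)[/math] are clear: if [math]X\in\mathcal{L}[/math], then [math]d(X,\mathcal{L})=0[/math] is attained at [math]Y=X[/math], so [math]\Pi X=X[/math], and applying this to [math]\Pi X\in\mathcal{L}[/math] gives [math]\Pi^2=\Pi[/math]. For the second statement of [math](ii)[/math], if [math]X\in\mathcal{L}^\perp[/math], then for [math]Y\in\mathcal{L}[/math] we get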

[[math]] \|X-Y\|^2=\|X\|^2+\|Y\|^2. [[/math]]

This is minimized exactly when [math]Y=0[/math]. Hence [math]\Pi X=0[/math]. For [math](iii)[/math], if [math]Y\in \mathcal{L}[/math] we get

[[math]] \|X-\underbrace{\Pi X}_{\in\mathcal{L}}\|^2\leq \|X-\underbrace{(\Pi X+Y)}_{\in\mathcal{L}}\|^2=\|Y\|^2+\|X-\Pi X\|^2-2\langle X-\Pi X,Y\rangle. [[/math]]

Therefore

[[math]] \begin{equation} 2\langle X-\Pi X,Y\rangle\leq \|Y\|^2 \end{equation} [[/math]]

for all [math]Y\in \mathcal{L}[/math]. Now since [math]\mathcal{L}[/math] is a linear space we get that for all [math]\alpha \gt 0[/math]

[[math]] \alpha Y\in\mathcal{L}. [[/math]]

So in particular (1) is true when [math]Y[/math] is replaced by [math]\alpha Y[/math], i.e. [math]2\alpha\langle X-\Pi X,Y\rangle\leq \alpha^2\|Y\|^2[/math]. Dividing by [math]\alpha \gt 0[/math] we get

[[math]] 2\langle X-\Pi X,Y\rangle\leq \alpha\|Y\|^2. [[/math]]

Now let [math]\alpha\to 0[/math]. Hence we obtain

[[math]] \langle X-\Pi X,Y\rangle\leq 0 [[/math]]

for all [math]Y\in \mathcal{L}[/math]. Since [math]-Y\in\mathcal{L}[/math] we get that

[[math]] -\langle X-\Pi X,Y\rangle \leq 0. [[/math]]

But this means [math]\langle X-\Pi X,Y\rangle=0[/math] and the claim follows.

■
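To connect this with [math]L^2(\Omega,\mathcal{G},\p)\subset L^2(\Omega,\F,\p)[/math], here is a minimal sketch under simplifying assumptions: [math]\Omega[/math] is finite and [math]\mathcal{G}[/math] is generated by a partition of [math]\Omega[/math], so that [math]L^2(\Omega,\mathcal{G},\p)[/math] consists of the random variables that are constant on each block. In this setting the projection replaces [math]X[/math] on each block by its probability-weighted average. The probabilities, the partition and all names are hypothetical.

```python
# Hypothetical finite probability space with six outcomes 0..5.
probs = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]
# A partition of the outcomes; the subspace L consists of the random
# variables that are constant on each block.
blocks = [[0, 1], [2, 3, 4], [5]]

def inner(x, y):
    """<X, Y> = E[XY]."""
    return sum(p * a * b for p, a, b in zip(probs, x, y))

def project(x):
    """Replace X on each block by its probability-weighted average there."""
    out = [0.0] * len(x)
    for block in blocks:
        mass = sum(probs[i] for i in block)
        avg = sum(probs[i] * x[i] for i in block) / mass
        for i in block:
            out[i] = avg
    return out

def indicator(block):
    """Indicator random variable of a block; these indicators span L."""
    return [1.0 if i in block else 0.0 for i in range(len(probs))]

X = [1.0, 4.0, -2.0, 0.0, 3.0, 5.0]
PX = project(X)
residual = [a - b for a, b in zip(X, PX)]

# (i) projecting twice changes nothing; (iii) X - PX is orthogonal to L.
idempotence_defect = max(abs(a - b) for a, b in zip(project(PX), PX))
orthogonality_defects = [inner(residual, indicator(b)) for b in blocks]
```

Since the block indicators span the subspace, checking orthogonality of the residual against them is enough to verify property [math](iii)[/math] in this example.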

Corollary

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\mathcal{L}\subset\mathcal{H}[/math] be a closed subspace. Moreover let [math]\Pi[/math] be the projection operator of [math]\mathcal{H}[/math] onto [math]\mathcal{L}[/math]. Then

[[math]] X=(X-\Pi X)+\Pi X [[/math]]

is the unique representation of [math]X[/math] as the sum of an element of [math]\mathcal{L}[/math] and an element of [math]\mathcal{L}^\perp[/math].

Proof

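This is just a consequence of the corollary on the orthogonal decomposition, since [math]\Pi X\in\mathcal{L}[/math] and [math]X-\Pi X\in\mathcal{L}^\perp[/math] by the previous theorem.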

■

The uniqueness of the projection operator implies that, for [math]X_1\in\mathcal{L}[/math] and [math]X_2\in\mathcal{L}^\perp[/math]

[[math]] \Pi(X_1+X_2)=X_1. [[/math]]

Corollary

Let [math]\mathcal{H}[/math] be a Hilbert space and let [math]\mathcal{L}\subset\mathcal{H}[/math] be a closed subspace. Moreover let [math]\Pi[/math] be the projection operator of [math]\mathcal{H}[/math] onto [math]\mathcal{L}[/math]. Then

[math](i)[/math] [math]\langle \Pi X,Y\rangle =\langle X,\Pi Y\rangle[/math] for all [math]X,Y\in\mathcal{H}[/math].
[math](ii)[/math] [math]\Pi(\alpha X+\beta Y)=\alpha\Pi X+\beta \Pi Y[/math] for all [math]\alpha,\beta\in\R[/math] and [math]X,Y\in\mathcal{H}[/math].

Proof

For [math](i)[/math], let [math]X,Y\in\mathcal{H}[/math], [math]X=X_1+X_2[/math] with [math]X_1\in\mathcal{L}[/math] and [math]X_2\in\mathcal{L}^\perp[/math] and [math]Y=Y_1+Y_2[/math] with [math]Y_1\in\mathcal{L}[/math] and [math]Y_2\in\mathcal{L}^\perp[/math]. Then we get

[[math]] \langle \Pi X,Y\rangle=\langle \Pi (X_1+X_2),Y_1+Y_2\rangle=\langle X_1,Y_1+Y_2\rangle=\langle X_1,Y_1\rangle [[/math]]

[[math]] \langle X,\Pi Y\rangle=\langle X_1+X_2,\Pi (Y_1+Y_2)\rangle=\langle X_1+X_2,Y_1\rangle=\langle X_1,Y_1\rangle. [[/math]]

Therefore they are the same. For [math](ii)[/math], take [math]\alpha,\beta\in\R[/math] and look at

[[math]] \alpha X+\beta Y=\underbrace{(\alpha X_1+\beta Y_1)}_{\in\mathcal{L}}+\underbrace{(\alpha X_2+\beta Y_2)}_{\in\mathcal{L}^\perp}. [[/math]]

Hence we get

[[math]] \Pi (\alpha X+\beta Y)=\alpha X_1+\beta Y_1=\alpha \Pi(X_1+X_2)+\beta\Pi (Y_1+Y_2)=\alpha \Pi X+\beta \Pi Y. [[/math]]

■
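Both properties can be checked numerically in the simplest case, where [math]\mathcal{L}[/math] is the line spanned by a nonzero vector [math]u\in\R^3[/math] and the projection is [math]\Pi X=\frac{\langle X,u\rangle}{\langle u,u\rangle}u[/math]. The vectors and scalars below are hypothetical values chosen for illustration.

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(x, u):
    """Orthogonal projection of x onto the line spanned by the nonzero vector u."""
    c = dot(x, u) / dot(u, u)
    return [c * a for a in u]

u = [1.0, 2.0, 2.0]
X = [3.0, 0.0, 1.0]
Y = [-1.0, 4.0, 2.0]
alpha, beta = 2.0, -3.0

# Symmetry: <Pi X, Y> and <X, Pi Y> should agree.
lhs_sym = dot(project_onto_line(X, u), Y)
rhs_sym = dot(X, project_onto_line(Y, u))

# Linearity: Pi(alpha X + beta Y) and alpha Pi X + beta Pi Y should agree.
combo = [alpha * a + beta * b for a, b in zip(X, Y)]
lhs_lin = project_onto_line(combo, u)
rhs_lin = [alpha * a + beta * b
           for a, b in zip(project_onto_line(X, u), project_onto_line(Y, u))]
```

In exact arithmetic both sides of the symmetry check equal [math]\frac{\langle X,u\rangle\langle Y,u\rangle}{\langle u,u\rangle}[/math]; in floating point they agree up to rounding.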

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].

Notes

[a] Use the fact [math]\|X+Y\|^2=\langle X+Y,X+Y\rangle[/math]. This proof can be found in the notes on measure and integral.