Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
The coefficients [math]\bbeta[/math] of a linear regression model, [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math], are estimated by [math]\hat{\bbeta} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math]. The associated fitted values are then given by [math]\widehat{\mathbf{Y}} = \mathbf{X} \, \hat{\bbeta} = \mathbf{X} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{Y} = \mathbf{H} \mathbf{Y}[/math], where [math]\mathbf{H} =\mathbf{X} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top}[/math] is referred to as the hat matrix. The hat matrix [math]\mathbf{H}[/math] is a projection matrix as it satisfies [math]\mathbf{H} = \mathbf{H}^2[/math]. Hence, linear regression projects the response [math]\mathbf{Y}[/math] onto the vector space spanned by the columns of [math]\mathbf{X}[/math]. Consequently, the residuals [math]\hat{\vvarepsilon} = \mathbf{Y} - \widehat{\mathbf{Y}}[/math] and the fitted values [math]\widehat{\mathbf{Y}}[/math] are orthogonal. Now consider the ridge estimator of the regression coefficients: [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math]. Let [math]\widehat{\mathbf{Y}}(\lambda) = \mathbf{X} \hat{\bbeta}(\lambda)[/math] be the vector of associated fitted values.
- Show that the ridge hat matrix [math]\mathbf{H}(\lambda) = \mathbf{X} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}^{\top}[/math], associated with ridge regression, is not a projection matrix (for any [math]\lambda \gt 0[/math]), i.e. [math]\mathbf{H}(\lambda) \not= [\mathbf{H}(\lambda)]^2[/math].
- Show that for any [math]\lambda \gt 0[/math] the ‘ridge fit’ [math]\widehat{\mathbf{Y}}(\lambda)[/math] is not orthogonal to the associated ‘ridge residuals’ [math]\hat{\vvarepsilon}(\lambda)[/math], defined as [math]\hat{\vvarepsilon}(\lambda) = \mathbf{Y} - \mathbf{X} \hat{\bbeta}(\lambda)[/math].
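
Before attempting the proofs, both claims can be checked numerically. The sketch below is only a sanity check on simulated data, not a solution to the exercise. The helper `hat_matrix`, the dimensions, the seed and the choice [math]\lambda = 1[/math] are illustrative assumptions that do not come from the exercise itself.

```python
import numpy as np


def hat_matrix(X, lam=0.0):
    """Return X (X^T X + lam * I)^{-1} X^T; lam = 0 gives the ordinary hat matrix."""
    p = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)


# Simulated data; dimensions and seed are arbitrary choices for illustration.
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

H_ols = hat_matrix(X)               # least squares hat matrix H
H_ridge = hat_matrix(X, lam=1.0)    # ridge hat matrix H(lambda) with lambda = 1

# Idempotency check: H = H^2 holds for least squares, fails for ridge.
ols_idempotent = np.allclose(H_ols, H_ols @ H_ols)
ridge_idempotent = np.allclose(H_ridge, H_ridge @ H_ridge)

# Inner product between fitted values and residuals.
fit_ols = H_ols @ Y
fit_ridge = H_ridge @ Y
ols_inner = fit_ols @ (Y - fit_ols)
ridge_inner = fit_ridge @ (Y - fit_ridge)

print("least squares: H idempotent?", ols_idempotent, "| fit . residuals =", ols_inner)
print("ridge:         H idempotent?", ridge_idempotent, "| fit . residuals =", ridge_inner)
```

For least squares the inner product is zero up to floating-point error, whereas for ridge it is clearly nonzero. A numerical check on one data set does not prove the statements for every [math]\lambda \gt 0[/math]; it only indicates what the proofs should establish.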