
Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\ttau{\bf \tau} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math], where [math]\varepsilon_i \sim_{i.i.d.} \mathcal{N}(0, \sigma^2)[/math]. The ridge regression estimator of [math]\bbeta[/math] is denoted by [math]\hat{\bbeta}(\lambda)[/math], with penalty parameter [math]\lambda \gt 0[/math].
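
As a reference point for the parts below, recall the closed form of the ridge estimator, [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math]. The following minimal Python/numpy sketch simulates the model and computes this estimator; the dimensions, seed, and parameter values are arbitrary illustrative choices, not part of the exercise.

```python
import numpy as np

# Simulate Y = X beta + eps with eps ~ N(0, sigma^2 I_n)
# (all constants here are illustrative choices).
rng = np.random.default_rng(0)
n, p, sigma, lam = 50, 4, 1.0, 2.5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
Y = X @ beta + sigma * rng.standard_normal(n)

# Ridge estimator: beta_hat(lambda) = (X'X + lambda I_p)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
print(beta_hat)
```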

  • Show:
    [[math]] \begin{eqnarray*} \mbox{tr}\{ \mbox{Var}[ \widehat{\mathbf{Y}} (\lambda)] \} \, \, \, = \, \, \, \sigma^2 \sum\nolimits_{j=1}^p (\mathbf{D}_x)_{jj}^4 [(\mathbf{D}_x)_{jj}^2 + \lambda ]^{-2}, \end{eqnarray*} [[/math]]
    where [math]\widehat{\mathbf{Y}} (\lambda) = \mathbf{X} \hat{\bbeta}(\lambda)[/math] and [math]\mathbf{D}_x[/math] is the diagonal matrix containing the singular values of [math]\mathbf{X}[/math] on its diagonal. (A numerical check of this identity is sketched in the first code block after this list.)
  • The coefficient of determination is defined as:
    [[math]] \begin{eqnarray*} R^2 & = & [\mbox{Var}(\mathbf{Y}) - \mbox{Var}(\widehat{\mathbf{Y}})] / [\mbox{Var}(\mathbf{Y}) ] \, \, \, = \, \, \, [ \mbox{Var}(\mathbf{Y} - \widehat{\mathbf{Y}}) ] / [ \mbox{Var}(\mathbf{Y}) ], \end{eqnarray*} [[/math]]
    where [math]\widehat{\mathbf{Y}} = \mathbf{X} \hat{\bbeta}[/math] with [math]\hat{\bbeta} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math]. Show that the second equality no longer holds when [math]\widehat{\mathbf{Y}}[/math] is replaced by the ridge regression predictor [math]\widehat{\mathbf{Y}}(\lambda) = \mathbf{H}(\lambda) \mathbf{Y}[/math], where [math]\mathbf{H}(\lambda) = \mathbf{X} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}^{\top}[/math]. Hint: use the fact that [math]\mathbf{H}(\lambda)[/math] is not a projection matrix, i.e., [math]\mathbf{H}(\lambda) \not= [\mathbf{H}(\lambda)]^2[/math]. (The second code block after this list illustrates this failure numerically.)
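
The identity from the first part can be verified numerically. Since [math]\widehat{\mathbf{Y}}(\lambda) = \mathbf{H}(\lambda) \mathbf{Y}[/math] and [math]\mbox{Var}(\mathbf{Y}) = \sigma^2 \mathbf{I}_{nn}[/math], the left-hand side equals [math]\sigma^2 \mbox{tr}\{ \mathbf{H}(\lambda) [\mathbf{H}(\lambda)]^{\top} \}[/math]. The sketch below compares this trace with the singular-value expression; all constants are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of the trace identity (sketch; all constants
# are arbitrary illustrative choices, not part of the exercise).
rng = np.random.default_rng(1)
n, p, sigma2, lam = 50, 4, 1.0, 2.5
X = rng.standard_normal((n, p))

# Ridge hat matrix H(lambda) = X (X'X + lambda I_p)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

# tr{ Var[Y_hat(lambda)] } = sigma^2 tr{ H(lambda) H(lambda)' },
# since Y_hat(lambda) = H(lambda) Y and Var(Y) = sigma^2 I_n.
lhs = sigma2 * np.trace(H @ H.T)

# Singular-value form: sigma^2 * sum_j d_j^4 / (d_j^2 + lambda)^2.
d = np.linalg.svd(X, compute_uv=False)
rhs = sigma2 * np.sum(d**4 / (d**2 + lam) ** 2)

print(np.isclose(lhs, rhs))  # expected: True
```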
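Similarly, the failure of the second equality under ridge regression can be illustrated numerically by comparing [math]\mbox{Var}(\mathbf{Y}) - \mbox{Var}[\widehat{\mathbf{Y}}(\lambda)] = \sigma^2 \{ \mathbf{I}_{nn} - \mathbf{H}(\lambda) [\mathbf{H}(\lambda)]^{\top} \}[/math] with [math]\mbox{Var}[\mathbf{Y} - \widehat{\mathbf{Y}}(\lambda)] = \sigma^2 [\mathbf{I}_{nn} - \mathbf{H}(\lambda)] [\mathbf{I}_{nn} - \mathbf{H}(\lambda)]^{\top}[/math]. They coincide at [math]\lambda = 0[/math] (the OLS projection) but differ for [math]\lambda \gt 0[/math]; again, all constants in the sketch are arbitrary.

```python
import numpy as np

# Sketch: Var(Y) - Var(Y_hat) vs. Var(Y - Y_hat) for the linear
# predictor Y_hat = H(lambda) Y, with Var(Y) = sigma^2 I_n.
rng = np.random.default_rng(2)
n, p, sigma2 = 50, 4, 1.0
X = rng.standard_normal((n, p))
I_n = np.eye(n)

def hat_matrix(lam):
    """H(lambda) = X (X'X + lambda I_p)^{-1} X'."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

# lam = 0.0 gives the OLS projection (X'X invertible here as p < n).
for lam in (0.0, 2.5):
    H = hat_matrix(lam)
    # Var(Y) - Var(Y_hat(lambda)) = sigma^2 (I_n - H H').
    lhs = sigma2 * (I_n - H @ H.T)
    # Var(Y - Y_hat(lambda)) = sigma^2 (I_n - H)(I_n - H)'.
    rhs = sigma2 * (I_n - H) @ (I_n - H).T
    print(lam, np.allclose(lhs, rhs))

# expected output:
# 0.0 True    (OLS: H is a projection, H = H^2)
# 2.5 False   (ridge: H(lambda) != [H(lambda)]^2)
```

The discrepancy between the two matrices equals [math]2 \sigma^2 \{ [\mathbf{H}(\lambda)]^2 - \mathbf{H}(\lambda) \}[/math] (using the symmetry of [math]\mathbf{H}(\lambda)[/math]), which vanishes precisely when [math]\mathbf{H}(\lambda)[/math] is idempotent, i.e., a projection matrix.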