By Admin
Jun 24'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the ridge regression estimator [math]\hat{\bbeta}(\lambda)[/math] of the linear regression model parameter [math]\bbeta[/math]. Its penalty parameter [math]\lambda[/math] may be chosen as the minimizer of Allen's PRESS statistic, i.e.: [math]\lambda_{\mbox{{\tiny opt}}} = \arg \min_{\lambda \gt 0} n^{-1} \sum\nolimits_{i=1}^n [Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda)]^2[/math], with the leave-one-out cross-validation (LOOCV) ridge regression estimator [math]\hat{\bbeta}_{-i}(\lambda) = (\mathbf{X}_{- i, \ast}^{\top} \mathbf{X}_{- i, \ast} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{- i, \ast}^{\top} \mathbf{Y}_{- i}[/math]. Evaluating this criterion directly is computationally demanding, as each value of [math]\lambda[/math] requires [math]n[/math] evaluations of [math]\hat{\bbeta}_{-i}(\lambda)[/math]. This can be circumvented by rewriting Allen's PRESS statistic. To this end:

  • Use the Woodbury matrix identity to verify:
    [[math]] \begin{eqnarray*} (\mathbf{X}_{- i, \ast}^{\top} \mathbf{X}_{- i, \ast} + \lambda \mathbf{I}_{pp})^{-1} & = & (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \\ & & + (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top} [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} \mathbf{X}_{i, \ast} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1}, \end{eqnarray*} [[/math]]
    in which [math]\mathbf{H}_{ii}(\lambda) = \mathbf{X}_{i, \ast} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top}[/math].
  • Rewrite the LOOCV ridge regression estimator to:
    [[math]] \begin{eqnarray*} \hat{\bbeta}_{- i}(\lambda) & = & \hat{\bbeta}(\lambda) - (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top} [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} [ Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}(\lambda) ]. \end{eqnarray*} [[/math]]
    Here, use the Woodbury-based identity from the first part together with the identity [math]\mathbf{X}_{-i, \ast}^{\top} \mathbf{Y}_{-i} = \mathbf{X}^{\top} \mathbf{Y} - \mathbf{X}_{i, \ast}^{\top} Y_i[/math].
  • Reformulate, using the rewritten estimator from the previous part, the prediction error as [math]Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda) = [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} [ Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}(\lambda) ][/math] and express Allen's PRESS statistic as:
    [[math]] \begin{eqnarray*} n^{-1} \sum\nolimits_{i=1}^n [Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda)]^2 & = & n^{-1} \| \mathbf{B}(\lambda) [\mathbf{I}_{nn} - \mathbf{H}(\lambda)] \mathbf{Y} \|_ F^2, \end{eqnarray*} [[/math]]
    where [math]\mathbf{B}(\lambda)[/math] is diagonal with [math][\mathbf{B}(\lambda)]_{ii} = [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1}[/math].
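
The final identity can be checked numerically before attempting the proof. The sketch below is not part of the exercise: it assumes NumPy, uses simulated data, and the function names `press_brute_force` and `press_shortcut` are hypothetical. It compares the PRESS statistic obtained from [math]n[/math] explicit leave-one-out ridge fits with the value obtained from a single fit through the rescaled residuals.

```python
import numpy as np


def press_brute_force(X, Y, lam):
    """Allen's PRESS via n explicit leave-one-out ridge fits."""
    n, p = X.shape
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the i-th sample
        X_mi, Y_mi = X[keep], Y[keep]
        beta_mi = np.linalg.solve(X_mi.T @ X_mi + lam * np.eye(p), X_mi.T @ Y_mi)
        errors[i] = Y[i] - X[i] @ beta_mi
    return np.mean(errors ** 2)


def press_shortcut(X, Y, lam):
    """Allen's PRESS from a single ridge fit, using the rescaled residuals."""
    n, p = X.shape
    # Ridge hat matrix H(lambda) = X (X'X + lambda I)^{-1} X'.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    residuals = Y - H @ Y
    # B(lambda) is diagonal, so multiplying by it is an elementwise division.
    loo_errors = residuals / (1.0 - np.diag(H))
    return np.mean(loo_errors ** 2)


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
Y = X @ rng.normal(size=5) + rng.normal(size=30)

for lam in (0.1, 1.0, 10.0):
    assert np.isclose(press_brute_force(X, Y, lam), press_shortcut(X, Y, lam))
```

The shortcut needs one [math]p \times p[/math] linear solve per value of [math]\lambda[/math] instead of [math]n[/math] of them, which is what makes a grid search over [math]\lambda[/math] affordable.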