Revision as of 22:35, 24 June 2023 by Admin

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the ridge regression estimator [math]\hat{\bbeta}(\lambda)[/math] of the linear regression model parameter [math]\bbeta[/math]. Its penalty parameter [math]\lambda[/math] may be chosen as the minimizer of Allen's PRESS statistic, i.e.: [math]\lambda_{\mbox{{\tiny opt}}} = \arg \min_{\lambda \gt 0} n^{-1} \sum\nolimits_{i=1}^n [Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda)]^2[/math], with the LOOCV ridge regression estimator [math]\hat{\bbeta}_{-i}(\lambda) = (\mathbf{X}_{- i, \ast}^{\top} \mathbf{X}_{- i, \ast} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{- i, \ast}^{\top} \mathbf{Y}_{- i}[/math]. This is computationally demanding as it involves [math]n[/math] evaluations of [math]\hat{\bbeta}_{-i}(\lambda)[/math], a burden that can be circumvented by rewriting Allen's PRESS statistic. To this end:

  • a) Use the Woodbury matrix identity to verify:
    [[math]] \begin{eqnarray*} (\mathbf{X}_{- i, \ast}^{\top} \mathbf{X}_{- i, \ast} + \lambda \mathbf{I}_{pp})^{-1} & = & (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \\ & & + (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top} [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} \mathbf{X}_{i, \ast} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1}, \end{eqnarray*} [[/math]]
    in which [math]\mathbf{H}_{ii}(\lambda) = \mathbf{X}_{i, \ast} (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top}[/math].
  • b) Rewrite the LOOCV ridge regression estimator as:
    [[math]] \begin{eqnarray*} \hat{\bbeta}_{- i}(\lambda) & = & \hat{\bbeta}(\lambda) - (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}_{i, \ast}^{\top} [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} [ Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}(\lambda) ]. \end{eqnarray*} [[/math]]
    In this, use part a) and the identity [math]\mathbf{X}_{-i, \ast}^{\top} \mathbf{Y}_{-i} = \mathbf{X}^{\top} \mathbf{Y} - \mathbf{X}_{i, \ast}^{\top} Y_i[/math].
  • c) Reformulate, using part b), the prediction error as [math]Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda) = [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1} [ Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}(\lambda) ][/math] and express Allen's PRESS statistic as:
    [[math]] \begin{eqnarray*} n^{-1} \sum\nolimits_{i=1}^n [Y_i - \mathbf{X}_{i, \ast} \hat{\bbeta}_{-i}(\lambda)]^2 & = & n^{-1} \| \mathbf{B}(\lambda) [\mathbf{I}_{nn} - \mathbf{H}(\lambda)] \mathbf{Y} \|_F^2, \end{eqnarray*} [[/math]]
    where [math]\mathbf{B}(\lambda)[/math] is diagonal with [math][\mathbf{B}(\lambda)]_{ii} = [ 1 - \mathbf{H}_{ii}(\lambda)]^{-1}[/math].
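As a sanity check (not part of the exercise itself), the three identities above can be verified numerically on simulated data. The sketch below uses NumPy; all dimensions, the penalty value, and variable names are illustrative choices, not prescribed by the exercise.

```python
# Numerical check of parts a)-c): the Woodbury form of the leave-one-out
# inverse, the LOOCV-estimator shortcut, and the PRESS shortcut.
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 10, 4, 0.7                                # illustrative sizes/penalty
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))      # (X'X + lambda*I)^{-1}
H = X @ A_inv @ X.T                                   # ridge hat matrix H(lambda)
beta_hat = A_inv @ X.T @ Y                            # full-data ridge estimator

i = 3                                                 # leave out observation i
X_mi = np.delete(X, i, axis=0)
Y_mi = np.delete(Y, i)
x_i = X[i:i + 1, :]                                   # the row X_{i,*}

# Part a): Woodbury/Sherman-Morrison form of the leave-one-out inverse.
lhs = np.linalg.inv(X_mi.T @ X_mi + lam * np.eye(p))
rhs = A_inv + A_inv @ x_i.T @ x_i @ A_inv / (1 - H[i, i])
assert np.allclose(lhs, rhs)

# Part b): LOOCV estimator without refitting on the reduced data.
beta_mi = lhs @ X_mi.T @ Y_mi
beta_mi_shortcut = beta_hat - A_inv @ x_i.T @ (Y[i] - x_i @ beta_hat) / (1 - H[i, i])
assert np.allclose(beta_mi, beta_mi_shortcut)

# Part c): PRESS computed naively vs. via the hat matrix only.
press_naive = 0.0
for j in range(n):
    X_mj = np.delete(X, j, axis=0)
    Y_mj = np.delete(Y, j)
    b_mj = np.linalg.solve(X_mj.T @ X_mj + lam * np.eye(p), X_mj.T @ Y_mj)
    press_naive += (Y[j] - X[j] @ b_mj) ** 2
press_naive /= n
B = np.diag(1.0 / (1.0 - np.diag(H)))                 # B(lambda), diagonal
press_fast = np.sum((B @ (np.eye(n) - H) @ Y) ** 2) / n
assert np.allclose(press_naive, press_fast)
```

The fast form needs only the full-data fit: one [math]p \times p[/math] inverse instead of [math]n[/math] of them, which is the computational point of the exercise.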