By Admin
Jun 25'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math]. Goldstein & Smith (1974) proposed a novel generalized ridge estimator of its [math]p[/math]-dimensional regression parameter:

[[math]] \begin{eqnarray*} \hat{\bbeta}_m(\lambda) & = & [ (\mathbf{X}^{\top} \mathbf{X})^m + \lambda \mathbf{I}_{pp} ]^{-1} (\mathbf{X}^{\top} \mathbf{X})^{m-1} \mathbf{X}^{\top} \mathbf{Y}, \end{eqnarray*} [[/math]]

with penalty parameter [math]\lambda \gt 0[/math] and ‘shape’ or ‘rate’ parameter [math]m[/math].
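To make the definition concrete, the estimator can be evaluated numerically. The following is a minimal NumPy sketch, not part of the original exercise: the function name `generalized_ridge` and the simulated data are hypothetical, and it assumes [math]m[/math] is a positive integer so that the matrix powers can be computed with `np.linalg.matrix_power`. For [math]m=1[/math] the formula reduces to the regular ridge estimator, which the last lines check.

```python
import numpy as np

def generalized_ridge(X, Y, lam, m):
    """Evaluate [(X'X)^m + lam*I]^{-1} (X'X)^{m-1} X'Y.

    Assumption of this sketch: m is a positive integer.
    """
    p = X.shape[1]
    XtX = X.T @ X
    lhs = np.linalg.matrix_power(XtX, m) + lam * np.eye(p)
    rhs = np.linalg.matrix_power(XtX, m - 1) @ (X.T @ Y)
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(lhs, rhs)

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

# For m = 1 the estimator is the regular ridge estimator.
beta_m1 = generalized_ridge(X, Y, lam=1.0, m=1)
beta_ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(3), X.T @ Y)
print(np.allclose(beta_m1, beta_ridge))
```

Solving the linear system rather than inverting the matrix is a numerical convenience; it yields the same estimate as the formula.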

  • Assume, for this part only, that [math]n=p[/math] and the design matrix is orthonormal. Show that, irrespective of the choice of [math]m[/math], this generalized ridge regression estimator coincides with the ‘regular’ ridge regression estimator.
  • Consider the generalized ridge loss function [math]\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \bbeta^{\top} \mathbf{A} \bbeta[/math] with [math]\mathbf{A}[/math] a [math]p \times p[/math]-dimensional symmetric matrix. For which [math]\mathbf{A}[/math] does [math]\hat{\bbeta}_m(\lambda)[/math] minimize this loss function?
  • Let [math]d_j[/math] be the [math]j[/math]-th singular value of [math]\mathbf{X}[/math]. Show that in [math]\hat{\bbeta}_m(\lambda)[/math] the singular values are shrunk as [math](d_j^{2m} + \lambda)^{-1} d_j^{2m-1}[/math]. Hint: use the singular value decomposition of [math]\mathbf{X}[/math].
  • For positive singular values, does a larger [math]m[/math] lead to more shrinkage? Hint: take the particulars of the singular value into account in your answer.
  • Express [math]\mathbb{E}[\hat{\bbeta}_m(\lambda)][/math] in terms of the design matrix, model and shrinkage parameters ([math]\lambda[/math] and [math]m[/math]).
  • Express [math]\mbox{Var}[\hat{\bbeta}_m(\lambda)][/math] in terms of the design matrix, model and shrinkage parameters ([math]\lambda[/math] and [math]m[/math]).
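The shrinkage factors stated in the exercise can be checked numerically before attempting a proof. The sketch below is an illustration under stated assumptions, not a derivation: the data are simulated, all variable names are hypothetical, [math]n \geq p[/math] with a full-column-rank design is assumed, and [math]m[/math] is taken to be a positive integer. It compares the estimator computed directly from its definition with the one assembled from the thin singular value decomposition [math]\mathbf{X} = \mathbf{U} \mathbf{D} \mathbf{V}^{\top}[/math] and the factors [math](d_j^{2m} + \lambda)^{-1} d_j^{2m-1}[/math].

```python
import numpy as np

# Simulated data (illustrative values only).
rng = np.random.default_rng(1)
n, p, lam, m = 15, 4, 0.5, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Thin SVD: X = U diag(d) Vt, with U of size n x p and Vt of size p x p.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Shrinkage factors from the exercise: d_j^(2m-1) / (d_j^(2m) + lambda).
factors = d ** (2 * m - 1) / (d ** (2 * m) + lam)
beta_svd = Vt.T @ (factors * (U.T @ Y))

# Direct evaluation of the defining formula.
XtX = X.T @ X
beta_direct = np.linalg.solve(
    np.linalg.matrix_power(XtX, m) + lam * np.eye(p),
    np.linalg.matrix_power(XtX, m - 1) @ (X.T @ Y),
)

print(np.allclose(beta_svd, beta_direct))
```

Agreement of the two vectors on one simulated data set is only a sanity check; the exercise still asks for the general argument.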