
Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\ttau{\bf \tau} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math], with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and common variance. Moreover, suppose the columns of the design matrix are identical, [math]\mathbf{X}_{\ast,j} = \mathbf{X}_{\ast,j'}[/math] for all [math]j, j'=1, \ldots, p[/math], and of unit length, [math]\sum_{i=1}^n X_{i,j}^2 = 1[/math]. Show that the ridge regression estimator, defined as [math]\hat{\bbeta}(\lambda) = \arg \min_{\bbeta \in \mathbb{R}^p} \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \| \bbeta \|_2^2[/math] for [math]\lambda \gt 0[/math], equals:

[[math]] \begin{eqnarray*} \hat{\bbeta}(\lambda) & = & b \, [ \lambda^{-1} - p (\lambda^{2}+\lambda p)^{-1} ] \mathbf{1}_p, \end{eqnarray*} [[/math]]

where [math]b = \mathbf{X}_{\ast,1}^{\top} \mathbf{Y}[/math]. Hint: you may want to use the Sherman-Morrison formula: let [math]\mathbf{A}[/math] and [math]\mathbf{B}[/math] be symmetric matrices of the same dimension, with [math]\mathbf{A}[/math] invertible and [math]\mathbf{B}[/math] of rank one, and define [math]g = \mbox{tr}( \mathbf{A}^{-1} \mathbf{B})[/math]. Then [math](\mathbf{A} + \mathbf{B})^{-1} = \mathbf{A}^{-1} - (1+g)^{-1} \mathbf{A}^{-1} \mathbf{B} \mathbf{A}^{-1}[/math].
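A quick numerical check can make the claim concrete before working through the algebra. The sketch below, a minimal verification in Python with NumPy (the simulation settings [math]n=50[/math], [math]p=4[/math], [math]\lambda = 1.3[/math], and the noise level are illustrative assumptions, not part of the exercise), builds a design matrix with [math]p[/math] identical unit-norm columns, compares the directly computed ridge estimator with the stated closed form, and checks the hinted Sherman-Morrison identity for [math]\mathbf{A} = \lambda \mathbf{I}_p[/math] and [math]\mathbf{B} = \mathbf{1}_p \mathbf{1}_p^{\top}[/math].

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 4, 1.3  # illustrative choices, not part of the exercise

# Design matrix with p identical columns, each scaled to unit norm.
x = rng.normal(size=n)
x /= np.linalg.norm(x)
X = np.tile(x[:, None], (1, p))

# Response from the linear model with i.i.d. normal noise.
beta_true = rng.normal(size=p)
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Ridge estimator computed directly: (X'X + lam I)^{-1} X'Y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Claimed closed form: b [lam^{-1} - p (lam^2 + lam p)^{-1}] 1_p,
# with b = X_{*,1}' Y.
b = x @ Y
beta_closed = b * (1.0 / lam - p / (lam**2 + lam * p)) * np.ones(p)
print(np.allclose(beta_ridge, beta_closed))  # True

# Sherman-Morrison check with A = lam I_p and B = 1_p 1_p' (symmetric, rank one).
A = lam * np.eye(p)
B = np.ones((p, p))
g = np.trace(np.linalg.solve(A, B))
lhs = np.linalg.inv(A + B)
rhs = np.linalg.inv(A) - np.linalg.inv(A) @ B @ np.linalg.inv(A) / (1 + g)
print(np.allclose(lhs, rhs))  # True
```

Both checks should print True for any choice of [math]n[/math], [math]p[/math], and [math]\lambda \gt 0[/math], since the identities are exact; the randomness only affects the value of [math]b[/math].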