By Admin
Jun 24'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Provide an alternative proof of the theorem that states the existence of a positive value of the penalty parameter for which the ridge regression estimator has a smaller MSE than its maximum likelihood counterpart. To this end, show that the derivative of the MSE with respect to the penalty parameter is negative at zero. In doing so, use the following results from matrix calculus:

[[math]] \begin{eqnarray*} \frac{d}{d \lambda} \mbox{tr} [ \mathbf{A} (\lambda ) ] & = & \mbox{tr} \Big[ \frac{d}{d \lambda} \mathbf{A} (\lambda ) \Big], \qquad \frac{d}{d \lambda} (\mathbf{A} + \lambda \mathbf{B})^{-1} \, \, \, = \, \, \, - (\mathbf{A} + \lambda \mathbf{B})^{-1} \mathbf{B} (\mathbf{A} + \lambda \mathbf{B})^{-1}, \end{eqnarray*} [[/math]]

and the product rule

[[math]] \begin{eqnarray*} \frac{d}{d \lambda} \mathbf{A} (\lambda ) \, \mathbf{B} (\lambda ) & = & \Big[ \frac{d}{d \lambda} \mathbf{A} (\lambda ) \Big] \, \mathbf{B} (\lambda ) + \mathbf{A} (\lambda ) \, \Big[ \frac{d}{d \lambda} \mathbf{B} (\lambda ) \Big], \end{eqnarray*} [[/math]]

where [math]\mathbf{A} (\lambda)[/math] and [math]\mathbf{B} (\lambda)[/math] are square, symmetric matrices parameterized by the scalar [math]\lambda[/math].
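Before using the derivative of the inverse in a proof, it can be reassuring to check it numerically. The following is a minimal sketch, not part of the exercise: it compares the analytic expression with a central finite difference. The matrix size, the random matrices, the value of [math]\lambda[/math], the step size `h`, and the function names are all illustrative assumptions.

```python
import numpy as np

def inverse_derivative(A, B, lam):
    """Analytic derivative of (A + lam * B)^{-1} with respect to lam."""
    M_inv = np.linalg.inv(A + lam * B)
    return -M_inv @ B @ M_inv

def finite_difference(A, B, lam, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    plus = np.linalg.inv(A + (lam + h) * B)
    minus = np.linalg.inv(A + (lam - h) * B)
    return (plus - minus) / (2 * h)

# Illustrative inputs (assumptions): a symmetric positive definite A and B = I.
rng = np.random.default_rng(0)
p = 4
G = rng.standard_normal((p, p))
A = G @ G.T + p * np.eye(p)
B = np.eye(p)
lam = 0.5

analytic = inverse_derivative(A, B, lam)
numeric = finite_difference(A, B, lam)

# Largest entrywise discrepancy between the two; should be close to zero.
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)

# The trace identity: the trace of the derivative matches the derivative of the trace.
trace_diff = abs(np.trace(analytic) - np.trace(numeric))
print(trace_diff)
```

Both printed discrepancies should be small relative to the entries of the derivative itself. This only checks the identities at one point and proves nothing.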

Note: the proof in the lecture notes is stronger, as it provides an interval of penalty parameter values on which the MSE of the ridge regression estimator is smaller than that of the maximum likelihood estimator.
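The claim to be proven can also be checked numerically before attempting the proof. The sketch below assumes the standard linear model with design matrix [math]\mathbf{X}[/math] of full column rank, regression parameter [math]\bbeta[/math], and error variance [math]\sigma^2[/math]. It computes the MSE of the ridge estimator as the trace of its variance plus its squared bias, then approximates the derivative at zero by a forward difference. The simulated design, the parameter values, and the function name `ridge_mse` are illustrative assumptions, and the check is no substitute for the proof.

```python
import numpy as np

def ridge_mse(X, beta, sigma2, lam):
    """MSE of the ridge estimator: trace of its variance plus squared bias."""
    p = X.shape[1]
    XtX = X.T @ X
    inv = np.linalg.inv(XtX + lam * np.eye(p))
    variance = sigma2 * inv @ XtX @ inv   # Var of (X'X + lam I)^{-1} X'Y
    bias = inv @ XtX @ beta - beta        # E[estimator] - beta
    return np.trace(variance) + bias @ bias

# Illustrative setting (assumptions): random full-rank design, fixed beta.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
sigma2 = 1.0

# Forward difference at lam = 0; lam = 0 gives the maximum likelihood estimator.
h = 1e-6
mse_ml = ridge_mse(X, beta, sigma2, 0.0)
mse_small_penalty = ridge_mse(X, beta, sigma2, h)
slope_at_zero = (mse_small_penalty - mse_ml) / h
print(slope_at_zero)
```

A negative value of `slope_at_zero` is consistent with the statement of the exercise: a sufficiently small positive penalty lowers the MSE below that of the maximum likelihood estimator.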