Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
Consider fitting the linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math] to data by means of the ridge regression estimator. This estimator involves a penalty parameter, which is conventionally taken to be positive. It has been suggested, by among others [1], to extend the range of the penalty parameter to the whole set of real numbers, that is, to tolerate negative values as well. Let us investigate the consequences of allowing negative values of the penalty parameter. To this end, use in the remainder the following numerical values for the design matrix, response, and corresponding summary statistics:
- For which [math]\lambda \lt 0[/math] is the ridge regression estimator [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] well-defined?
- Now consider the ridge regression estimator to be defined via the ridge loss function, i.e.
[[math]] \begin{eqnarray*} \hat{\bbeta} ( \lambda) & = & \arg \min\nolimits_{\bbeta \in \mathbb{R}^2} \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \| \bbeta \|_2^2. \end{eqnarray*} [[/math]] Let [math]\lambda = -20[/math]. Plot the level sets of this loss function, and add a point with the corresponding ridge regression estimate [math]\hat{\bbeta}(-20)[/math].
- Verify that the ridge regression estimate [math]\hat{\bbeta}(-20)[/math] is a saddle point of the ridge loss function, as can also be seen from the contour plot generated in the previous part. To this end, study the eigenvalues of the Hessian matrix of the loss. Moreover, specify the range of negative penalty parameters for which the ridge loss function is convex (and has a unique, well-defined minimum).
- Find the minimum of the ridge loss function.
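The checks asked for above can be sketched numerically. The following is a minimal Python sketch, not the exercise's solution: the exercise's numerical values are not reproduced here, so the design matrix `X`, the response `Y` and the penalty `lam = -5` are hypothetical toy values, and all helper names are illustrative. It relies on two standard facts: [math](\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] exists whenever [math]-\lambda[/math] is not an eigenvalue of [math]\mathbf{X}^{\top} \mathbf{X}[/math], and the Hessian of the ridge loss equals [math]2 (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22})[/math].

```python
import math

# Hypothetical toy data, NOT the values from the exercise.
X = [[1.0, 1.0], [1.0, -1.0], [2.0, 2.0]]
Y = [1.0, 0.0, 2.0]


def gram(X):
    """Return the 2x2 matrix X^T X for a design matrix with two columns."""
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    return [[a, b], [b, d]]


def xty(X, Y):
    """Return the 2-vector X^T Y."""
    return [sum(row[j] * y for row, y in zip(X, Y)) for j in range(2)]


def sym_eigenvalues(M):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix, in closed form."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)


def ridge_estimate(X, Y, lam):
    """Solve (X^T X + lam * I) beta = X^T Y via the explicit 2x2 inverse."""
    G = gram(X)
    a, b, d = G[0][0] + lam, G[0][1], G[1][1] + lam
    det = a * d - b * b
    if abs(det) < 1e-12:
        # -lam coincides with an eigenvalue of X^T X: estimator undefined.
        raise ValueError("X^T X + lambda I is singular")
    v = xty(X, Y)
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - b * v[0]) / det]


def hessian_eigenvalues(X, lam):
    """Eigenvalues of the ridge-loss Hessian 2 (X^T X + lam * I)."""
    lo, hi = sym_eigenvalues(gram(X))
    return (2.0 * (lo + lam), 2.0 * (hi + lam))


lam = -5.0  # hypothetical negative penalty, chosen between the two eigenvalues
gram_eigs = sym_eigenvalues(gram(X))
beta_hat = ridge_estimate(X, Y, lam)
hess_eigs = hessian_eigenvalues(X, lam)

# One negative and one positive Hessian eigenvalue: the stationary point is a saddle.
is_saddle = hess_eigs[0] < 0.0 < hess_eigs[1]
# The loss is strictly convex only if every Hessian eigenvalue is positive.
is_strictly_convex = hess_eigs[0] > 0.0

print("eigenvalues of X^T X:", gram_eigs)
print("ridge estimate:", beta_hat)
print("Hessian eigenvalues:", hess_eigs)
print("saddle point:", is_saddle, "| strictly convex:", is_strictly_convex)
```

For these toy values the eigenvalues of [math]\mathbf{X}^{\top} \mathbf{X}[/math] are 2 and 10, the estimate at [math]\lambda = -5[/math] is [math](1, 1)^{\top}[/math], and the Hessian has eigenvalues -6 and 10, so the stationary point is a saddle rather than a minimum. Plain Python with closed-form 2x2 formulas is used only to keep the sketch dependency-free; a linear algebra library would be the natural choice beyond two covariates.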
References