Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
The linear regression model, [math]\mathbf{Y} =\mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math], is fitted to the data with the following response, design matrix, and relevant summary statistics:
Hence, [math]p=2[/math] and [math]n=1[/math]. The fitting uses the ridge regression estimator.
- Section Expectation states that the regularization path of the ridge regression estimator, i.e. [math]\{ \hat{\bbeta}(\lambda) : \lambda \gt 0\}[/math], is confined to a line in [math]\mathbb{R}^2[/math]. Give the details of this line and draw it in the [math](\beta_1, \beta_2)[/math]-plane.
- Verify numerically, for a set of penalty parameter values, whether the corresponding estimates [math]\hat{\bbeta}(\lambda)[/math] are indeed confined to the line found in part a). Do this by plotting the estimates in the [math](\beta_1, \beta_2)[/math]-plane, along with the line found in part a). Use the following set of [math]\lambda[/math]'s:
lambdas <- exp(seq(log(10^(-15)), log(1), length.out=100))
- Part b) reveals that, for small values of [math]\lambda[/math], the estimates fall outside the line found in part a). Using the theory outlined in Section Expectation, the estimates can be decomposed into a part that falls on this line and a part that is orthogonal to it. The latter is given by [math](\mathbf{I}_{22} - \mathbf{P}_x) \hat{\bbeta}(\lambda)[/math], where [math]\mathbf{P}_x[/math] is the projection matrix onto the space spanned by the rows of [math]\mathbf{X}[/math] (equivalently, the columns of [math]\mathbf{X}^{\top}[/math]). Evaluate the projection matrix [math]\mathbf{P}_x[/math].
- Numerical inaccuracy, resulting from the ill-conditioning of [math]\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22}[/math], causes [math](\mathbf{I}_{22} - \mathbf{P}_x) \hat{\bbeta}(\lambda) \neq \mathbf{0}_2[/math]. Verify that the observed non-null [math](\mathbf{I}_{22} - \mathbf{P}_x) \hat{\bbeta}(\lambda)[/math] are indeed due to numerical inaccuracy. To this end, generate a log-log plot of the condition number of [math]\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22}[/math] against [math]\| (\mathbf{I}_{22} - \mathbf{P}_x) \hat{\bbeta}(\lambda) \|_2[/math] for the provided set of [math]\lambda[/math]'s.
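The numerical checks asked for above could be set up along the following lines. This is only a sketch: the values of `X` and `Y` below are hypothetical placeholders and not the data of the exercise, and the names `Px`, `ridge`, `betas`, `orthNorms` and `condNumbers` are introduced here purely for illustration. The sketch assumes that, with [math]n=1[/math], the projection matrix can be formed as [math]\mathbf{X}^{\top} (\mathbf{X} \mathbf{X}^{\top})^{-1} \mathbf{X}[/math], and that the eigenvalues of [math]\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{22}[/math] are [math]\| \mathbf{X} \|_2^2 + \lambda[/math] and [math]\lambda[/math], so that the condition number is available in closed form.

```r
# Hypothetical data (NOT the data of the exercise): n = 1 and p = 2.
X <- matrix(c(2, 1), nrow = 1)
Y <- matrix(3, nrow = 1)

# Penalty parameter grid as given in the exercise.
lambdas <- exp(seq(log(10^(-15)), log(1), length.out = 100))

# Projection matrix onto the row space of X (a 2 x 2 matrix).
Px <- t(X) %*% solve(X %*% t(X)) %*% X

# Ridge estimator for a single lambda. tol = 0 switches off the
# singularity check of solve(), which may otherwise stop with an error
# for the smallest lambdas because the system is close to singular.
ridge <- function(lambda) {
  solve(crossprod(X) + lambda * diag(2), crossprod(X, Y), tol = 0)
}

# Estimates for all lambdas: a 2 x 100 matrix, one column per lambda.
betas <- sapply(lambdas, ridge)

# Estimates in the (beta_1, beta_2)-plane, with the line through the
# origin in the direction of the single row of X.
plot(betas[1, ], betas[2, ],
     xlab = expression(beta[1]), ylab = expression(beta[2]))
abline(a = 0, b = X[1, 2] / X[1, 1], lty = 2)

# Euclidean norm of the part of each estimate orthogonal to that line.
orthNorms <- sqrt(colSums(((diag(2) - Px) %*% betas)^2))

# Condition number of X'X + lambda * I, from its two known eigenvalues.
condNumbers <- (sum(X^2) + lambdas) / lambdas

# Log-log plot; exact zeros cannot be shown on a log scale and are dropped.
keep <- orthNorms > 0
if (any(keep)) {
  plot(condNumbers[keep], orthNorms[keep], log = "xy",
       xlab = "condition number", ylab = "norm of orthogonal part")
}
```

Replacing the placeholder `X` and `Y` by the data of the exercise should suffice to reuse the sketch. The closed-form condition number is used here instead of a numerical one (for instance via `kappa()`), because the latter may itself be unreliable when the matrix is nearly singular.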