Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\ttau{\bf \tau} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math], with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and a common variance. Each row of the design matrix [math]\mathbf{X}[/math] comprises two elements, neither column represents the intercept, and the two columns are identical: [math]\mathbf{X}_{\ast, 1} = \mathbf{X}_{\ast, 2}[/math].
- Suppose an estimator of the regression parameter [math]\bbeta[/math] of this model is obtained through the minimization of the sum-of-squares augmented with a ridge penalty, [math]\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \| \bbeta \|_2^2[/math], in which [math]\lambda \gt 0[/math] is the penalty parameter. The minimizer is called the ridge estimator and is denoted by [math]\hat{\bbeta}(\lambda)[/math]. Show that [math][\hat{\bbeta}(\lambda)]_1 = [\hat{\bbeta}(\lambda)]_2[/math] for all [math]\lambda \gt 0[/math]. (The closed form of the ridge estimator, recalled after the exercise, may be used.)
- The covariates are now related as [math]\mathbf{X}_{\ast, 1} = - 2 \mathbf{X}_{\ast, 2}[/math]. Data on the response and the covariates are:
[[math]] \begin{eqnarray*} \{(y_i, x_{i,1}, x_{i,2})\}_{i=1}^6 & = & \{ (1.5, 1.0, -0.5), (1.9, -2.0, 1.0), (-1.6, 1.0, -0.5), \\ & & \, \, \, (0.8, 4.0, -2.0), (0.9, 2.0, -1.0), (\textcolor{white}{-} 0.5, 4.0, -2.0) \}. \end{eqnarray*} [[/math]]Evaluate the ridge regression estimator for these data with [math]\lambda = 1[/math]. (A numerical check of this evaluation, and of the relation in the next part, is sketched after the exercise.)
- The data are as in part b). Show [math]\hat{\bbeta}(\lambda+\delta) = (52.5 + \lambda) (52.5 + \lambda + \delta)^{-1} \hat{\bbeta}(\lambda)[/math] for a fixed [math]\lambda[/math] and any [math]\delta \gt 0[/math]. That is, given the ridge regression estimator evaluated for a particular value of the penalty parameter [math]\lambda[/math], the remaining regularization path [math]\{ \hat{\bbeta}(\lambda + \delta) \}_{\delta \geq 0}[/math] is known analytically. Hint: Use the singular value decomposition of the design matrix [math]\mathbf{X}[/math] and the fact that its largest singular value equals [math]\sqrt{52.5}[/math].
- The data are as in part b). Consider the model [math]Y_i = X_{i,1} \gamma + \varepsilon_i[/math], in which the parameter [math]\gamma[/math] is estimated through the minimization of [math]\sum_{i=1}^6 (Y_i - X_{i,1} \gamma)^2 + \lambda_{\gamma} \gamma^2[/math]. The perfect linear relation between the covariates suggests that the regularization paths of the linear predictors [math]X_{i,1} \hat{\gamma}(\lambda_{\gamma})[/math] and [math]\mathbf{X}_{i,\ast} \hat{\bbeta}(\lambda)[/math] overlap. Find the functional relationship [math]\lambda_{\gamma} = f(\lambda)[/math] such that the linear predictor [math]X_{i,1} \hat{\gamma}(\lambda_{\gamma})[/math] coincides with the one obtained from the two-covariate ridge estimator of part b), i.e., with [math]\mathbf{X} \hat{\bbeta}(\lambda)[/math].
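For reference when tackling parts a) and b): the minimizer of the ridge-penalized sum-of-squares admits the well-known closed form [[math]] \hat{\bbeta}(\lambda) \, = \, (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{2})^{-1} \mathbf{X}^{\top} \mathbf{Y}, [[/math]] with [math]\mathbf{I}_{2}[/math] the [math]2 \times 2[/math] identity matrix.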
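A numerical check of the hand calculations can be set up in any linear-algebra environment; the sketch below assumes Python with NumPy (this choice is not prescribed by the exercise). It evaluates the ridge estimator of part b) at [math]\lambda = 1[/math], verifies the scaling relation of part c) for a few values of [math]\delta[/math], and defines the single-covariate estimator [math]\hat{\gamma}(\lambda_{\gamma})[/math] that may help in exploring part d).

```python
import numpy as np

# Data of part b): response y and the two covariate columns (X[:, 1] = -X[:, 0] / 2).
y = np.array([1.5, 1.9, -1.6, 0.8, 0.9, 0.5])
X = np.array([[ 1.0, -0.5],
              [-2.0,  1.0],
              [ 1.0, -0.5],
              [ 4.0, -2.0],
              [ 2.0, -1.0],
              [ 4.0, -2.0]])

def ridge(X, y, lam):
    """Ridge estimator: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Part b): the ridge estimator at lambda = 1.
lam = 1.0
beta_hat = ridge(X, y, lam)
print("ridge estimate at lambda = 1:", beta_hat)

# Hint of part c): the largest singular value of X, squared, should equal 52.5.
s2 = np.linalg.svd(X, compute_uv=False)[0] ** 2
print("largest squared singular value:", s2)

# Part c): check beta(lambda + delta) = (52.5 + lambda) / (52.5 + lambda + delta) * beta(lambda)
# for a few values of delta > 0.
for delta in (0.5, 1.0, 10.0):
    lhs = ridge(X, y, lam + delta)
    rhs = (s2 + lam) / (s2 + lam + delta) * beta_hat
    assert np.allclose(lhs, rhs)

# Part d): the single-covariate ridge estimator, for exploring candidate values
# of lambda_gamma (finding the functional relation f is left to the reader).
def ridge_gamma(lam_gamma):
    x1 = X[:, 0]
    return (x1 @ y) / (x1 @ x1 + lam_gamma)
```

If the relation of part c) holds, the assertions pass silently, and the printed squared singular value equals the [math]52.5[/math] mentioned in the hint.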