
Jun 25'23

Exercise

Consider the linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N} ( \mathbf{0}_{n}, \sigma^2 \mathbf{I}_{nn})[/math]. Let [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] be the ridge regression estimator with penalty parameter [math]\lambda[/math]. The shrinkage of the ridge regression estimator propagates to the scale of the ‘ridge prediction’ [math]\mathbf{X} \hat{\bbeta}(\lambda)[/math]. To partially correct for this shrinkage, de Vlaming and Groenen [1] propose the alternative ridge regression estimator [math]\hat{\bbeta}(\alpha) = [ (1-\alpha) \mathbf{X}^{\top} \mathbf{X} + \alpha \mathbf{I}_{pp}]^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] with shrinkage parameter [math]\alpha \in [0,1][/math].
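
A minimal NumPy sketch of both estimators, assuming [math]\alpha \gt 0[/math] or an invertible [math]\mathbf{X}^{\top} \mathbf{X}[/math]. The function names `ridge` and `alt_ridge`, the seed and the dimensions of the simulated data are illustrative choices, not part of the exercise. Both functions solve the linear system with `np.linalg.solve` instead of forming an explicit inverse.

```python
import numpy as np


def ridge(X, Y, lam):
    """Ridge estimator (X^T X + lam * I_pp)^{-1} X^T Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)


def alt_ridge(X, Y, alpha):
    """Alternative ridge estimator [(1 - alpha) X^T X + alpha * I_pp]^{-1} X^T Y."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p = X.shape[1]
    A = (1.0 - alpha) * (X.T @ X) + alpha * np.eye(p)
    return np.linalg.solve(A, X.T @ Y)


# Hypothetical simulated data from Y = X beta + eps with eps ~ N(0, I_nn).
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
Y = X @ beta + rng.standard_normal(n)

b_ridge = ridge(X, Y, lam=1.0)      # standard ridge with lambda = 1
b_alt = alt_ridge(X, Y, alpha=0.5)  # alternative ridge with alpha = 0.5
```

Under the parametrization [math]\alpha = \lambda ( 1+ \lambda)^{-1}[/math], the value [math]\alpha = 0.5[/math] corresponds to [math]\lambda = 1[/math], so `b_alt` should equal `2 * b_ridge` up to rounding.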

a) Let [math]\alpha = \lambda ( 1+ \lambda)^{-1}[/math]. Show that [math]\hat{\bbeta}(\alpha) = (1+\lambda) \hat{\bbeta}(\lambda)[/math], with [math]\hat{\bbeta}(\lambda)[/math] as defined above.
b) Use part a) and the parametrization of [math]\alpha[/math] given there to show that some of the shrinkage has been undone. That is, show that [math]\mbox{Var}[ \mathbf{X} \hat{\bbeta}(\lambda)] \lt \mbox{Var}[ \mathbf{X} \hat{\bbeta}(\alpha)][/math] for any [math]\lambda \gt 0[/math].
c) Use the singular value decomposition of [math]\mathbf{X}[/math] to show that [math]\lim_{\alpha \downarrow 0} \hat{\bbeta}(\alpha) = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] (provided the inverse exists) and [math]\lim_{\alpha \uparrow 1} \hat{\bbeta}(\alpha) = \mathbf{X}^{\top} \mathbf{Y}[/math].
d) Derive the expectation, variance and mean squared error of [math]\hat{\bbeta}(\alpha)[/math].
e) Temporarily assume that [math]p=1[/math] and let [math]\mathbf{X}^{\top} \mathbf{X} = c[/math] for some [math]c \gt 0[/math]. Then [math]\mbox{MSE}[\hat{\bbeta}(\alpha)] = [ \alpha^2 (c -1)^2 \beta^2 + \sigma^2 c ] [ (1-\alpha) c + \alpha ]^{-2}[/math]. Does there exist an [math]\alpha \in (0,1)[/math] such that the mean squared error of [math]\hat{\bbeta}(\alpha)[/math] is smaller than that of its maximum likelihood counterpart? Motivate your answer. (Hint: distinguish between different values of [math]c[/math].)
f) Now assume [math]p \gt 1[/math] and an orthonormal design matrix. Specify the regularization path of the alternative ridge regression estimator [math]\hat{\bbeta}(\alpha)[/math].
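
Parts a) and b) can be checked numerically before proving them. The sketch below uses a hypothetical design matrix, assumes [math]\sigma^2 = 1[/math], and reads the inequality in b) as: the difference [math]\mbox{Var}[ \mathbf{X} \hat{\bbeta}(\alpha)] - \mbox{Var}[ \mathbf{X} \hat{\bbeta}(\lambda)][/math] is positive semi-definite and nonzero. Writing each estimator as [math]\mathbf{W} \mathbf{X}^{\top} \mathbf{Y}[/math] (the symbols [math]\mathbf{W}[/math] and the helper names are introduced here for illustration), part a) reduces to a statement about the two [math]\mathbf{W}[/math] matrices, and [math]\mbox{Var}[\mathbf{Y}] = \sigma^2 \mathbf{I}_{nn}[/math] gives [math]\mbox{Var}[\mathbf{X} \hat{\bbeta}] = \sigma^2 \mathbf{X} \mathbf{W} \mathbf{X}^{\top} \mathbf{X} \mathbf{W} \mathbf{X}^{\top}[/math].

```python
import numpy as np


def W_alt(XtX, alpha):
    """[(1 - alpha) X^T X + alpha I]^{-1} for the alternative ridge."""
    return np.linalg.inv((1.0 - alpha) * XtX + alpha * np.eye(XtX.shape[0]))


def W_ridge(XtX, lam):
    """(X^T X + lam I)^{-1} for the standard ridge."""
    return np.linalg.inv(XtX + lam * np.eye(XtX.shape[0]))


def check_parts_a_b(X, lam, sigma2=1.0):
    """Return (identity of part a) holds, min eigenvalue and trace of the variance difference)."""
    XtX = X.T @ X
    alpha = lam / (1.0 + lam)
    Wa, Wl = W_alt(XtX, alpha), W_ridge(XtX, lam)
    # Part a): beta_hat = W X^T Y for any Y, so the identity is W_alpha = (1 + lam) W_lambda.
    identity_ok = np.allclose(Wa, (1.0 + lam) * Wl)
    # Part b): Var[X beta_hat] = sigma^2 X W X^T X W X^T.
    var_a = sigma2 * X @ Wa @ XtX @ Wa @ X.T
    var_l = sigma2 * X @ Wl @ XtX @ Wl @ X.T
    D = var_a - var_l
    eigs = np.linalg.eigvalsh((D + D.T) / 2.0)  # symmetrize against rounding error
    return identity_ok, eigs.min(), np.trace(D)


rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
for lam in (0.1, 1.0, 10.0):
    ok, min_eig, tr = check_parts_a_b(X, lam)
    print(f"lambda={lam}: identity {ok}, min eigenvalue {min_eig:.2e}, trace {tr:.3f}")
```

The smallest eigenvalues are zero up to rounding because the difference has rank at most [math]p[/math]. Algebraically, part a) gives the difference as [math][(1+\lambda)^2 - 1] \, \mbox{Var}[ \mathbf{X} \hat{\bbeta}(\lambda)][/math], which is what the numbers reflect.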
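
For part c), the thin singular value decomposition [math]\mathbf{X} = \mathbf{U} \mathbf{D} \mathbf{V}^{\top}[/math] with singular values [math]d_j[/math] turns the estimator into [math]\hat{\bbeta}(\alpha) = \mathbf{V} \, \mbox{diag}\{ d_j [(1-\alpha) d_j^2 + \alpha]^{-1} \} \, \mathbf{U}^{\top} \mathbf{Y}[/math]. At [math]\alpha = 0[/math] each factor is [math]d_j^{-1}[/math] (ordinary least squares, when all [math]d_j \gt 0[/math]) and at [math]\alpha = 1[/math] it is [math]d_j[/math], which gives [math]\mathbf{V} \mathbf{D} \mathbf{U}^{\top} \mathbf{Y} = \mathbf{X}^{\top} \mathbf{Y}[/math]. A small sketch of this SVD form on hypothetical data follows; `alt_ridge_svd` is an illustrative name.

```python
import numpy as np


def alt_ridge_svd(X, Y, alpha):
    """Alternative ridge estimator via the thin SVD X = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Singular direction j is scaled by d_j / [(1 - alpha) d_j^2 + alpha].
    scale = d / ((1.0 - alpha) * d**2 + alpha)
    return Vt.T @ (scale * (U.T @ Y))


rng = np.random.default_rng(2)
X = rng.standard_normal((40, 6))
Y = rng.standard_normal(40)

ols = np.linalg.solve(X.T @ X, X.T @ Y)  # exists here: X has full column rank
for eps in (1e-2, 1e-4, 1e-6):
    gap_ols = np.max(np.abs(alt_ridge_svd(X, Y, eps) - ols))
    gap_xty = np.max(np.abs(alt_ridge_svd(X, Y, 1.0 - eps) - X.T @ Y))
    print(f"eps={eps:g}: |b(eps) - OLS| = {gap_ols:.1e}, |b(1 - eps) - X^T Y| = {gap_xty:.1e}")
```

Both gaps shrink as `eps` decreases, mirroring the two limits.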
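
For part d), write [math]\hat{\bbeta}(\alpha) = \mathbf{W}_{\alpha} \mathbf{X}^{\top} \mathbf{Y}[/math] with [math]\mathbf{W}_{\alpha} = [(1-\alpha) \mathbf{X}^{\top} \mathbf{X} + \alpha \mathbf{I}_{pp}]^{-1}[/math] (notation introduced here). Linearity then gives [math]\mathbb{E}[\hat{\bbeta}(\alpha)] = \mathbf{W}_{\alpha} \mathbf{X}^{\top} \mathbf{X} \bbeta[/math] and [math]\mbox{Var}[\hat{\bbeta}(\alpha)] = \sigma^2 \mathbf{W}_{\alpha} \mathbf{X}^{\top} \mathbf{X} \mathbf{W}_{\alpha}[/math], and the mean squared error is the trace of the variance plus the squared norm of the bias. The sketch below codes these formulas and, for part e), evaluates the scalar mean squared error [math][\alpha^2 (c-1)^2 \beta^2 + \sigma^2 c] [(1-\alpha)c + \alpha]^{-2}[/math] against the maximum likelihood value [math]\sigma^2 / c[/math] on a grid of [math]\alpha[/math], for the hypothetical values [math]\beta = \sigma^2 = 1[/math] and [math]c \in \{0.5, 2\}[/math].

```python
import numpy as np


def alt_ridge_moments(X, beta, sigma2, alpha):
    """Expectation, variance and MSE of the alternative ridge estimator."""
    XtX = X.T @ X
    W = np.linalg.inv((1.0 - alpha) * XtX + alpha * np.eye(XtX.shape[0]))
    mean = W @ XtX @ beta                # E[beta_hat(alpha)]
    var = sigma2 * W @ XtX @ W           # Var[beta_hat(alpha)]
    bias = mean - beta
    mse = np.trace(var) + bias @ bias    # E ||beta_hat(alpha) - beta||^2
    return mean, var, mse


def mse_scalar(c, beta, sigma2, alpha):
    """MSE for p = 1 with X^T X = c (works elementwise on an array of alphas)."""
    g = (1.0 - alpha) * c + alpha
    return (alpha**2 * (c - 1.0)**2 * beta**2 + sigma2 * c) / g**2


alphas = np.linspace(0.01, 0.99, 99)
for c in (0.5, 2.0):
    mse_ml = 1.0 / c                     # sigma^2 / c with sigma^2 = 1
    mses = mse_scalar(c, 1.0, 1.0, alphas)
    best = mses.argmin()
    print(f"c={c}: ML MSE {mse_ml:.3f}, smallest grid MSE {mses[best]:.3f} at alpha={alphas[best]:.2f}")
```

The grid avoids the endpoints on purpose; at [math]\alpha = 0[/math] the estimator coincides with the maximum likelihood one. For [math]c = 1[/math] the bias term vanishes and the mean squared error equals [math]\sigma^2[/math] for every [math]\alpha[/math].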
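
For part f), an orthonormal design means [math]\mathbf{X}^{\top} \mathbf{X} = \mathbf{I}_{pp}[/math], so [math](1-\alpha) \mathbf{X}^{\top} \mathbf{X} + \alpha \mathbf{I}_{pp} = \mathbf{I}_{pp}[/math] and [math]\hat{\bbeta}(\alpha) = \mathbf{X}^{\top} \mathbf{Y}[/math] for every [math]\alpha \in [0,1][/math]. The sketch below checks this on a hypothetical orthonormal design obtained from a reduced QR decomposition and, for contrast, prints the standard ridge path, which on the same design equals [math](1+\lambda)^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math].

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
# Reduced QR (numpy's default mode) yields X with orthonormal columns: X^T X = I_pp.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = rng.standard_normal(n)


def alt_ridge(X, Y, alpha):
    """Alternative ridge estimator [(1 - alpha) X^T X + alpha I]^{-1} X^T Y."""
    A = (1.0 - alpha) * (X.T @ X) + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


alphas = np.linspace(0.0, 1.0, 11)
path = np.array([alt_ridge(X, Y, a) for a in alphas])  # one row per alpha
print("max deviation from X^T Y along the path:", np.max(np.abs(path - X.T @ Y)))

# Standard ridge on the same design shrinks every coefficient by 1 / (1 + lambda).
for lam in (0.0, 1.0, 10.0):
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    print(f"lambda={lam}:", b)
```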

[1] de Vlaming, R. and Groenen, P. J. F. (2015). The current and future use of ridge regression for prediction in quantitative genetics. BioMed Research International, Article ID 143712.
