Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
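Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math]. Suppose estimates of the regression parameters [math]\bbeta[/math] of this model are obtained through the minimization of the sum-of-squares augmented with a ridge-type penalty:
[math]
\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \big[ \alpha \| \bbeta - \bbeta_{t,a} \|_2^2 + (1-\alpha) \| \bbeta - \bbeta_{t,b} \|_2^2 \big],
[/math]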
for known [math]\alpha \in [0,1][/math], nonrandom [math]p[/math]-dimensional target vectors [math]\bbeta_{t,a}[/math] and [math]\bbeta_{t,b}[/math] with [math]\bbeta_{t,a} \not= \bbeta_{t,b}[/math], and penalty parameter [math]\lambda \gt 0[/math]. Here [math]\mathbf{Y} = (Y_1, \ldots, Y_n)^{\top}[/math] and [math]\mathbf{X}[/math] is the [math]n \times p[/math] matrix obtained by stacking the [math]n[/math] row vectors [math]\mathbf{X}_{i,\ast}[/math].
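To make the loss concrete, here is a minimal Python/NumPy sketch that evaluates it. It assumes the penalty takes the form [math]\lambda [ \alpha \| \bbeta - \bbeta_{t,a} \|_2^2 + (1-\alpha) \| \bbeta - \bbeta_{t,b} \|_2^2 ][/math]. The function name `two_target_ridge_loss` and the toy data (with [math]p \gt n[/math]) are hypothetical and serve only as an illustration.

```python
import numpy as np


def two_target_ridge_loss(beta, X, Y, lam, alpha, beta_a, beta_b):
    """Sum-of-squares plus a ridge-type penalty shrinking towards two targets.

    Assumed (hypothetical) form of the loss:
        ||Y - X beta||^2 + lam * (alpha * ||beta - beta_a||^2
                                  + (1 - alpha) * ||beta - beta_b||^2)
    """
    resid = Y - X @ beta
    pen_a = np.sum((beta - beta_a) ** 2)  # squared distance to target a
    pen_b = np.sum((beta - beta_b) ** 2)  # squared distance to target b
    return float(resid @ resid + lam * (alpha * pen_a + (1 - alpha) * pen_b))


# Hypothetical toy data with p > n, so the sum-of-squares alone
# has infinitely many minimizers.
rng = np.random.default_rng(0)
n, p = 3, 5
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)
beta_a, beta_b = np.zeros(p), np.ones(p)

print(two_target_ridge_loss(np.zeros(p), X, Y, lam=1.0, alpha=0.5,
                            beta_a=beta_a, beta_b=beta_b))
```

At [math]\bbeta = \mathbf{0}_p[/math] the residual equals [math]\mathbf{Y}[/math], so the printed value is the squared norm of `Y` plus the penalty [math]0.5 \cdot 0 + 0.5 \cdot 5 = 2.5[/math].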
- When [math]p \gt n[/math], the sum-of-squares part does not have a unique minimum. Does the penalty employed above guarantee a unique minimum of the loss function (i.e., the sum-of-squares plus the penalty)? Motivate your answer.
- Could it be that for intermediate values of [math]\alpha[/math], i.e. [math]0 \lt \alpha \lt 1[/math], the loss function assumes smaller values than for the boundary values [math]\alpha=0[/math] and [math]\alpha=1[/math]? Motivate your answer.
- Draw the parameter constraint induced by this penalty for [math]\alpha = 0, 0.5[/math] and [math]1[/math] when [math]p = 2[/math].
- Derive an explicit expression for the estimator of [math]\bbeta[/math], defined as the minimizer of the loss function.
- Discuss the behaviour of the estimator for [math]\alpha = 0, 0.5[/math] and [math]1[/math] as [math]\lambda \rightarrow \infty[/math].
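A candidate answer to the last two items can be checked numerically. The following sketch again assumes the penalty [math]\lambda [ \alpha \| \bbeta - \bbeta_{t,a} \|_2^2 + (1-\alpha) \| \bbeta - \bbeta_{t,b} \|_2^2 ][/math]. It relies on the identity [math]\alpha \| \bbeta - \bbeta_{t,a} \|_2^2 + (1-\alpha) \| \bbeta - \bbeta_{t,b} \|_2^2 = \| \bbeta - \bbeta_t \|_2^2 + \alpha(1-\alpha) \| \bbeta_{t,a} - \bbeta_{t,b} \|_2^2[/math] with [math]\bbeta_t = \alpha \bbeta_{t,a} + (1-\alpha) \bbeta_{t,b}[/math]. Setting the gradient to zero then gives [math](\mathbf{X}^{\top}\mathbf{X} + \lambda \mathbf{I}_{p}) \bbeta = \mathbf{X}^{\top}\mathbf{Y} + \lambda \bbeta_t[/math]. The helper `two_target_ridge` and the toy data are hypothetical. The code only verifies that the gradient vanishes at the candidate and that the estimate approaches [math]\bbeta_t[/math] for a very large [math]\lambda[/math].

```python
import numpy as np


def two_target_ridge(X, Y, lam, alpha, beta_a, beta_b):
    """Candidate minimizer under the assumed two-target ridge penalty.

    The penalty equals ||beta - beta_t||^2 plus a constant, with
    beta_t = alpha * beta_a + (1 - alpha) * beta_b, so this reduces to
    ridge regression towards the single target beta_t.
    """
    p = X.shape[1]
    beta_t = alpha * beta_a + (1 - alpha) * beta_b
    A = X.T @ X + lam * np.eye(p)  # positive definite for any lam > 0
    return np.linalg.solve(A, X.T @ Y + lam * beta_t)


# Hypothetical toy data with p > n.
rng = np.random.default_rng(0)
n, p = 3, 5
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)
beta_a, beta_b = np.zeros(p), np.ones(p)

for alpha in (0.0, 0.5, 1.0):
    beta_t = alpha * beta_a + (1 - alpha) * beta_b
    b_hat = two_target_ridge(X, Y, 1.0, alpha, beta_a, beta_b)
    # Gradient of the loss (with lam = 1) evaluated at the candidate.
    grad = -2 * X.T @ (Y - X @ b_hat) + 2 * 1.0 * (b_hat - beta_t)
    # With a huge penalty the estimate should sit close to beta_t.
    b_big = two_target_ridge(X, Y, 1e10, alpha, beta_a, beta_b)
    print(alpha, np.abs(grad).max(), np.abs(b_big - beta_t).max())
```

Because [math]\mathbf{X}^{\top}\mathbf{X} + \lambda \mathbf{I}_{p}[/math] is positive definite for every [math]\lambda \gt 0[/math], `np.linalg.solve` succeeds even though [math]p \gt n[/math] in this toy example.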