exercise:5d3dd6eb71

Jun 25'23

Exercise

Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math]. Suppose estimates of the regression parameters [math]\bbeta[/math] of this model are obtained through the minimization of the sum-of-squares augmented with a ridge-type penalty:

[[math]] \begin{eqnarray*} \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \big[ (1-\alpha) \| \bbeta - \bbeta_{t,a} \|_2^2 + \alpha \| \bbeta - \bbeta_{t,b} \|_2^2 \big], \end{eqnarray*} [[/math]]

for known [math]\alpha \in [0,1][/math], nonrandom [math]p[/math]-dimensional target vectors [math]\bbeta_{t,a}[/math] and [math]\bbeta_{t,b}[/math] with [math]\bbeta_{t,a} \not= \bbeta_{t,b}[/math], and penalty parameter [math]\lambda \gt 0[/math]. Here [math]\mathbf{Y} = (Y_1, \ldots, Y_n)^{\top}[/math] and [math]\mathbf{X}[/math] is [math]n \times p[/math] matrix with the [math]n[/math] row-vectors [math]\mathbf{X}_{i,\ast}[/math] stacked.

When [math]p \gt n[/math] the sum-of-squares part does not have a unique minimum. Does the above employed penalty warrant a unique minimum for the loss function above (i.e., sum-of-squares plus penalty)? Motivate your answer.
Could it be that for intermediate values of [math]\alpha[/math], i.e. [math]0 \lt \alpha \lt 1[/math], the loss function assumes smaller values than for the boundary values [math]\alpha=0[/math] and [math]\alpha=1[/math]? Motivate your answer.
Draw the parameter constraint induced by this penalty for [math]\alpha = 0, 0.5[/math] and [math]1[/math] when [math]p = 2[/math]
Derive the estimator of [math]\bbeta[/math], defined as the minimum of the loss function, explicitly.
Discuss the behaviour of the estimator [math]\alpha = 0, 0.5[/math] and [math]1[/math] for [math]\lambda \rightarrow \infty[/math].

Add answer Add answer