exercise:5826191471

Jun 25'23

Exercise

Consider the linear regression model [math]Y_i = \beta_1 X_{i,1} + \beta_2 X_{i,2} + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math]. Suppose estimates of the regression parameters [math](\beta_1, \beta_2)[/math] of this model are obtained through the minimization of the sum-of-squares augmented with a ridge-type penalty:

[[math]] \begin{eqnarray*} \sum\nolimits_{i=1}^n (Y_i - \beta_1 X_{i,1} - \beta_2 X_{i,2})^2 + \lambda (\beta_1^2 + \beta_2^2 + 2 \nu \beta_1 \beta_2), \end{eqnarray*} [[/math]]

with penalty parameters [math]\lambda \in \mathbb{R}_{\gt 0}[/math] and [math]\nu \in (-1, 1)[/math].
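
As a sketch in matrix notation (the symbols [math]\mathbf{X}[/math], [math]\mathbf{Y}[/math], [math]\boldsymbol{\beta}[/math] and [math]\mathbf{P}_{\nu}[/math] are introduced here for illustration and are not part of the exercise), the penalty can be written as a quadratic form in [math]\boldsymbol{\beta} = (\beta_1, \beta_2)^{\top}[/math]:

[[math]] \begin{eqnarray*} \lambda (\beta_1^2 + \beta_2^2 + 2 \nu \beta_1 \beta_2) & = & \lambda \, \boldsymbol{\beta}^{\top} \mathbf{P}_{\nu} \, \boldsymbol{\beta} \qquad \mbox{with} \qquad \mathbf{P}_{\nu} = \left( \begin{array}{cc} 1 & \nu \\ \nu & 1 \end{array} \right). \end{eqnarray*} [[/math]]

With [math]\mathbf{X}[/math] the [math]n \times 2[/math] matrix of covariates and [math]\mathbf{Y}[/math] the vector of responses, equating the gradient of the penalized sum-of-squares to zero gives [math]-2 \mathbf{X}^{\top} (\mathbf{Y} - \mathbf{X} \boldsymbol{\beta}) + 2 \lambda \mathbf{P}_{\nu} \boldsymbol{\beta} = \mathbf{0}[/math]. Provided [math]\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{P}_{\nu}[/math] is invertible (which holds for [math]\lambda \gt 0[/math] and [math]\nu \in (-1, 1)[/math], as [math]\mathbf{P}_{\nu}[/math] is then positive definite), the minimizer is:

[[math]] \begin{eqnarray*} \hat{\boldsymbol{\beta}}(\lambda, \nu) & = & (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{P}_{\nu})^{-1} \mathbf{X}^{\top} \mathbf{Y}. \end{eqnarray*} [[/math]]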

Recall the equivalence between constrained and penalized estimation (cf. Section Constrained estimation). Sketch (for both [math]\nu=0[/math] and [math]\nu=0.9[/math]) the shape of the parameter constraint induced by the penalty above and describe in words the qualitative difference between the two shapes.
When [math]\nu = -1[/math] and [math]\lambda \rightarrow \infty[/math] the estimates of [math]\beta_1[/math] and [math]\beta_2[/math] (resulting from minimization of the penalized loss function above) converge towards each other: [math]\lim_{\lambda \rightarrow \infty} \hat{\beta}_1(\lambda, -1) = \lim_{\lambda \rightarrow \infty} \hat{\beta}_2(\lambda, -1)[/math]. Motivated by this observation, a data scientist incorporates the equality constraint [math]\beta_1 = \beta = \beta_2[/math] explicitly into the model and estimates the ‘joint regression parameter’ [math]\beta[/math] through the minimization (with respect to [math]\beta[/math]) of:
[[math]] \begin{eqnarray*} \sum\nolimits_{i=1}^n (Y_i - \beta X_{i,1} - \beta X_{i,2})^2 + \delta \beta^2, \end{eqnarray*} [[/math]]
with penalty parameter [math]\delta \in \mathbb{R}_{\gt 0}[/math]. The data scientist is surprised to find that the resulting estimate [math]\hat{\beta}(\delta)[/math] does not have the same limiting behavior (in the penalty parameter) as [math]\hat{\beta}_1(\lambda, -1)[/math], i.e. [math]\lim_{\delta \rightarrow \infty} \hat{\beta} (\delta) \not= \lim_{\lambda \rightarrow \infty} \hat{\beta}_1(\lambda, -1)[/math]. Explain the misconception of the data scientist.
Assume that i) [math]n \gg 2[/math], ii) the unpenalized estimates [math](\hat{\beta}_1(0, 0), \hat{\beta}_2(0, 0))^{\top}[/math] equal [math](-2,2)^{\top}[/math], and iii) the two covariates [math]X_1[/math] and [math]X_2[/math] are zero-centered, have equal variance, and are strongly negatively correlated. Consider [math](\hat{\beta}_1(\lambda, \nu), \hat{\beta}_2(\lambda, \nu))^{\top}[/math] for both [math]\nu=-0.9[/math] and [math]\nu=0.9[/math]. For which value of [math]\nu[/math] do you expect the sum of the absolute values of the estimates to be largest? Hint: Distinguish between small and large values of [math]\lambda[/math] and think geometrically!
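
The estimators in the last two parts can also be explored numerically before reasoning about them. The following is a minimal sketch, not a reference solution: the function names `ridge_type_estimate` and `joint_estimate`, the simulated data, the correlation of -0.9, and the grid of penalty values are all assumptions made for illustration. It assumes that the penalized estimate solves the linear system obtained by setting the gradient of the loss to zero, and it uses a noise-free response so that the unpenalized estimate equals (-2, 2).

```python
import numpy as np

def ridge_type_estimate(X, y, lam, nu):
    """Minimizer of ||y - X b||^2 + lam * (b1^2 + b2^2 + 2 * nu * b1 * b2)."""
    # Penalty written as the quadratic form lam * b' P b.
    P = np.array([[1.0, nu], [nu, 1.0]])
    # Zero-gradient condition: (X'X + lam * P) b = X'y.
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

def joint_estimate(X, y, delta):
    """Minimizer of sum_i (y_i - beta * (x_i1 + x_i2))^2 + delta * beta^2."""
    z = X.sum(axis=1)  # the single covariate of the constrained model
    return (z @ y) / (z @ z + delta)

# Illustrative data (assumption): zero-centered, (roughly) equal-variance,
# strongly negatively correlated covariates.
rng = np.random.default_rng(1)
n, rho = 500, -0.9
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([z1, rho * z1 + np.sqrt(1 - rho**2) * z2])
X -= X.mean(axis=0)
y = X @ np.array([-2.0, 2.0])  # noise-free, so the unpenalized estimate is (-2, 2)

penalties = [0.1, 1.0, 10.0, 100.0, 1e4, 1e6]
print("penalty | estimate at nu=-1 | joint estimate | L1 at nu=-0.9 | L1 at nu=0.9")
for pen in penalties:
    b_neg1 = ridge_type_estimate(X, y, pen, -1.0)
    b_joint = joint_estimate(X, y, pen)
    l1_neg = np.abs(ridge_type_estimate(X, y, pen, -0.9)).sum()
    l1_pos = np.abs(ridge_type_estimate(X, y, pen, 0.9)).sum()
    print(f"{pen:>9.1f} | {b_neg1} | {b_joint:.4f} | {l1_neg:.4f} | {l1_pos:.4f}")
```

Reading the printed columns from small to large penalty values shows how each estimate behaves in the limit, which can then be checked against the geometric argument the exercise asks for.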
