By Admin
Jun 25'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the standard linear regression model [math]Y_i = X_{i,1} \beta_1 + X_{i,2} \beta_2 + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math] and with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and some known common variance. In the estimation of the regression parameter [math](\beta_1, \beta_2)^{\top}[/math] a lasso penalty is used: [math]\lambda_{1,1} | \beta_1 | + \lambda_{1,2} | \beta_2 |[/math] with penalty parameters [math]\lambda_{1,1}, \lambda_{1,2} \gt 0[/math].
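For an orthogonal design the lasso estimator has a closed form: each coordinate is the least-squares estimate soft-thresholded at [math]\lambda_{1,j}/(2 \| \mathbf{X}_j \|_2^2)[/math]. The following is a minimal numerical sketch of this setting; the simulated design, the true parameter values, and the penalty level are all illustrative choices, not part of the exercise. It shows the phenomenon of part a: with equal penalty parameters, the covariate with the smaller spread is shrunken to zero first.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Orthogonal covariates: the first has a much larger spread than the second.
X = np.zeros((n, 2))
X[:, 0] = rng.normal(scale=5.0, size=n)
X[:, 1] = rng.normal(scale=0.2, size=n)
X[:, 1] -= X[:, 0] * (X[:, 0] @ X[:, 1]) / (X[:, 0] @ X[:, 0])  # enforce orthogonality

beta_true = np.array([1.0, 1.0])
y = X @ beta_true + rng.normal(size=n)

def lasso_orthogonal(X, y, lam):
    """Closed-form lasso for an orthogonal design:
    minimize ||y - X beta||^2 + sum_j lam_j |beta_j|
    by coordinate-wise soft-thresholding."""
    s = (X ** 2).sum(axis=0)             # ||X_j||^2
    z = (X.T @ y) / s                    # per-coordinate least-squares estimate
    return np.sign(z) * np.maximum(np.abs(z) - lam / (2 * s), 0.0)

# Equal penalty parameters, as in part a.
beta_hat = lasso_orthogonal(X, y, lam=np.array([50.0, 50.0]))
print(beta_hat)
```

Although both true coefficients equal one, the threshold [math]\lambda / (2 \| \mathbf{X}_j \|_2^2)[/math] is far larger for the small-spread covariate, so its estimate is set to zero while the large-spread covariate survives almost unshrunken.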

  • Let [math]\lambda_{1,1} = \lambda_{1,2}[/math] and assume the covariates are orthogonal with the spread of the first covariate being much larger than that of the second. Draw a plot with [math]\beta_1[/math] and [math]\beta_2[/math] on the [math]x[/math]- and [math]y[/math]-axis, respectively. Sketch the parameter constraint as implied by the lasso penalty. Add the level sets of the sum-of-squares loss criterion, [math]\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2[/math]. Use the plot to explain why the lasso tends to select covariates with larger spread.
  • Assume the covariates to be orthonormal. Let [math]\lambda_{1,2} \gg \lambda_{1,1}[/math]. Redraw the plot of part a of this exercise. Use the plot to explain the effect of unequal penalty parameters [math]\lambda_{1,1}[/math] and [math]\lambda_{1,2}[/math] on the resulting lasso estimate.
  • Show that the two cases (i.e. the assumptions on the covariates and penalty parameters) of parts a and b of this exercise are equivalent, in the sense that either loss function can be rewritten in the form of the other.
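The equivalence asked for in the last part rests on the reparametrization [math]\tilde{\mathbf{X}}_2 = (\lambda_{1,1} / \lambda_{1,2}) \mathbf{X}_2[/math], [math]\tilde{\beta}_2 = (\lambda_{1,2} / \lambda_{1,1}) \beta_2[/math], which leaves [math]\mathbf{X}_2 \beta_2[/math] and the penalty term unchanged while equalizing the penalty parameters. The sketch below (a self-contained illustration with simulated data; all numerical values are arbitrary choices) checks this correspondence numerically, again exploiting the closed-form lasso solution for orthogonal designs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Orthonormal covariates, as in part b.
X = np.linalg.qr(rng.normal(size=(n, 2)))[0]
y = X @ np.array([2.0, 2.0]) + 0.1 * rng.normal(size=n)

def lasso_orthogonal(X, y, lam):
    """Closed-form lasso (coordinate-wise soft-thresholding) for orthogonal X."""
    s = (X ** 2).sum(axis=0)
    z = (X.T @ y) / s
    return np.sign(z) * np.maximum(np.abs(z) - lam / (2 * s), 0.0)

lam1, lam2 = 0.5, 5.0   # lambda_{1,2} >> lambda_{1,1}

# Case b: orthonormal design, unequal penalty parameters.
beta_b = lasso_orthogonal(X, y, np.array([lam1, lam2]))

# Case a: rescale the second covariate, X_2 -> (lam1/lam2) X_2, so that its
# spread is much smaller, and use equal penalty parameters.
X_a = X.copy()
X_a[:, 1] *= lam1 / lam2
beta_a = lasso_orthogonal(X_a, y, np.array([lam1, lam1]))

# Undoing the rescaling of beta_2 recovers the case-b estimate exactly.
print(beta_b, beta_a * np.array([1.0, lam1 / lam2]))
```

Because the two objective functions are exact reparametrizations of one another, the correspondence between the minimizers is exact, not approximate.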