By Admin
Jun 25'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\ttau{\bf \tau} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Augment the lasso penalty with the sum of the absolute differences of all pairs of successive regression coefficients:

[[math]] \begin{eqnarray*} \lambda_1 \sum\nolimits_{j=1}^p | \beta_j | + \lambda_{1,f} \sum\nolimits_{j=2}^p | \beta_j - \beta_{j-1} |. \end{eqnarray*} [[/math]]

This augmented lasso penalty is referred to as the fused lasso penalty.
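To make the penalty concrete, here is a minimal numpy sketch that evaluates it for a given coefficient vector; the function name and the numerical values are illustrative only, not part of the exercise.

<syntaxhighlight lang="python">
import numpy as np

def fused_lasso_penalty(beta, lambda1, lambda1f):
    """Evaluate lambda1 * sum_j |beta_j| + lambda1f * sum_{j>=2} |beta_j - beta_{j-1}|."""
    beta = np.asarray(beta, dtype=float)
    return lambda1 * np.abs(beta).sum() + lambda1f * np.abs(np.diff(beta)).sum()

# A piecewise-constant coefficient vector: the fusion term only charges the jump.
beta = np.array([1.0, 1.0, 1.0, 3.0, 3.0])
print(fused_lasso_penalty(beta, lambda1=1.0, lambda1f=1.0))  # 9.0 + 2.0 = 11.0
</syntaxhighlight>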

  • a) Consider the standard multiple linear regression model: [math]Y_i = \sum_{j=1}^p X_{ij} \, \beta_j + \varepsilon_i[/math]. Estimation of the regression parameters takes place via minimization of the penalized sum of squares, in which the fused lasso penalty is used with [math]\lambda_1 = 0[/math]. Rewrite the corresponding loss function as a standard lasso problem by applying the change-of-variables [math]\gamma_1 = \beta_1[/math] and [math]\gamma_{j} = \beta_j - \beta_{j-1}[/math] for [math]j = 2, \ldots, p[/math].
  • b) Investigate on simulated data the effect of the second summand of the fused lasso penalty on the parameter estimates, temporarily setting [math]\lambda_1 = 0[/math]; a simulation sketch follows this list.
  • c) Let [math]\lambda_1[/math] still equal zero. Compare the regression estimates of part b) to the ridge estimates with a first-order autoregressive prior. What, qualitatively, is the difference in the behavior of the two estimates? Hint: plot the full solution path of the penalized estimates for both estimation procedures (see the fused-ridge sketch after this list).
  • d) How do the estimates of part b) change if we allow [math]\lambda_1 \gt 0[/math]?
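For parts a) and b), a sketch under our own assumptions ([math]n = 100[/math], [math]p = 20[/math], a piecewise-constant true coefficient vector, and scikit-learn's Lasso as the solver). Since [math]\beta_j = \sum_{k=1}^{j} \gamma_k[/math], one has [math]\mathbf{X} \bbeta = \mathbf{X} \mathbf{L} \ggamma[/math] with [math]\mathbf{L}[/math] the lower-triangular matrix of ones, which turns the fusion penalty into an ordinary lasso penalty on [math]\ggamma[/math]. One simplification to flag: scikit-learn penalizes every coefficient, so [math]\gamma_1[/math] is penalized here as well, whereas it is unpenalized in the exact [math]\lambda_1 = 0[/math] problem.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 20

# Simulated data with a piecewise-constant true coefficient vector.
X = rng.standard_normal((n, p))
beta_true = np.concatenate([np.zeros(7), np.full(6, 2.0), np.zeros(7)])
y = X @ beta_true + rng.standard_normal(n)

# Change of variables: beta = L @ gamma with L lower-triangular ones,
# so y = X beta + eps becomes y = (X L) gamma + eps.
L = np.tril(np.ones((p, p)))
X_tilde = X @ L

# Lasso on gamma corresponds to the fused-only penalty on beta
# (up to the penalization of gamma_1 noted above).
fit = Lasso(alpha=0.1, fit_intercept=False, max_iter=50_000).fit(X_tilde, y)
beta_hat = np.cumsum(fit.coef_)   # map back: beta_j = sum_{k<=j} gamma_k

print(np.round(beta_hat, 2))      # roughly piecewise constant in j
</syntaxhighlight>

Varying <code>alpha</code> shows how strongly successive differences are shrunk: many [math]\hat{\gamma}_j[/math] are set exactly to zero, i.e. neighboring [math]\hat{\beta}_j[/math] are fused.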
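For part c), a companion sketch. As a concrete stand-in for "ridge with a first-order autoregressive prior" we take, by assumption, the generalized ridge estimator with the quadratic first-difference penalty [math]\lambda \| \mathbf{D} \bbeta \|_2^2[/math], the limiting case of an AR(1) prior as its autocorrelation tends to one. Its closed form, [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{D}^{\top} \mathbf{D})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math], makes the full solution path cheap to trace:

<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.concatenate([np.zeros(7), np.full(6, 2.0), np.zeros(7)])
y = X @ beta_true + rng.standard_normal(n)

# (p-1) x p first-order difference matrix: (D beta)_j = beta_{j+1} - beta_j.
D = np.diff(np.eye(p), axis=0)

def fused_ridge(X, y, lam):
    """Generalized ridge estimate with quadratic penalty lam * ||D beta||^2."""
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

# Trace the solution path over a grid of penalty values.
lambdas = np.logspace(-2, 3, 50)
path = np.array([fused_ridge(X, y, lam) for lam in lambdas])

plt.plot(np.log10(lambdas), path)
plt.xlabel(r"$\log_{10} \lambda$")
plt.ylabel("coefficient estimate")
plt.title("Solution path, quadratic first-difference (fused-ridge) penalty")
plt.show()
</syntaxhighlight>

Plotting this path next to the lasso-type path from the previous sketch exhibits the qualitative contrast asked for in part c).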