Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
Augment the lasso penalty with the sum of the absolute differences of all pairs of successive regression coefficients:
[math] \lambda_1 \sum_{j=1}^{p} | \beta_j | + \lambda_2 \sum_{j=2}^{p} | \beta_j - \beta_{j-1} |. [/math]
This augmented lasso penalty is referred to as the fused lasso penalty.
- Consider the standard multiple linear regression model: [math]Y_i = \sum_{j=1}^p X_{ij} \, \beta_j + \varepsilon_i[/math]. Estimation of the regression parameters takes place via minimization of the penalized sum of squares, in which the fused lasso penalty is used with [math]\lambda_1 = 0[/math]. Rewrite the corresponding loss function as a standard lasso problem by applying the following change of variables: [math]\gamma_1 = \beta_1[/math] and [math]\gamma_{j} = \beta_j - \beta_{j-1}[/math] for [math]j = 2, \ldots, p[/math].
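The change of variables above can be verified numerically. The sketch below (Python/NumPy; all variable names and dimensions are illustrative choices, not part of the exercise) checks that [math]\fat{\beta} = \mathbf{L} \ggamma[/math] with [math]\mathbf{L}[/math] the lower-triangular matrix of ones leaves the fit unchanged and turns the second summand of the fused lasso penalty into a plain lasso penalty on [math]\gamma_2, \ldots, \gamma_p[/math]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

# Change of variables: gamma_1 = beta_1, gamma_j = beta_j - beta_{j-1}.
# Inverting it gives beta_j = sum_{k<=j} gamma_k, i.e. beta = L @ gamma
# with L the lower-triangular matrix of ones.
L = np.tril(np.ones((p, p)))
gamma = np.concatenate(([beta[0]], np.diff(beta)))
assert np.allclose(beta, L @ gamma)

# The fit is unchanged: X beta = (X L) gamma, a regression on design X L.
assert np.allclose(X @ beta, (X @ L) @ gamma)

# The fused penalty term becomes a lasso penalty on gamma_2, ..., gamma_p.
assert np.allclose(np.sum(np.abs(np.diff(beta))), np.sum(np.abs(gamma[1:])))
print("change of variables verified")
```

Note that [math]\gamma_1[/math] carries no penalty in the transformed problem, so the resulting lasso leaves the first transformed coefficient unpenalized.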
- Investigate on simulated data the effect of the second summand of the fused lasso penalty on the parameter estimates. For this, temporarily set [math]\lambda_1 = 0[/math].
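One way to carry out such a simulation is via the change of variables of part a): fit a lasso on the transformed design [math]\mathbf{X} \mathbf{L}[/math] with [math]\gamma_1[/math] left unpenalized. The sketch below uses a small hand-rolled coordinate-descent lasso solver with per-coefficient penalty weights (an illustration under these simulation settings, not a reference implementation); the true coefficient vector is piecewise constant, the pattern the fused penalty favors:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(Z, y, weights, n_iter=500):
    """Coordinate descent for 0.5*||y - Z g||^2 + sum_j weights[j]*|g_j|."""
    p = Z.shape[1]
    g = np.zeros(p)
    col_sq = np.sum(Z**2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - Z @ g + Z[:, j] * g[j]      # partial residual
            g[j] = soft(Z[:, j] @ r_j, weights[j]) / col_sq[j]
    return g

# Simulate data with a piecewise-constant coefficient vector.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.repeat([0.0, 2.0], [5, 5])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Fused penalty with lambda_1 = 0 via beta = L gamma: a lasso on
# gamma_2, ..., gamma_p, with gamma_1 unpenalized (weight zero).
L = np.tril(np.ones((p, p)))
lam2 = 10.0
weights = np.r_[0.0, np.full(p - 1, lam2)]
gamma_hat = lasso_cd(X @ L, y, weights)
beta_hat = L @ gamma_hat
print(np.round(beta_hat, 2))
```

Increasing `lam2` sets more of the successive differences [math]\hat{\beta}_j - \hat{\beta}_{j-1}[/math] exactly to zero, fusing neighboring estimates into blocks of equal values.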
- Still with [math]\lambda_1[/math] equal to zero, compare the regression estimates of part b) to the ridge estimates with a first-order autoregressive prior. What is, qualitatively, the difference in the behavior of the two estimates? Hint: plot the full solution path of the penalized estimates for both estimation procedures.
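For the ridge side of this comparison, a generalized ridge estimator with a first-order autoregressive prior can be sketched in closed form as [math]\hat{\bbeta}(\lambda) = (\mathbf{X}^{\top} \mathbf{X} + \lambda \SSigma^{-1})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math] with [math](\SSigma)_{j,k} = \rho^{|j-k|}[/math]. The code below (an illustrative setup; the value of [math]\rho[/math] and the [math]\lambda[/math]-grid are arbitrary choices) computes the solution path over a grid of penalty parameters:

```python
import numpy as np

# Generalized ridge with a first-order autoregressive prior:
#   beta_hat(lambda) = (X'X + lambda * Sigma^{-1})^{-1} X'y,
# where Sigma_{jk} = rho^{|j-k|} is the AR(1) prior covariance.
rng = np.random.default_rng(2)
n, p, rho = 100, 10, 0.9
X = rng.standard_normal((n, p))
beta_true = np.repeat([0.0, 2.0], [5, 5])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Omega = np.linalg.inv(Sigma)   # AR(1) precision matrix (tridiagonal)

lambdas = np.logspace(-2, 4, 25)
path = np.array([np.linalg.solve(X.T @ X + lam * Omega, X.T @ y)
                 for lam in lambdas])

# Qualitative contrast with part b): the ridge path pulls successive
# estimates towards each other but never makes them exactly equal,
# whereas the fused lasso sets successive differences exactly to zero.
min_gap = np.min(np.abs(np.diff(path, axis=1)))
print("smallest successive difference along the path:", min_gap)
```

Plotting each row of `path` against `lambdas` (e.g. with matplotlib) and placing it next to the fused-lasso path of part b) makes the smooth-shrinkage versus exact-fusion contrast visible.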
- How do the estimates of part b) of this question change if we allow [math]\lambda_1 \gt0[/math]?