

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Find the lasso regression solution for the data below for a general value of [math]\lambda_1[/math] and for the straight line model [math]Y = \beta_0 + \beta_1 X + \varepsilon[/math] (only apply the lasso penalty to the slope parameter, not to the intercept). Show that when [math]\lambda_1[/math] is chosen as 14, the lasso fit is [math]\hat{Y} = 40 + 1.75 X[/math]. Data: [math]\mathbf{X}^{\top} = (X_1, X_2, \ldots, X_{8})^{\top} = (-2, -1, -1, -1, 0, 1, 2, 2)^{\top}[/math] and [math]\mathbf{Y}^{\top} = (Y_1, Y_2, \ldots, Y_{8})^{\top} = (35, 40, 36, 38, 40, 43, 45, 43)^{\top}[/math].
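A minimal numerical check (not part of the exercise) is possible because the covariate is centered, so the unpenalized intercept and the penalized slope decouple and the slope follows by soft-thresholding. The sketch below assumes the residual sum of squares plus [math]\lambda_1 | \beta_1 |[/math] as objective, with the intercept left unpenalized.

```python
import numpy as np

# Data from the exercise; note the covariate is centered (mean zero).
x = np.array([-2, -1, -1, -1, 0, 1, 2, 2], dtype=float)
y = np.array([35, 40, 36, 38, 40, 43, 45, 43], dtype=float)
lam = 14.0  # lambda_1, applied to the slope only

# With a centered covariate the unpenalized intercept equals the mean of Y,
# and the penalized slope is the soft-thresholded least-squares quantity:
#   beta_1 = sign(x'(y - b0)) * max(|x'(y - b0)| - lambda_1 / 2, 0) / (x'x)
b0 = y.mean()
z = x @ (y - b0)
b1 = np.sign(z) * max(abs(z) - lam / 2, 0.0) / (x @ x)

print(b0, b1)  # should give 40.0 and 1.75
```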


Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math] and with [math]\varepsilon_i \sim_{i.i.d.} \mathcal{N}(0, \sigma^2)[/math]. The model comprises a single covariate and, depending on the subquestion, an intercept. Data on the response and the covariate are: [math]\{(y_i, x_{i,1})\}_{i=1}^4 = \{ (1.4, 0.0), (1.4, -2.0), (0.8, 0.0), (0.4, 2.0) \}[/math].

  • Evaluate the lasso regression estimator of the model without intercept for the data at hand with [math]\lambda_1 = 0.2[/math].
  • Evaluate the lasso regression estimator of the model with intercept for the data at hand with [math]\lambda_1 = 0.2[/math], leaving the intercept unpenalized.
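For a numerical check of both subquestions (a sketch, not part of the exercise): the covariate values sum to zero, so the covariate is orthogonal to the intercept column and both estimators follow from soft-thresholding, assuming the residual-sum-of-squares plus [math]\lambda_1 | \beta_1 |[/math] objective.

```python
import numpy as np

# Data: (y_i, x_{i,1}) pairs from the exercise.
y = np.array([1.4, 1.4, 0.8, 0.4])
x = np.array([0.0, -2.0, 0.0, 2.0])
lam = 0.2

def soft(z, t):
    """Soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

# (a) no intercept: minimize ||y - x*b||^2 + lam * |b|.
b_no_int = soft(x @ y, lam / 2) / (x @ x)

# (b) unpenalized intercept: x sums to zero, so the intercept equals
# mean(y) and the slope is soft-thresholded on the centered response.
b0 = y.mean()
b_int = soft(x @ (y - b0), lam / 2) / (x @ x)

print(b_no_int, b0, b_int)
```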

Plot the regularization path of the lasso regression estimator over the range [math]\lambda_1 \in (0, 160][/math] using the data of Example.
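One way to produce such a plot is sketched below. The design matrix and response of the referenced Example must be substituted for the placeholders; scikit-learn's Lasso minimizes [math](2n)^{-1} \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \alpha \| \bbeta \|_1[/math], so [math]\alpha = \lambda_1 / (2n)[/math] matches the residual sum of squares penalized by [math]\lambda_1 \| \bbeta \|_1[/math].

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

# Placeholder data: replace X and y by those of the referenced Example.
X = np.random.randn(50, 2)
y = np.random.randn(50)

lambdas = np.linspace(0.1, 160, 400)   # lambda_1 in (0, 160]
n = X.shape[0]
coefs = []
for lam in lambdas:
    # alpha = lambda_1 / (2n) converts ||y - Xb||^2 + lambda_1 ||b||_1
    # into scikit-learn's (1/(2n)) ||y - Xb||^2 + alpha ||b||_1 objective.
    fit = Lasso(alpha=lam / (2 * n), fit_intercept=False).fit(X, y)
    coefs.append(fit.coef_.copy())

coefs = np.array(coefs)
for j in range(coefs.shape[1]):
    plt.plot(lambdas, coefs[:, j], label=f"beta_{j+1}")
plt.xlabel("lambda_1")
plt.ylabel("lasso estimate")
plt.legend()
plt.show()
```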


Consider the standard linear regression model [math]Y_i = X_{i,1} \beta_1 + X_{i,2} \beta_2 + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math] and with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and some known common variance. In the estimation of the regression parameter [math](\beta_1, \beta_2)^{\top}[/math] a lasso penalty is used: [math]\lambda_{1,1} | \beta_1 | + \lambda_{1,2} | \beta_2 |[/math] with penalty parameters [math]\lambda_{1,1}, \lambda_{1,2} \gt 0[/math].

  • Let [math]\lambda_{1,1} = \lambda_{1,2}[/math] and assume the covariates are orthogonal with the spread of the first covariate being much larger than that of the second. Draw a plot with [math]\beta_1[/math] and [math]\beta_2[/math] on the [math]x[/math]- and [math]y[/math]-axis, respectively. Sketch the parameter constraint as implied by the lasso penalty. Add the level sets of the sum-of-squares loss criterion, [math]\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2[/math]. Use the plot to explain why the lasso tends to select covariates with larger spread.
  • Assume the covariates to be orthonormal. Let [math]\lambda_{1,2} \gg \lambda_{1,1}[/math]. Redraw the plot of part a of this exercise. Use the plot to explain the effect of differing [math]\lambda_{1,1}[/math] and [math]\lambda_{1,2}[/math] on the resulting lasso estimate.
  • Show that the two cases (i.e. the assumptions on the covariates and penalty parameters) of parts a and b of this exercise are equivalent, in the sense that their loss functions can be rewritten in terms of each other.
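The requested sketches can of course be drawn by hand; the snippet below merely illustrates one way to generate the ingredients of the plot of part a (the Gram matrix and the least-squares solution are made-up values, used only for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical orthogonal design: spread of covariate 1 much larger than of 2.
s1, s2 = 10.0, 1.0            # diagonal of X'X (made-up values)
b_ols = np.array([1.5, 1.5])  # made-up least-squares solution

b1, b2 = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-2, 3, 300))
# Sum-of-squares loss up to a constant: (b - b_ols)' X'X (b - b_ols).
loss = s1 * (b1 - b_ols[0])**2 + s2 * (b2 - b_ols[1])**2

plt.contour(b1, b2, loss, levels=10)                 # level sets of the loss
t = 1.0                                              # lasso constraint radius
plt.plot([t, 0, -t, 0, t], [0, t, 0, -t, 0], "k-")   # |b1| + |b2| <= t
plt.xlabel("beta_1"); plt.ylabel("beta_2")
plt.gca().set_aspect("equal")
plt.show()
```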

Investigate the effect of the variance of the covariates on variable selection by the lasso. To this end consider the toy model: [math]Y_i = X_{1i} + X_{2i} + \varepsilon_i[/math], where [math]\varepsilon_i \sim \mathcal{N}(0, 1)[/math], [math]X_{1i} \sim \mathcal{N}(0, 1)[/math], and [math]X_{2i} = a \, X_{1i}[/math] with [math]a \in [0, 2][/math]. Draw a hundred samples of both [math]X_{1i}[/math] and [math]\varepsilon_i[/math], and construct both [math]X_{2i}[/math] and [math]Y_i[/math] for a grid of values of [math]a[/math]. Fit the model by means of the lasso regression estimator with [math]\lambda_1=1[/math] for each choice of [math]a[/math]. Plot, e.g. in one figure, a) the variance of [math]X_{1i}[/math], b) the variance of [math]X_{2i}[/math], and c) the indicator of the selection of [math]X_{2i}[/math], all against [math]a[/math]. Which covariate is selected for which values of the scale parameter [math]a[/math]?
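A possible implementation of this simulation is sketched below (it fits without an intercept, as the toy model has none, and uses scikit-learn's Lasso with [math]\alpha = \lambda_1/(2n)[/math] to match the residual-sum-of-squares plus [math]\lambda_1 \| \bbeta \|_1[/math] objective).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)       # hundred samples of X_1
eps = rng.standard_normal(n)      # hundred samples of the error

a_grid = np.linspace(0, 2, 41)
var_x1, var_x2, x2_selected = [], [], []
for a in a_grid:
    x2 = a * x1                   # X_2 = a * X_1
    y = x1 + x2 + eps
    X = np.column_stack([x1, x2])
    # lambda_1 = 1 corresponds to alpha = 1 / (2n) in scikit-learn's objective.
    fit = Lasso(alpha=1 / (2 * n), fit_intercept=False, max_iter=10_000).fit(X, y)
    var_x1.append(x1.var())
    var_x2.append(x2.var())
    x2_selected.append(float(fit.coef_[1] != 0))

plt.plot(a_grid, var_x1, label="var(X_1)")
plt.plot(a_grid, var_x2, label="var(X_2)")
plt.plot(a_grid, x2_selected, label="X_2 selected (0/1)")
plt.xlabel("a"); plt.legend(); plt.show()
```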


Show the non-uniqueness of the lasso regression estimator for [math]p \gt 2[/math] when the design matrix [math]\mathbf{X}[/math] contains linearly dependent columns.
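As a numerical illustration of the mechanism (not a substitute for the requested proof): with linearly dependent columns, coefficient vectors that reallocate weight among those columns while preserving signs yield the same fit and the same [math]\ell_1[/math] norm, and hence the same lasso objective. Two identical columns are used below for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 10, 1.0
x = rng.standard_normal(n)
X = np.column_stack([x, x, rng.standard_normal(n)])  # columns 1 and 2 identical
y = rng.standard_normal(n)

def objective(beta):
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

# Two distinct coefficient vectors that reallocate weight between the
# identical columns (same signs, same L1 norm) give the same fit and the
# same penalty, hence the same value of the lasso objective.
b1 = np.array([0.3, 0.2, -0.1])
b2 = np.array([0.5, 0.0, -0.1])
print(objective(b1), objective(b2))  # identical values
```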


Consider the linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math] and an [math]n \times 2[/math]-dimensional design matrix with zero-centered and standardized but collinear columns, i.e.:

[[math]] \begin{eqnarray*} \mathbf{X}^{\top} \mathbf{X} & = & \left( \begin{array}{ll} 1 & \rho \\ \rho & 1 \end{array} \right) \end{eqnarray*} [[/math]]

with [math]\rho \in (-1, 1)[/math]. Then, an analytic expression for the lasso regression estimator exists. Show that:

[[math]] \begin{eqnarray*} \hat{\beta}_j (\lambda_1) & = & \left\{ \begin{array}{ll} \mbox{sgn}(\hat{\beta}_j) [| \hat{\beta}_j | - \tfrac{1}{2} \lambda_1 (1+\rho)^{-1}]_+ & \mbox{ if } \, \mbox{sgn}[\hat{\beta}_1 (\lambda_1)] = \mbox{sgn}[\hat{\beta}_2 (\lambda_1)], \\ & \hat{\beta}_1 (\lambda_1) \not= 0 \not= \hat{\beta}_2 (\lambda_1), \\ \mbox{sgn}(\hat{\beta}_j) [| \hat{\beta}_j | - \tfrac{1}{2} \lambda_1 (1-\rho)^{-1}]_+ & \mbox{ if } \, \mbox{sgn}[\hat{\beta}_1 (\lambda_1)] \not= \mbox{sgn}[\hat{\beta}_2 (\lambda_1)], \\ & \hat{\beta}_1 (\lambda_1) \not= 0 \not= \hat{\beta}_2 (\lambda_1), \\ \left\{ \begin{array}{lcl} 0 & \mbox{ if } & j \not= \arg \max_{j'} \{ | \hat{\beta}_{j'}^{\mbox{{\tiny (ols)}}} | \} \\ \mbox{sgn}(\tilde{\beta}_j) ( | \tilde{\beta}_j | - \tfrac{1}{2} \lambda_1)_+ & \mbox{ if } & j = \arg \max_{j'} \{ | \hat{\beta}_{j'}^{\mbox{{\tiny (ols)}}} | \} \end{array} \right. & \mbox{ otherwise, } \end{array} \right. \end{eqnarray*} [[/math]]

where [math]\tilde{\beta}_j = (\mathbf{X}_{\ast,j}^{\top} \mathbf{X}_{\ast,j})^{-1} \mathbf{X}_{\ast,j}^{\top} \mathbf{Y}[/math].
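The claimed expression may be verified numerically along the lines sketched below: construct any design matrix whose Gram matrix equals the one above (here via a Cholesky factor), pick an arbitrary [math]\rho[/math] and response, compute the lasso estimate by coordinate-wise soft-thresholding of the [math]\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda_1 \| \bbeta \|_1[/math] objective, and compare with the displayed formula.

```python
import numpy as np

rho, lam = 0.5, 0.3                      # arbitrary illustration values
A = np.array([[1.0, rho], [rho, 1.0]])   # the Gram matrix X'X
X = np.linalg.cholesky(A).T              # any X with X'X = A serves
y = np.array([1.0, -0.4])                # arbitrary response (n = 2 here)

b_ols = np.linalg.solve(A, X.T @ y)      # least-squares estimate

# Coordinate descent for  ||y - X b||^2 + lam * ||b||_1 :
#   b_j <- soft( X_j'(y - X_{-j} b_{-j}), lam / 2 ) / (X_j' X_j)
b = np.zeros(2)
for _ in range(1000):
    for j in range(2):
        r = X[:, j] @ (y - X @ b) + A[j, j] * b[j]
        b[j] = np.sign(r) * max(abs(r) - lam / 2, 0.0) / A[j, j]

print("OLS:", b_ols, " lasso:", b)       # compare against the formula above
```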


Consider the standard linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math] and with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and a common variance. Moreover, [math]\mathbf{X}_{\ast,j} = \mathbf{X}_{\ast,j'}[/math] for all [math]j, j'=1, \ldots, p[/math] and [math]\sum_{i=1}^n X_{i,j}^2 = 1[/math]. An earlier Question revealed that in this case all elements of the ridge regression estimator are equal, irrespective of the choice of the penalty parameter [math]\lambda_2[/math]. Does this hold for the lasso regression estimator? Motivate your answer.


Consider the linear regression model [math]Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i[/math] for [math]i=1, \ldots, n[/math] and with the [math]\varepsilon_i[/math] i.i.d. normally distributed with zero mean and a common variance. Relevant information on the response and the design matrix is summarized as:

[[math]] \begin{eqnarray*} \mathbf{X}^{\top} \mathbf{X} = \left( \begin{array}{rr} 3 & -2 \\ -2 & 2 \end{array} \right), \qquad \mathbf{X}^{\top} \mathbf{Y} = \left( \begin{array}{r} 3 \\ -1 \end{array} \right). \end{eqnarray*} [[/math]]

The lasso regression estimator is used to learn parameter [math]\bbeta[/math].

  • Show that the lasso regression estimator is given by:
    [[math]] \begin{eqnarray*} \hat{\bbeta}(\lambda_1) & = & \arg \min_{\bbeta \in \mathbb{R}^2} 3 \beta_1^2 + 2 \beta_2^2 - 4 \beta_1 \beta_2 - 6 \beta_1 + 2 \beta_2 + \lambda_1 | \beta_1 | + \lambda_1 | \beta_2|. \end{eqnarray*} [[/math]]
  • For [math]\lambda_{1} = 0.2[/math] the lasso estimate of the second element of [math]\bbeta[/math] is [math]\hat{\beta}_2(\lambda_1) = 1.25[/math]. Determine the corresponding value of [math]\hat{\beta}_1(\lambda_1)[/math].
  • Determine the smallest [math]\lambda_1[/math] for which it is guaranteed that [math]\hat{\bbeta}(\lambda_1) = \mathbf{0}_2[/math].
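Parts b and c may be checked numerically from the summary statistics alone, e.g. by coordinate-wise soft-thresholding of the objective displayed in part a (a sketch, not part of the exercise):

```python
import numpy as np

A = np.array([[3.0, -2.0], [-2.0, 2.0]])  # X'X
b_xy = np.array([3.0, -1.0])              # X'Y

def lasso_from_gram(A, b_xy, lam, iters=2000):
    """Coordinate descent for  b'Ab - 2 b'(X'Y) + lam * ||b||_1 ."""
    beta = np.zeros(len(b_xy))
    for _ in range(iters):
        for j in range(len(beta)):
            r = b_xy[j] - A[j] @ beta + A[j, j] * beta[j]
            beta[j] = np.sign(r) * max(abs(r) - lam / 2, 0.0) / A[j, j]
    return beta

print(lasso_from_gram(A, b_xy, 0.2))  # part b: the second element should be 1.25
print(lasso_from_gram(A, b_xy, 6.0))  # part c: check whether this lambda_1 already yields the zero vector
```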

Show that [math]\| \hat{\bbeta}(\lambda_1)\|_1[/math] is monotone decreasing in [math]\lambda_1[/math]. In this, assume orthonormality of the design matrix [math]\mathbf{X}[/math].