exercise:E40cc9512d

Jun 25'23

Exercise

Investigate the effect of the variance of the covariates on variable selection by the lasso. To this end, consider the toy model: [math]Y_i = X_{1i} + X_{2i} + \varepsilon_i[/math], where [math]\varepsilon_i \sim \mathcal{N}(0, 1)[/math], [math]X_{1i} \sim \mathcal{N}(0, 1)[/math], and [math]X_{2i} = a \, X_{1i}[/math] with [math]a \in [0, 2][/math]. Draw a hundred samples for both [math]X_{1i}[/math] and [math]\varepsilon_i[/math] and construct both [math]X_{2i}[/math] and [math]Y_i[/math] for a grid of [math]a[/math]'s. Fit the model by means of the lasso regression estimator with [math]\lambda_1=1[/math] for each choice of [math]a[/math]. Plot, e.g. in one figure, a) the variance of [math]X_{1i}[/math], b) the variance of [math]X_{2i}[/math], and c) the indicator of the selection of [math]X_{2i}[/math]. Which covariate is selected for which values of the scale parameter [math]a[/math]?
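One possible way to set up the simulation is sketched below in plain Python; it is a starting point under stated assumptions, not a reference solution. The sketch assumes the lasso loss is parametrized as [math]\| Y - X \beta \|_2^2 + \lambda_1 \| \beta \|_1[/math] (libraries often scale the sum of squares differently, so [math]\lambda_1 = 1[/math] may need rescaling there), a grid step of 0.1, a fixed random seed, and a small numerical threshold for calling a coefficient nonzero. The function names `soft_threshold`, `lasso_two_covariates` and `simulate` are made up for this illustration. The fit uses a hand-written coordinate descent on the cross-products, because the two covariates are perfectly collinear and many passes may be needed before the estimate settles.

```python
import random
import statistics


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_two_covariates(x1, x2, y, lam, tol=1e-12, max_iter=1_000_000):
    """Coordinate descent for the ASSUMED loss
    ||y - x1*b1 - x2*b2||^2 + lam * (|b1| + |b2|)."""
    # Cross-products are computed once, so each coordinate update is O(1).
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * w for u, w in zip(x1, y))
    c2 = sum(v * w for v, w in zip(x2, y))
    b1 = b2 = 0.0
    for _ in range(max_iter):
        old1, old2 = b1, b2
        # An all-zero covariate (as for a = 0) is never selected.
        b1 = soft_threshold(c1 - s12 * b2, lam / 2) / s11 if s11 > 0 else 0.0
        b2 = soft_threshold(c2 - s12 * b1, lam / 2) / s22 if s22 > 0 else 0.0
        if max(abs(b1 - old1), abs(b2 - old2)) < tol:
            break
    return b1, b2


def simulate(a_grid, n=100, lam=1.0, seed=1, zero_tol=1e-8):
    """Draw x1 and eps once, then rebuild x2 and y for every a in the grid."""
    rng = random.Random(seed)  # the seed is an arbitrary choice
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rows = []
    for a in a_grid:
        x2 = [a * u for u in x1]
        y = [u + v + e for u, v, e in zip(x1, x2, eps)]
        b1, b2 = lasso_two_covariates(x1, x2, y, lam)
        rows.append({
            "a": a,
            "var_x1": statistics.variance(x1),
            "var_x2": statistics.variance(x2),
            # zero_tol is an assumed cut-off for "coefficient is nonzero".
            "x1_selected": abs(b1) > zero_tol,
            "x2_selected": abs(b2) > zero_tol,
        })
    return rows


if __name__ == "__main__":
    grid = [i / 10 for i in range(21)]  # a = 0.0, 0.1, ..., 2.0 (assumed step)
    for row in simulate(grid):
        print(f"a={row['a']:.1f}  var(X1)={row['var_x1']:.3f}  "
              f"var(X2)={row['var_x2']:.3f}  "
              f"X1 selected={row['x1_selected']}  X2 selected={row['x2_selected']}")
```

The rows returned by `simulate` contain the three quantities asked for in the exercise, so they can be passed to any plotting tool. Note that at [math]a = 1[/math] the two covariates coincide and the lasso estimate is not unique, so the selection indicator at that grid point depends on the algorithm and on rounding.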
