By Admin
Jun 25'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

The ridge penalty may be interpreted as a multivariate normal prior on the regression coefficients: [math]\bbeta \sim \mathcal{N}(\mathbf{0}, \lambda^{-1} \mathbf{I}_{pp})[/math]. Different priors may be considered. In case the covariates are spatially related in some sense (e.g. genomically), it may be of interest to assume a first-order autoregressive prior: [math]\bbeta \sim \mathcal{N}(\mathbf{0}, \lambda^{-1} \mathbf{\Sigma}_a)[/math], in which [math]\mathbf{\Sigma}_a[/math] is a [math](p \times p)[/math]-dimensional correlation matrix with [math](\mathbf{\Sigma}_a)_{j_1, j_2} = \rho^{ | j_1 - j_2 | } [/math] for some correlation coefficient [math]\rho \in [0, 1)[/math]. Hence,

[[math]] \begin{eqnarray*} \mathbf{\Sigma}_a \, \, \, = \, \, \, \left( \begin{array}{cccc} 1 & \rho & \ldots & \rho^{p-1} \\ \rho & 1 & \ldots & \rho^{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{p-1} & \rho^{p-2} & \ldots & 1 \end{array} \right). \end{eqnarray*} [[/math]]
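For readers who want to experiment with this matrix numerically, the following is a minimal sketch in Python with NumPy. The function name `ar1_correlation` and the values `p = 5`, `rho = 0.5` are illustrative assumptions, not part of the exercise. It builds [math]\mathbf{\Sigma}_a[/math] from the definition [math](\mathbf{\Sigma}_a)_{j_1, j_2} = \rho^{ | j_1 - j_2 | }[/math] and inspects its inverse, which is the matrix that enters a penalty of the form [math]\bbeta^{\top} \mathbf{\Sigma}_a^{-1} \bbeta[/math].

```python
import numpy as np

def ar1_correlation(p, rho):
    """Return the (p x p) AR(1) correlation matrix with entries rho^|j1 - j2|.

    Hypothetical helper for illustration; not part of the exercise text.
    """
    idx = np.arange(p)
    # |j1 - j2| for every pair of indices, via broadcasting
    lags = np.abs(idx[:, None] - idx[None, :])
    return float(rho) ** lags

# Illustrative values (assumptions)
p, rho = 5, 0.5
Sigma_a = ar1_correlation(p, rho)

# Inverse of the correlation matrix (the precision matrix of the prior,
# up to the factor lambda)
Omega_a = np.linalg.inv(Sigma_a)

print(np.round(Sigma_a, 4))
print(np.round(Omega_a, 4))
```

Printing `Omega_a` should show that only the diagonal and the first off-diagonals are non-zero, so the quadratic form couples each coefficient only to its direct neighbours.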

  • The penalized loss function associated with this AR(1) prior is:
    [[math]] \begin{eqnarray*} \mathcal{L}(\bbeta; \lambda, \mathbf{\Sigma}_a) & = & \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \bbeta^{\top} \mathbf{\Sigma}_a^{-1} \bbeta. \end{eqnarray*} [[/math]]
    Find the minimizer of this loss function.
  • What is the effect of [math]\rho[/math] on the ridge estimates? Contrast this to the effect of [math]\lambda[/math]. Illustrate this on (simulated) data.
  • Instead of an AR(1) prior assume a prior with a uniform correlation between the elements of [math]\bbeta[/math]. That is, replace [math]\mathbf{\Sigma}_a[/math] by [math]\mathbf{\Sigma}_u[/math], given by [math]\mathbf{\Sigma}_u = (1-\rho) \mathbf{I}_{pp} + \rho \mathbf{1}_{pp}[/math], in which [math]\mathbf{1}_{pp}[/math] is the [math](p \times p)[/math]-dimensional matrix with all elements equal to one. Investigate (again on simulated data) the effect of changing from the AR(1) to the uniform prior on the ridge regression estimates.
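
The simulation parts of the exercise could be set up along the following lines. This is a sketch under stated assumptions, not a reference solution: the function names, the sample size, the "true" coefficient vector and the grid of [math]\rho[/math] values are all hypothetical choices. It relies on the stationarity condition of the loss above, [math]\mathbf{X}^{\top} (\mathbf{Y} - \mathbf{X} \bbeta) = \lambda \mathbf{\Sigma}^{-1} \bbeta[/math], which for [math]\rho \in [0, 1)[/math] gives [math]\hat{\bbeta} = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{\Sigma}^{-1})^{-1} \mathbf{X}^{\top} \mathbf{Y}[/math].

```python
import numpy as np

def ar1_correlation(p, rho):
    """AR(1) correlation matrix with entries rho^|j1 - j2| (hypothetical helper)."""
    idx = np.arange(p)
    return float(rho) ** np.abs(idx[:, None] - idx[None, :])

def uniform_correlation(p, rho):
    """Uniform correlation matrix (1 - rho) * I + rho * (all-ones matrix)."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def generalized_ridge(X, Y, lam, Sigma):
    """Minimize ||Y - X b||^2 + lam * b' Sigma^{-1} b by solving the
    linear system (X'X + lam * Sigma^{-1}) b = X'Y."""
    penalty = lam * np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ X + penalty, X.T @ Y)

# Simulated data; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n, p, lam = 50, 10, 5.0
X = rng.standard_normal((n, p))
beta_true = np.linspace(1.0, 0.0, p)   # neighbouring coefficients are similar
Y = X @ beta_true + rng.standard_normal(n)

# Compare the two priors over a small grid of rho values.
for rho in (0.0, 0.5, 0.9):
    b_ar1 = generalized_ridge(X, Y, lam, ar1_correlation(p, rho))
    b_uni = generalized_ridge(X, Y, lam, uniform_correlation(p, rho))
    print(f"rho = {rho}")
    print("  AR(1):  ", np.round(b_ar1, 3))
    print("  uniform:", np.round(b_uni, 3))
```

At [math]\rho = 0[/math] both correlation matrices reduce to [math]\mathbf{I}_{pp}[/math], so both estimates coincide with the ordinary ridge estimate, which serves as a convenient sanity check. Repeating the loop over a grid of [math]\lambda[/math] values for fixed [math]\rho[/math] allows the two effects to be contrasted.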