Exercise
[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]
The ridge penalty may be interpreted as a multivariate normal prior on the regression coefficients: [math]\bbeta \sim \mathcal{N}(\mathbf{0}, \lambda^{-1} \mathbf{I}_{pp})[/math]. Different priors may be considered. In case the covariates are spatially related in some sense (e.g. genomically), it may of interest to assume a first-order autoregressive prior: [math]\bbeta \sim \mathcal{N}(\mathbf{0}, \lambda^{-1} \mathbf{\Sigma}_a)[/math], in which [math]\mathbf{\Sigma}_a[/math] is a [math](p \times p)[/math]-dimensional correlation matrix with [math](\mathbf{\Sigma}_a)_{j_1, j_2} = \rho^{ | j_1 - j_2 | } [/math] for some correlation coefficient [math]\rho \in [0, 1)[/math]. Hence,
- The penalized loss function associated with this AR(1) prior is:
[[math]] \begin{eqnarray*} \mathcal{L}(\bbeta; \lambda, \mathbf{\Sigma}_a) & = & \| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2 + \lambda \bbeta^{\top} \mathbf{\Sigma}_a^{-1} \bbeta. \end{eqnarray*} [[/math]]Find the minimizer of this loss function.
- What is the effect of [math]\rho[/math] on the ridge estimates? Contrast this to the effect of [math]\lambda[/math]. Illustrate this on (simulated) data.
- Instead of an AR(1) prior assume a prior with a uniform correlation between the elements of [math]\bbeta[/math]. That is, replace [math]\mathbf{\Sigma}_a[/math] by [math]\mathbf{\Sigma}_u[/math], given by [math]\mathbf{\Sigma}_u = (1-\rho) \mathbf{I}_{pp} + \rho \mathbf{1}_{pp}[/math]. Investigate (again on data) the effect of changing from the AR(1) to the uniform prior on the ridge regression estimates.