By Admin
Jun 25'23

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the linear regression model [math]Y_i = X_i \beta + \varepsilon_i[/math] with the [math]\varepsilon_i[/math] i.i.d. following a standard normal law [math]\mathcal{N}(0, 1)[/math]. Data on the response and covariate are available: [math]\{(y_i, x_i)\}_{i=1}^8 = \{ (-5, -2), (0, -1), (-4, -1), (-2, -1), (0, 0), (3,1), (5,2), (3,2) \}[/math].

  • Assume a zero-centered normal prior on [math]\beta[/math]. Which prior variance [math]\sigma_{\beta}^2 \in \mathbb{R}_{\gt0}[/math] yields a posterior mean [math]\mathbb{E}(\beta \, | \, \{(y_i, x_i)\}_{i=1}^8, \sigma_{\beta}^2)[/math] equal to [math]1.4[/math]?
  • Assume a non-zero-centered normal prior. Which (mean, variance) combinations for the prior yield a posterior mean estimate [math]\hat{\beta} = 2[/math]?
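The first bullet can be checked numerically. With the error variance fixed at [math]1[/math] and a [math]\mathcal{N}(0, \sigma_{\beta}^2)[/math] prior, the posterior mean is [math]\mathbf{X}^{\top}\mathbf{Y} / (\mathbf{X}^{\top}\mathbf{X} + \sigma_{\beta}^{-2})[/math], which can be inverted for the prior variance. The sketch below is a verification aid, not part of the exercise; the data and the target value [math]1.4[/math] are taken from the problem statement.

```python
# Posterior mean of beta in y_i = x_i * beta + eps_i, eps_i ~ N(0, 1),
# under a zero-centred N(0, sigma_b2) prior on beta.
# A sketch for checking the first bullet of the exercise.

x = [-2, -1, -1, -1, 0, 1, 2, 2]
y = [-5, 0, -4, -2, 0, 3, 5, 3]

xtx = sum(xi * xi for xi in x)               # X'X = 16
xty = sum(xi * yi for xi, yi in zip(x, y))   # X'Y = 35

def posterior_mean(sigma_b2):
    """E(beta | data, sigma_b2), error variance fixed at 1."""
    return xty / (xtx + 1.0 / sigma_b2)

# Invert posterior_mean(sigma_b2) = target for the prior variance:
target = 1.4
sigma_b2 = 1.0 / (xty / target - xtx)

print(xtx, xty)                   # 16 35
print(sigma_b2)                   # prior variance solving the first bullet
print(posterior_mean(sigma_b2))   # recovers the target 1.4
```

Plugging the recovered variance back into `posterior_mean` confirms the target, so the inversion step is self-checking.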
By Admin
Jun 25'23


Consider the Bayesian linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N} ( \mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math] and priors [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N} ( \mathbf{0}_p, \sigma_{\beta}^{2} \mathbf{I}_{pp})[/math] and [math]\sigma^2 \sim \mathcal{IG}(a_0, b_0)[/math] where [math] \sigma_{\beta}^{2} = c \sigma^{2}[/math] for some [math]c \gt 0[/math] and [math]a_0[/math] and [math]b_0[/math] are the shape and scale parameters, respectively, of the inverse Gamma distribution. This model is fitted to data from a study where the response is explained by a single covariate, and henceforth [math]\bbeta[/math] is replaced by [math]\beta[/math], with the following relevant summary statistics: [math]\mathbf{X}^{\top} \mathbf{X} = 2[/math] and [math]\mathbf{X}^{\top} \mathbf{Y} = 5[/math].

  • Suppose [math]\mathbb{E}( \beta \, | \, \sigma^2=1, c, \mathbf{X}, \mathbf{Y}) = 2[/math]. What amount of regularization should be used such that the ridge regression estimate [math]\hat{\beta}(\lambda_2)[/math] coincides with the aforementioned posterior (conditional) mean?
  • Give the (posterior) distribution of [math]\beta \, | \, \{ \sigma^2=2, c=2, \mathbf{X}, \mathbf{Y} \}[/math].
  • Discuss how a different prior on [math]\sigma^2[/math] affects the correspondence between [math]\mathbb{E} (\beta \, | \, \sigma^2, c, \mathbf{X}, \mathbf{Y})[/math] and the ridge regression estimator.
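Conditional on [math]\sigma^2[/math] and [math]c[/math], the posterior here is available in closed form: [math]\beta \, | \, \sigma^2, c, \mathbf{X}, \mathbf{Y} \sim \mathcal{N}\big( (\mathbf{X}^{\top}\mathbf{X} + c^{-1})^{-1} \mathbf{X}^{\top}\mathbf{Y}, \, \sigma^2 (\mathbf{X}^{\top}\mathbf{X} + c^{-1})^{-1} \big)[/math], while the ridge estimate is [math]\hat{\beta}(\lambda_2) = (\mathbf{X}^{\top}\mathbf{X} + \lambda_2)^{-1} \mathbf{X}^{\top}\mathbf{Y}[/math]. The sketch below checks the first two bullets from these standard conjugate-normal results, using the summary statistics [math]\mathbf{X}^{\top}\mathbf{X} = 2[/math] and [math]\mathbf{X}^{\top}\mathbf{Y} = 5[/math] given in the exercise.

```python
# Conditional posterior of beta with prior beta | sigma^2 ~ N(0, c * sigma^2),
# single covariate, summary statistics X'X = 2 and X'Y = 5.
# A sketch based on the standard conjugate-normal results.

xtx, xty = 2.0, 5.0

def posterior(sigma2, c):
    """Return (mean, variance) of beta | sigma^2, c, X, Y."""
    precision = xtx + 1.0 / c
    return xty / precision, sigma2 / precision

def ridge(lam):
    """Ridge estimate beta_hat(lambda_2) for the same summary statistics."""
    return xty / (xtx + lam)

# First bullet: E(beta | sigma^2 = 1, c) = 2 forces xtx + 1/c = 2.5,
# i.e. c = 2, so the matching ridge penalty is lambda_2 = 1/c = 0.5.
mean, _ = posterior(sigma2=1.0, c=2.0)
print(mean, ridge(0.5))               # both equal 2.0

# Second bullet: moments of the normal posterior beta | sigma^2 = 2, c = 2:
print(posterior(sigma2=2.0, c=2.0))   # mean 2.0, variance 0.8
```

Note that the correspondence [math]\lambda_2 = 1/c[/math] between the penalty and the prior precision holds conditionally on [math]\sigma^2[/math]; the third bullet asks what happens when this conditioning is altered.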
By Admin
Jun 25'23


Revisit the earlier question. From a Bayesian perspective, is the suggestion of a negative ridge penalty parameter sensible?