By Admin
Jun 25'23

Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the Bayesian linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math], a multivariate normal law as conditional prior distribution on the regression parameter: [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}^{-1})[/math], and an inverse gamma prior on the error variance [math]\sigma^2 \sim \mathcal{IG}(\gamma, \delta)[/math]. The consequences of various choices for the hyperparameters of the prior distribution on [math]\bbeta[/math] are studied.
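Throughout the exercise, the conditional posterior of [math]\bbeta[/math] given [math]\sigma^2[/math] follows from standard normal–normal conjugacy: it is normal with mean [math](\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} (\mathbf{X}^{\top} \mathbf{Y} + \mathbf{\Delta} \bbeta_0)[/math] and covariance [math]\sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}[/math]. A minimal numerical sketch of this posterior (all function and variable names below are illustrative, not part of the exercise):

```python
import numpy as np

def posterior_mean_var(X, Y, beta0, Delta, sigma2):
    """Mean and covariance of beta | sigma^2, Y, X under the conjugate
    normal prior beta | sigma^2 ~ N(beta0, sigma2 * inv(Delta))."""
    A = X.T @ X + Delta                       # posterior precision, up to 1/sigma^2
    A_inv = np.linalg.inv(A)
    mean = A_inv @ (X.T @ Y + Delta @ beta0)  # shrinks least squares towards beta0
    var = sigma2 * A_inv
    return mean, var

# Small synthetic example.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_t = np.array([1.0, -2.0, 0.5])           # 'true' regression parameter
sigma2 = 1.0
Y = X @ beta_t + np.sqrt(sigma2) * rng.standard_normal(n)
m, V = posterior_mean_var(X, Y, np.zeros(p), np.eye(p), sigma2)
```

As a sanity check on the formula, letting [math]\mathbf{\Delta}[/math] grow (an ever more concentrated prior) pulls the posterior mean towards [math]\bbeta_0[/math], while [math]\mathbf{\Delta} \rightarrow \mathbf{0}[/math] recovers the least squares estimator.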

  • Consider the following conditional prior distributions on the regression parameters [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_a^{-1})[/math] and [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_b^{-1})[/math] with precision matrices [math]\mathbf{\Delta}_a, \mathbf{\Delta}_b \in \mathcal{S}_{++}^p[/math] such that [math]\mathbf{\Delta}_a \succeq \mathbf{\Delta}_b[/math], i.e. [math]\mathbf{\Delta}_a = \mathbf{\Delta}_b + \mathbf{D}[/math] for some symmetric positive semi-definite matrix [math]\mathbf{D}[/math] of appropriate dimensions. Verify:
    [[math]] \begin{eqnarray*} \mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_a) & \preceq & \mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_b), \end{eqnarray*} [[/math]]
    i.e., the smaller the prior variance (in the positive definite ordering), the smaller the posterior variance.
  • In the remainder of this exercise assume [math]\mathbf{\Delta}_a = \mathbf{\Delta} = \mathbf{\Delta}_b[/math]. Let [math]\bbeta_t[/math] be the ‘true’ or ‘ideal’ value of the regression parameter that has been used in the generation of the data, and show that a better initial guess yields a higher posterior density at [math]\bbeta_t[/math]. That is, take two prior mean parameters [math]\bbeta_0 = \bbeta_0^{\mbox{{\tiny (a)}}}[/math] and [math]\bbeta_0 = \bbeta_0^{\mbox{{\tiny (b)}}}[/math] such that the former is closer to [math]\bbeta_t[/math] than the latter. Here closeness is defined in terms of the Mahalanobis distance, which for, e.g., [math]\bbeta_t[/math] and [math]\bbeta_0^{\mbox{{\tiny (a)}}}[/math] is defined as [math]d_M(\bbeta_t, \bbeta_0^{\mbox{{\tiny (a)}}}; \mathbf{\Sigma}) = [(\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})^{\top} \mathbf{\Sigma}^{-1} (\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})]^{1/2}[/math] with positive definite covariance matrix [math]\mathbf{\Sigma} = \sigma^2 \mathbf{\Delta}^{-1}[/math]. Show that the posterior density [math]\pi_{\bbeta \, | \, \sigma^2} (\bbeta \, | \, \sigma^2, \mathbf{X}, \mathbf{Y}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta})[/math] is larger at [math]\bbeta = \bbeta_t[/math] than the posterior density obtained with the other prior mean parameter.
  • Adopt the assumptions of part b) and show that a better initial guess yields a posterior mean closer to [math]\bbeta_t[/math]. That is, show
    [[math]] \begin{eqnarray*} d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta}); \mathbf{\Sigma}] & \leq & d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (b)}}}, \mathbf{\Delta}); \mathbf{\Sigma}], \end{eqnarray*} [[/math]]
    now with [math]\mathbf{\Sigma} = \sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}[/math].
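The three claims above can be illustrated numerically before attempting the proofs. The sketch below uses noise-free synthetic data ([math]\mathbf{Y} = \mathbf{X} \bbeta_t[/math]) and prior means on a common ray from [math]\bbeta_t[/math], choices under which all three orderings hold exactly; every name in the code is illustrative, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.standard_normal((n, p))
beta_t = np.array([1.0, -1.0, 2.0])
sigma2 = 0.5
Y = X @ beta_t                       # noise-free data keeps the orderings exact

# Part a): Delta_a = Delta_b + D with D psd  =>  posterior variances ordered.
Delta_b = np.eye(p)
D = np.diag([2.0, 0.0, 1.0])         # positive semi-definite perturbation
Delta_a = Delta_b + D
Var_a = sigma2 * np.linalg.inv(X.T @ X + Delta_a)
Var_b = sigma2 * np.linalg.inv(X.T @ X + Delta_b)
gap_min_eig = np.linalg.eigvalsh(Var_b - Var_a).min()   # >= 0 if Var_a <= Var_b

# Parts b) and c): common Delta; beta0_a is three times closer to beta_t
# than beta0_b in any Mahalanobis distance, since both lie on the same ray.
Delta = np.eye(p)
delta = np.array([0.3, -0.2, 0.1])
beta0_a = beta_t + delta
beta0_b = beta_t + 3.0 * delta
A_inv = np.linalg.inv(X.T @ X + Delta)
Sigma_post = sigma2 * A_inv          # posterior covariance, common to both priors

def post_mean(beta0):
    return A_inv @ (X.T @ Y + Delta @ beta0)

def log_density_at(beta, mean, cov):
    """Log multivariate normal density, computed directly."""
    diff = beta - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(beta) * np.log(2 * np.pi) + logdet + quad)

def d_M(x, y, Sigma):
    """Mahalanobis distance between x and y with covariance Sigma."""
    diff = x - y
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

logp_a = log_density_at(beta_t, post_mean(beta0_a), Sigma_post)   # part b)
logp_b = log_density_at(beta_t, post_mean(beta0_b), Sigma_post)
dist_a = d_M(beta_t, post_mean(beta0_a), Sigma_post)              # part c)
dist_b = d_M(beta_t, post_mean(beta0_b), Sigma_post)
```

With noisy data the inequalities of parts b) and c) need not hold for every realization of [math]\vvarepsilon[/math]; the noise-free setting isolates the effect of the prior mean that the exercise asks about.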