
Exercise

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

Consider the Bayesian linear regression model [math]\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon[/math] with [math]\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})[/math], a multivariate normal law as conditional prior distribution on the regression parameter: [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}^{-1})[/math], and an inverse gamma prior on the error variance: [math]\sigma^2 \sim \mathcal{IG}(\gamma, \delta)[/math]. This exercise studies the consequences of various choices of the hyperparameters of the prior distribution on [math]\bbeta[/math].
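For use throughout the exercise, recall the standard conjugacy result that the conditional posterior of the regression parameter is again normal, with covariance matrix matching the [math]\mathbf{\Sigma}[/math] of the final part below:

[[math]] \begin{eqnarray*} \bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta} & \sim & \mathcal{N}[(\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} (\mathbf{X}^{\top} \mathbf{Y} + \mathbf{\Delta} \bbeta_0), \, \sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}]. \end{eqnarray*} [[/math]]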

  • Consider the following conditional prior distributions on the regression parameters, [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_a^{-1})[/math] and [math]\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_b^{-1})[/math], with precision matrices [math]\mathbf{\Delta}_a, \mathbf{\Delta}_b \in \mathcal{S}_{++}^p[/math] such that [math]\mathbf{\Delta}_a \succeq \mathbf{\Delta}_b[/math], i.e. [math]\mathbf{\Delta}_a = \mathbf{\Delta}_b + \mathbf{D}[/math] for some symmetric positive semi-definite matrix [math]\mathbf{D}[/math] of appropriate dimensions. Verify (this claim, like those of the subsequent parts, is illustrated numerically in the sketch following this list):
    [[math]] \begin{eqnarray*} \mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_a) & \preceq & \mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_b), \end{eqnarray*} [[/math]]
    i.e., the smaller (in the positive definite ordering) the prior variance, the smaller the posterior variance.
  • In the remainder of this exercise assume [math]\mathbf{\Delta}_a = \mathbf{\Delta} = \mathbf{\Delta}_b[/math]. Let [math]\bbeta_t[/math] be the ‘true’ or ‘ideal’ value of the regression parameter, i.e. the value used in the generation of the data, and show that a better initial guess yields a higher posterior density at [math]\bbeta_t[/math]. That is, take two prior mean parameters [math]\bbeta_0 = \bbeta_0^{\mbox{{\tiny (a)}}}[/math] and [math]\bbeta_0 = \bbeta_0^{\mbox{{\tiny (b)}}}[/math] such that the former is closer to [math]\bbeta_t[/math] than the latter. Here closeness is measured by the Mahalanobis distance, which, for e.g. [math]\bbeta_t[/math] and [math]\bbeta_0^{\mbox{{\tiny (a)}}}[/math], is defined as [math]d_M(\bbeta_t, \bbeta_0^{\mbox{{\tiny (a)}}}; \mathbf{\Sigma}) = [(\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})^{\top} \mathbf{\Sigma}^{-1} (\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})]^{1/2}[/math] for a positive definite covariance matrix [math]\mathbf{\Sigma}[/math], here with [math]\mathbf{\Sigma} = \sigma^2 \mathbf{\Delta}^{-1}[/math]. Show that the posterior density [math]\pi_{\bbeta \, | \, \sigma^2} (\bbeta \, | \, \sigma^2, \mathbf{X}, \mathbf{Y}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta})[/math] is larger at [math]\bbeta = \bbeta_t[/math] than the posterior density obtained with the other prior mean parameter.
  • Adopt the assumptions of the previous part and show that a better initial guess yields a better posterior mean. That is, show
    [[math]] \begin{eqnarray*} d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta}); \mathbf{\Sigma}] & \leq & d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (b)}}}, \mathbf{\Delta}); \mathbf{\Sigma}], \end{eqnarray*} [[/math]]
    now with [math]\mathbf{\Sigma} = \sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}[/math].
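
The three claims may also be checked numerically. Below is a minimal sketch (in Python with NumPy; it is not part of the original exercise) that instantiates the conditional posterior stated above. For the density and posterior-mean comparisons it uses noise-free data [math]\mathbf{Y} = \mathbf{X} \bbeta_t[/math] and two prior means placed on a common ray through [math]\bbeta_t[/math]; these are simplifying assumptions of the illustration only, as are all concrete dimensions and values.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)

# Illustrative setup; all concrete values are assumptions of this sketch.
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_t = np.array([1.0, -2.0, 0.5])   # 'true' regression parameter beta_t
sigma2 = 0.25
y = X @ beta_t                        # noise-free response, to keep the later comparisons clean

def posterior(X, y, sigma2, beta0, Delta):
    # beta | sigma^2, Y ~ N((X'X + Delta)^{-1} (X'Y + Delta beta0),
    #                       sigma^2 (X'X + Delta)^{-1})
    A = X.T @ X + Delta
    mean = np.linalg.solve(A, X.T @ y + Delta @ beta0)
    var = sigma2 * np.linalg.inv(A)
    return mean, var

def d_M(u, v, Sigma):
    # Mahalanobis distance [(u - v)' Sigma^{-1} (u - v)]^{1/2}
    diff = u - v
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

# First part: Delta_a = Delta_b + D with D symmetric psd, so the posterior
# variances should satisfy Var_a <= Var_b in the positive definite ordering.
Delta_b = np.eye(p)
M = rng.standard_normal((p, p))
Delta_a = Delta_b + M @ M.T           # M M' is symmetric positive semi-definite
_, var_a = posterior(X, y, sigma2, np.zeros(p), Delta_a)
_, var_b = posterior(X, y, sigma2, np.zeros(p), Delta_b)
print("variance ordering:", np.all(np.linalg.eigvalsh(var_b - var_a) >= -1e-12))

# Second and third parts: prior means on a common ray through beta_t, so that
# beta0_a is closer to beta_t than beta0_b in any Mahalanobis distance.
Delta = np.eye(p)
u = np.array([0.3, -0.1, 0.2])
beta0_a, beta0_b = beta_t + u, beta_t + 3.0 * u

mean_a, var_post = posterior(X, y, sigma2, beta0_a, Delta)
mean_b, _ = posterior(X, y, sigma2, beta0_b, Delta)

def log_density_at(bt, mean, var):
    # Gaussian log-density at bt, up to an additive constant that is the
    # same for both prior means (the posterior covariances coincide).
    diff = bt - mean
    return -0.5 * diff @ np.linalg.solve(var, diff)

print("density at beta_t:", log_density_at(beta_t, mean_a, var_post)
                          > log_density_at(beta_t, mean_b, var_post))
print("posterior mean:", d_M(beta_t, mean_a, var_post)
                         <= d_M(beta_t, mean_b, var_post))
</syntaxhighlight>

Each printed check should evaluate to True; the small tolerance in the first check merely absorbs floating-point error in the eigenvalue computation.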