By Admin
Jun 25'23

[math] \require{textmacros} \def \bbeta {\bf \beta} \def\fat#1{\mbox{\boldmath$#1$}} \def\reminder#1{\marginpar{\rule[0pt]{1mm}{11pt}}\textbf{#1}} \def\SSigma{\bf \Sigma} \def\ttheta{\bf \theta} \def\aalpha{\bf \alpha} \def\ddelta{\bf \delta} \def\eeta{\bf \eta} \def\llambda{\bf \lambda} \def\ggamma{\bf \gamma} \def\nnu{\bf \nu} \def\vvarepsilon{\bf \varepsilon} \def\mmu{\bf \mu} \def\nnu{\bf \nu} \def\ttau{\bf \tau} \def\SSigma{\bf \Sigma} \def\TTheta{\bf \Theta} \def\XXi{\bf \Xi} \def\PPi{\bf \Pi} \def\GGamma{\bf \Gamma} \def\DDelta{\bf \Delta} \def\ssigma{\bf \sigma} \def\UUpsilon{\bf \Upsilon} \def\PPsi{\bf \Psi} \def\PPhi{\bf \Phi} \def\LLambda{\bf \Lambda} \def\OOmega{\bf \Omega} [/math]

The variation in a binary response [math]Y_i \in \{ 0, 1 \}[/math] due to two covariates [math]\mathbf{X}_{i,\ast} = (X_{i,1}, X_{i,2})[/math] is described by the logistic regression model: [math]P(Y_i = 1) = \exp(\mathbf{X}_{i,\ast} \bbeta) [1 + \exp(\mathbf{X}_{i,\ast} \bbeta)]^{-1}[/math]. The study design and the observed response are given by:

[[math]] \begin{eqnarray*} \mathbf{X} & = & \left( \begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array} \right) \quad \mbox{and} \quad \mathbf{Y} \, \, \, = \, \, \, \left( \begin{array}{rr} 1 \\ 0 \end{array} \right). \end{eqnarray*} [[/math]]

  • Write down the loglikelihood and show that the maximum likelihood estimate [math]\hat{\bbeta}[/math] does not exist: the loglikelihood depends on [math]\bbeta[/math] only through [math]\beta_1 - \beta_2[/math] and is maximized only in the limit [math]\beta_1 - \beta_2 \rightarrow \infty[/math].
  • Augment the loglikelihood with the ridge penalty [math]\tfrac{1}{2} \lambda \| \bbeta \|_2^2[/math] and show that [math]| \hat{\beta}_j (\lambda) | \lt \lambda^{-1}[/math] for [math]j=1, 2[/math]. (A numerical illustration of both parts is given below.)
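
A small numerical illustration of both parts follows (an aside, not part of the exercise). It restricts attention to the direction [math]\bbeta = (t, -t)^{\top}[/math]; this direction, the grid of [math]t[/math]-values, and the choice [math]\lambda = 1[/math] are made here for illustration only.

# illustration only: the loglikelihood increases along beta = (t, -t), whereas
# the ridge penalized loglikelihood attains a finite maximizer
X <- matrix(c(1, -1, -1, 1), nrow=2, byrow=TRUE)
Y <- c(1, 0)
loglik <- function(t){
    eta <- X %*% c(t, -t)
    sum(Y * eta - log(1 + exp(eta)))
}
sapply(c(1, 5, 10, 20), loglik)                    # increases (towards zero)

lambda    <- 1                                     # arbitrary choice
penloglik <- function(t){ loglik(t) - 0.5 * lambda * sum(c(t, -t)^2) }
optimize(penloglik, interval=c(0, 10), maximum=TRUE)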
By Admin
Jun 25'23

Consider an experiment involving [math]n[/math] cancer samples. For each sample [math]i[/math] the transcriptome of its tumor has been profiled and is denoted [math]\mathbf{X}_{i} = (X_{i1}, \ldots, X_{ip})^{\top}[/math], where [math]X_{ij}[/math] represents the expression level of gene [math]j=1, \ldots, p[/math] in sample [math]i[/math]. Additionally, the overall survival data [math](Y_i, c_i)[/math], [math]i=1, \ldots,n[/math], of these samples are available. Here [math]Y_i[/math] denotes the survival time of sample [math]i[/math] and [math]c_i[/math] the censoring indicator, with [math]c_i = 0[/math] and [math]c_i = 1[/math] indicating a non-censored and a censored observation, respectively. You may ignore the possibility of ties in the remainder.

  • Write down the Cox proportional hazards regression model that links the overall survival times (as the response variable) to the expression levels.
  • Specify its loss function for penalized maximum partial (!) likelihood estimation of the parameters. Penalization is via the regular ridge penalty.
  • From this loss function, derive the estimating equation for the Cox regression coefficients.
  • Describe (in words) how you would evaluate the ridge maximum partial likelihood estimator numerically. (A software-based illustration follows below.)
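
As an aside to the last part: ridge penalized Cox regression is implemented in, amongst others, the penalized-package (CRAN). A minimal sketch on simulated data is given below; all numbers are arbitrary choices for illustration, and note that survival::Surv expects an event indicator, i.e. the complement of the censoring indicator [math]c_i[/math] defined above.

# illustration only: ridge penalized Cox regression on simulated data
library(survival)
library(penalized)
set.seed(1)
n <- 50; p <- 10
X    <- matrix(rnorm(n * p), n, p)            # expression levels (samples x genes)
time <- rexp(n, rate=exp(0.5 * X[,1]))        # survival times
cens <- rbinom(n, 1, 0.3)                     # c_i = 1 indicates censoring (as above)

# Surv() takes an event indicator (1 = event observed), hence 1 - cens
fit <- penalized(Surv(time, 1 - cens), penalized=X, lambda2=1)
coefficients(fit)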
By Admin
Jun 25'23

Load the leukemia data available via the multtest-package (downloadable from BioConductor) through the following R-code:

# activate library and load the data
library(multtest)
data(golub)

The objects golub and golub.cl are now available. The matrix-object golub contains the expression profiles of 38 leukemia patients. Each profile comprises expression levels of 3051 genes. The numeric-object golub.cl is an indicator variable for the leukemia type (AML or ALL) of the patient.

  • Relate the leukemia subtype and the gene expression levels by a logistic regression model. Fit this model by means of penalized maximum likelihood, employing the ridge penalty with penalty parameter [math]\lambda=1[/math]. This is implemented in the penalized-package available from CRAN (a code sketch is given below this list). Note: center (gene-wise) the expression levels around zero.
  • Obtain the fits from the regression model. The fit is almost perfect. Could this be due to overfitting the data? Alternatively, could it be that the biological information in the gene expression levels indeed determines the leukemia subtype almost perfectly?
  • To discern between the two explanations for the almost perfect fit, randomly shuffle the subtypes. Refit the logistic regression model and obtain the fits. On the basis of this and the previous fit, which explanation is more plausible?
  • Compare the fit of the logistic model with different penalty parameters, say [math]\lambda = 1[/math] and [math]\lambda = 1000[/math]. How does [math]\lambda[/math] influence the possibility of overfitting the data?
  • Describe what you would do to prevent overfitting.
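
A possible starting point for the first parts is sketched below (a sketch, not a complete solution; the calls may need adaptation to the installed version of the penalized-package). The fitted probabilities are computed from the estimated coefficients, assuming the unpenalized intercept is the first element of the coefficient vector.

# illustration only: ridge logistic regression on the golub data
library(multtest)
library(penalized)
data(golub)

# golub is genes x samples (3051 x 38): transpose and center gene-wise
X <- t(golub)
X <- sweep(X, 2, colMeans(X))
Y <- golub.cl                                 # leukemia subtype indicator (0/1)

# ridge (L2) penalized logistic regression with lambda = 1
fit  <- penalized(Y, penalized=X, lambda2=1, model="logistic")
bhat <- coefficients(fit, "all")              # intercept assumed first, then gene effects
phat <- plogis(bhat[1] + X %*% bhat[-1])      # fitted success probabilities
round(cbind(Y, phat), 3)

# refit with randomly shuffled subtypes to gauge overfitting
Yperm   <- sample(Y)
fitPerm <- penalized(Yperm, penalized=X, lambda2=1, model="logistic")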
By Admin
Jun 25'23

Load the breast cancer data available via the breastCancerNKI-package (downloadable from BioConductor) through the following R-code:

# activate library and load the data
library(breastCancerNKI)
data(nki)

# subset and center the data
X <- exprs(nki)
X <- X[-which(rowSums(is.na(X)) > 0),]
X <- apply(X[1:1000,], 1, function(x){ x - mean(x) })

# define ER as the response
Y <- pData(nki)[,8]

The eSet-object nki is now available. It contains the expression profiles of 337 breast cancer patients. Each profile comprises expression levels of 24481 genes. The R-code above extracts the expression data from the object, removes all genes with missing values, subsets the data to the first thousand genes, and centers the expression levels gene-wise around zero (the resulting matrix has patients in the rows and genes in the columns). The reduction of the gene dimensionality is only for computational speed. Furthermore, the code extracts the estrogen receptor status (short: ER status), an important prognostic indicator for breast cancer, which is to be used as the response variable in the remainder of the exercise.

  • Relate the ER status and the gene expression levels by a logistic regression model, which is fitted by means of ridge penalized maximum likelihood. First, find the optimal value of the penalty parameter [math]\lambda[/math] by means of cross-validation. This is implemented in the optL2-function of the penalized-package available from CRAN (a code sketch is given below this list).
  • Evaluate whether the cross-validated likelihood indeed attains a maximum at the optimal value of [math]\lambda[/math]. This can be done with the profL2-function of the penalized-package available from CRAN.
  • Investigate the sensitivity of the penalty parameter selection with respect to the choice of the cross-validation folds (e.g. their number).
  • Does the optimal lambda produce a reasonable fit?
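
A possible starting point is sketched below (a sketch only; the search range for [math]\lambda[/math] and the numbers of folds are ad hoc choices, and X and Y are as constructed above).

# illustration only: cross-validated choice of the ridge penalty parameter
library(penalized)

# optimal lambda by 10-fold cross-validation
opt <- optL2(Y, penalized=X, fold=10, minlambda2=1, maxlambda2=10000, model="logistic")
opt$lambda                                    # selected penalty parameter
opt$cvl                                       # corresponding cross-validated loglikelihood

# profile of the cross-validated loglikelihood (re-using the same fold assignment)
prof <- profL2(Y, penalized=X, fold=opt$fold, minlambda2=1, maxlambda2=10000, model="logistic")
plot(prof$lambda, prof$cvl, type="l", log="x",
     xlab="lambda", ylab="cross-validated loglikelihood")

# sensitivity: repeat with, e.g., five folds and compare the selected lambda
opt5 <- optL2(Y, penalized=X, fold=5, minlambda2=1, maxlambda2=10000, model="logistic")
opt5$lambda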
By Admin
Jun 25'23

Derive the equivalent of Equations (equation) and (equation) for the targeted ridge logistic regression estimator:

[[math]] \begin{eqnarray*} \hat{\bbeta}(\lambda; \bbeta_0) & = & \arg \max\nolimits_{\bbeta \in \mathbb{R}^p} \big\{ \mathbf{Y}^{\top} \mathbf{X} \bbeta - \sum\nolimits_{i=1}^n \log \{ 1 + \exp[ \mathbf{X}_{i,\ast} \bbeta ] \} - \tfrac{1}{2} \lambda \| \bbeta - \bbeta_0 \|_2^2 \big\}, \end{eqnarray*} [[/math]]

with nonrandom [math]\bbeta_0 \in \mathbb{R}^p[/math].
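
A possible starting point (an aside, stated without derivation): equating the gradient of this penalized loglikelihood to zero, and applying a Newton-Raphson step around the current iterate [math]\bbeta^{(k)}[/math], yields

[[math]] \begin{eqnarray*} \mathbf{0}_{p} & = & \mathbf{X}^{\top} [ \mathbf{Y} - \mathbf{p}(\bbeta) ] - \lambda (\bbeta - \bbeta_0), \\ \bbeta^{(k+1)} & = & (\mathbf{X}^{\top} \mathbf{W} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} ( \mathbf{X}^{\top} \mathbf{W} \mathbf{z} + \lambda \bbeta_0), \end{eqnarray*} [[/math]]

with [math]\mathbf{p}(\bbeta)[/math] the vector with elements [math]\exp(\mathbf{X}_{i,\ast} \bbeta) [1 + \exp(\mathbf{X}_{i,\ast} \bbeta)]^{-1}[/math], [math]\mathbf{W}[/math] the diagonal matrix with the [math]p_i (1-p_i)[/math] evaluated at [math]\bbeta^{(k)}[/math] on its diagonal, and [math]\mathbf{z} = \mathbf{X} \bbeta^{(k)} + \mathbf{W}^{-1} [ \mathbf{Y} - \mathbf{p}(\bbeta^{(k)}) ][/math] the working response.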

By Admin
Jun 25'23

The iteratively reweighted least squares (IRLS) algorithm for the numerical evaluation of the ridge logistic regression estimator requires the inversion of a [math]p \times p[/math]-dimensional matrix at each iteration. In Section Computationally efficient evaluation the singular value decomposition (SVD) of the design matrix is exploited to avoid the inversion of such a matrix in the numerical evaluation of the ridge regression estimator. Use this trick to show that the computational burden of the IRLS algorithm may be reduced to one SVD prior to the iterations and the inversion of an [math]n \times n[/math] dimensional matrix at each iteration (as is done in [1]).

  1. Eilers, P. H. C., Boer, J. M., van Ommen, G.-J., and van Houwelingen, H. C. (2001). Classification of microarray data with penalized logistic regression. In Microarrays: Optical technologies and informatics, volume 4266, pages 187--198.
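
The identity underlying this trick can also be checked numerically; a small R sketch is given below (illustration only: the dimensions, weights, and working response of a single hypothetical IRLS iteration are arbitrary choices). The naive update requires a [math]p \times p[/math] inverse, the SVD-based one a single SVD of the design matrix and an [math]n \times n[/math] inverse.

# illustration only: p x p inversion versus SVD-based n x n inversion
set.seed(1)
n <- 10; p <- 200; lambda <- 2
X <- matrix(rnorm(n * p), n, p)
W <- diag(runif(n, 0.05, 0.25))               # weights of one (hypothetical) IRLS iteration
z <- rnorm(n)                                 # working response of that iteration

# direct update: inversion of a p x p matrix
bP <- solve(t(X) %*% W %*% X + lambda * diag(p), t(X) %*% W %*% z)

# SVD-based update: X = U D V^t, after which only an n x n system is solved
s  <- svd(X)
UD <- s$u %*% diag(s$d)
bN <- s$v %*% solve(t(UD) %*% W %*% UD + lambda * diag(n), t(UD) %*% W %*% z)

max(abs(bP - bN))                             # numerically zero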
By Admin
Jun 25'23

The ridge estimator of parameter [math]\bbeta[/math] of the logistic regression model is the maximizer of the ridge penalized loglikelihood:

[[math]] \begin{eqnarray*} \mathcal{L}^{\mbox{{\tiny pen}}}(\mathbf{Y}, \mathbf{X}; \bbeta, \lambda) & = & \sum\nolimits_{i=1}^n \big\{ Y_i \mathbf{X}_i \bbeta - \log [ 1 + \exp(\mathbf{X}_i \bbeta) ] \big\} - \tfrac{1}{2} \lambda \| \bbeta \|_2^2. \end{eqnarray*} [[/math]]

The maximizer is found numerically by the iteratively reweighted least squares (IRLS) algorithm, which is outlined in Section Ridge estimation. Modify the algorithm, as is done in [1], to find the generalized ridge logistic regression estimator of [math]\bbeta[/math], defined as the maximizer of:

[[math]] \begin{eqnarray*} \mathcal{L}^{\mbox{{\tiny pen}}}(\mathbf{Y}, \mathbf{X}; \bbeta, \bbeta_0, \mathbf{\Delta}) & = & \sum\nolimits_{i=1}^n \big\{ Y_i \mathbf{X}_i \bbeta - \log [ 1 + \exp(\mathbf{X}_i \bbeta) ] \big\} - \tfrac{1}{2} (\bbeta - \bbeta_0)^{\top} \mathbf{\Delta} (\bbeta - \bbeta_0), \end{eqnarray*} [[/math]]

where [math]\bbeta_0[/math] and [math]\mathbf{\Delta}[/math] are as in Chapter Generalizing ridge regression .

  1. van Wieringen, W. N. and Binder, H. (2020). Transfer learning of regression models from a sequence of datasets by penalized estimation. Submitted.
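
For reference, a compact R sketch of the standard ridge IRLS iterations (zero target, scalar penalty parameter [math]\lambda[/math]) on simulated data is given below; it is the weighted least squares update inside this loop that the exercise asks to modify. All data and settings are arbitrary choices for illustration.

# illustration only: standard ridge IRLS for logistic regression
set.seed(1)
n <- 100; p <- 5; lambda <- 1
X <- matrix(rnorm(n * p), n, p)
Y <- rbinom(n, 1, plogis(X[,1] - X[,2]))

beta <- rep(0, p)
for (k in 1:25){
    eta  <- drop(X %*% beta)
    prob <- plogis(eta)
    W    <- diag(prob * (1 - prob))
    z    <- eta + (Y - prob) / (prob * (1 - prob))    # working response
    beta <- solve(t(X) %*% W %*% X + lambda * diag(p),
                  t(X) %*% W %*% z)                   # ridge weighted least squares step
}
drop(beta)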
By Admin
Jun 25'23

Consider the logistic regression model to explain the variation of a binary random variable by a single covariate. Let [math]Y_i \in \{ 0, 1\}[/math], [math]i=1, \ldots, n[/math], represent [math]n[/math] binary random variables that follow a Bernoulli distribution with parameter [math]P(Y_i = 1 \, | \, X_i) = \exp(X_i \beta) [1+ \exp(X_i \beta)]^{-1}[/math] and corresponding covariate [math]X_i \in \{ -1, 1 \}[/math]. The data [math]\{ (y_i, X_i) \}_{i=1}^n[/math] are summarized in the following contingency table:

[math]y_i=\texttt{0}[/math] [math]y_i=\texttt{1}[/math]
[math]X_i=\texttt{-1}[/math] 301 196
[math]X_i=\texttt{1}[/math] 206 297


The data in the table are only to be used in parts c) and g).

  • Show that the estimating equation of the ridge logistic regression estimator of [math]\beta[/math] is of the form:
    [[math]] \begin{eqnarray*} c - n \exp (\beta) [ 1 + \exp ( \beta)]^{-1} - \lambda \beta & = & 0, \end{eqnarray*} [[/math]]
    and specify [math]c[/math].
  • Show that [math]\hat{\beta} (\lambda) \in (\lambda^{-1} (c-n), \lambda^{-1} c)[/math] for all [math]\lambda \gt 0[/math]. Ensure that this is a meaningful interval, i.e. that it is non-empty, and verify that [math]c \gt 0[/math]. Moreover, conclude from the interval that [math]\lim_{\lambda \rightarrow \infty} \hat{\beta} (\lambda) = 0[/math].
  • The Taylor expansion of [math]\exp (\beta) [ 1 + \exp ( \beta)]^{-1}[/math] around [math]\beta=0[/math] is:
    [[math]] \begin{eqnarray*} \exp (\beta) [ 1 + \exp ( \beta)]^{-1} & = & \tfrac{1}{2} + \tfrac{1}{4} \beta - \tfrac{1}{48} \beta^3 + \tfrac{1}{480} \beta^5 + \mathcal{O} (\beta^6). \end{eqnarray*} [[/math]]
    Substitute the [math]3^{\mbox{{\tiny rd}}}[/math] order Taylor approximation into the estimating equation and use the data of the table to evaluate its roots with the polyroot-function (of the base-package); a set-up sketch is given below this list. Do the same for the [math]1^{\mbox{{\tiny st}}}[/math] and [math]2^{\mbox{{\tiny nd}}}[/math] order Taylor approximations of [math]\exp (\beta) [ 1 + \exp ( \beta)]^{-1}[/math]. Compare these approximate estimates to the one provided by the ridgeGLM-function (of the porridge-package, [1]).
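
A possible set-up for the polyroot part of c) is sketched below (illustration only). The value of [math]\lambda[/math] is an arbitrary choice and the value of [math]c[/math] is a placeholder: substitute the value derived in part a). As an extra check, alongside the ridgeGLM comparison asked for in the exercise, the root of the exact (unapproximated) estimating equation is computed with uniroot.

# illustration only: roots of the Taylor-approximated estimating equation
n      <- 1000                 # total number of observations in the table
cc     <- 600                  # placeholder: use the value of c derived in part a)
lambda <- 1                    # arbitrary choice of the penalty parameter

# 3rd order approximation: (cc - n/2) - (n/4 + lambda) * beta + (n/48) * beta^3 = 0;
# polyroot expects the coefficients in increasing order of the powers of beta
polyroot(c(cc - n/2, -(n/4 + lambda), 0, n/48))

# 1st order approximation: (cc - n/2) - (n/4 + lambda) * beta = 0
(cc - n/2) / (n/4 + lambda)

# exact root of the estimating equation, for comparison
estEq <- function(beta){ cc - n * exp(beta) / (1 + exp(beta)) - lambda * beta }
uniroot(estEq, interval=c(-10, 10))$root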

In the remainder consider the [math]1^{\mbox{{\tiny st}}}[/math] order Taylor approximated ridge logistic regression estimator: [math]\hat{\beta}_{1^{\mbox{{\tiny st}}}} ( \lambda ) = (\lambda + \tfrac{1}{4} n)^{-1} (c - \tfrac{1}{2} n)[/math].

  • Find an expression for [math]\mathbb{E} [ \hat{\beta}_{1^{\mbox{{\tiny st}}}} (\lambda) ][/math].
  • Find an expression for [math]\mbox{Var} [ \hat{\beta}_{1^{\mbox{{\tiny st}}}} (\lambda) ][/math].
  • Combine the answers of parts d) and e) to find an expression for [math]\mbox{MSE} [ \hat{\beta}_{1^{\mbox{{\tiny st}}}} (\lambda) ][/math].
  • Plot this MSE against [math]\lambda[/math] for the data provided in the table and reflect on the analogue of the Theorem for the ridge logistic regression estimator.
  1. van Wieringen, W. and Aflakparast, M. (2021). porridge: Ridge-Type Estimation of a Potpourri of Models. R package version 0.2.1.