Revision as of 22:01, 24 June 2023 by Admin

Exercise

PSA (Prostate-Specific Antigen) is a prognostic indicator of prostate cancer. Low and high PSA values indicate low and high risk, respectively. PSA interacts with the VEGF pathway. In cancer, the VEGF pathway aids angiogenesis, i.e. the formation of blood vessels in solid tumors. Assume that this interaction can, at least partially, be captured by a linear relationship between PSA and the constituents of the VEGF pathway. Use the prostate cancer data of [1] to estimate this linear relationship with the ridge regression estimator. The following R-script downloads and prepares the data.

# load the necessary libraries
library(Biobase)
library(prostateCancerStockholm)
library(penalized)
library(KEGG.db)

# load data
data(stockholm)

# prepare psa data
psa <- pData(stockholm)[,9]
psa <- log(as.numeric(levels(psa)[psa]))

# prepare VEGF pathway data
X <- exprs(stockholm)
kegg2vegf    <- as.list(KEGGPATHID2EXTID)
entrezIDvegf <- as.numeric(unlist(kegg2vegf[names(kegg2vegf) == "hsa04370"]))
entrezIDx    <- as.numeric(fData(stockholm)[,10])
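# match pathway genes to the array; drop pathway genes not on the array
# (logical indexing stays correct even when nothing is missing)
idX2VEGF     <- match(entrezIDvegf, entrezIDx)
entrezIDvegf <- entrezIDvegf[!is.na(idX2VEGF)]
idX2VEGF     <- match(entrezIDvegf, entrezIDx)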
X            <- t(X[idX2VEGF,])
gSymbols     <- fData(stockholm)[idX2VEGF,13]
gSymbols     <- levels(gSymbols)[gSymbols]
colnames(X)  <- gSymbols

# remove samples with missing outcome
keep <- !is.na(psa)
X    <- X[keep, ]
psa  <- psa[keep]

# standardize the covariates and center the response
X   <- sweep(X, 2, apply(X, 2, mean))
X   <- sweep(X, 2, apply(X, 2, sd), "/")
psa <- psa - mean(psa)
  a) Find the ridge penalty parameter by means of AIC minimization. Hint: the likelihood can be obtained from the penfit object that is created by the penalized-function of the R-package penalized.
  b) Find the ridge penalty parameter by means of leave-one-out cross-validation, as implemented by the optL2-function provided by the R-package penalized.
  c) Find the ridge penalty parameter by means of leave-one-out cross-validation using Allen's PRESS statistic as performance measure (see Section Cross-validation).
  d) Discuss the reasons for the different values of the ridge penalty parameter obtained in parts a), b), and c). Also investigate the consequences of these values on the corresponding regression estimates.
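
The two criteria that do not rely on the penalized package can be sketched in base R. The following is only an illustrative sketch on simulated data, not a solution for the Stockholm data: the names Xsim, Ysim, ridgeHat, pressRidge, aicRidge and looBrute are hypothetical, the penalty grid is arbitrary, and the AIC uses one common convention (Gaussian likelihood with the maximum likelihood variance estimate, and the trace of the ridge hat matrix as degrees of freedom), which need not coincide with what the penalized package reports. For the prepared data, Xsim and Ysim would be replaced by X and psa.

```r
# simulated stand-in for the standardized covariates and centered response
set.seed(1)
n    <- 50
p    <- 10
Xsim <- scale(matrix(rnorm(n * p), n, p))
Ysim <- as.numeric(Xsim %*% c(1, -1, 0.5, rep(0, p - 3)) + rnorm(n))
Ysim <- Ysim - mean(Ysim)

# ridge hat matrix H(lambda) = X (X'X + lambda I)^{-1} X'
ridgeHat <- function(lambda, X) {
  X %*% solve(crossprod(X) + lambda * diag(ncol(X)), t(X))
}

# Allen's PRESS: average squared leave-one-out residual, computed
# from a single fit via e_i / (1 - H_ii)
pressRidge <- function(lambda, X, Y) {
  H   <- ridgeHat(lambda, X)
  res <- as.numeric(Y - H %*% Y)
  mean((res / (1 - diag(H)))^2)
}

# AIC under the stated assumptions: Gaussian errors, ML variance
# estimate, degrees of freedom equal to trace(H)
aicRidge <- function(lambda, X, Y) {
  H      <- ridgeHat(lambda, X)
  res    <- as.numeric(Y - H %*% Y)
  n      <- length(Y)
  sigma2 <- sum(res^2) / n
  loglik <- -n / 2 * (log(2 * pi * sigma2) + 1)
  -2 * loglik + 2 * sum(diag(H))
}

# brute-force leave-one-out cross-validation, used only as a check
looBrute <- function(lambda, X, Y) {
  errs <- sapply(seq_along(Y), function(i) {
    Xi <- X[-i, , drop = FALSE]
    b  <- solve(crossprod(Xi) + lambda * diag(ncol(X)), crossprod(Xi, Y[-i]))
    Y[i] - sum(X[i, ] * b)
  })
  mean(errs^2)
}

# evaluate both criteria over a (hypothetical) grid of penalties
lambdas     <- 10^seq(-3, 3, length.out = 61)
pressVals   <- sapply(lambdas, pressRidge, X = Xsim, Y = Ysim)
aicVals     <- sapply(lambdas, aicRidge, X = Xsim, Y = Ysim)
lambdaPress <- lambdas[which.min(pressVals)]
lambdaAIC   <- lambdas[which.min(aicVals)]
```

The shortcut in pressRidge is exact here because the penalty is kept fixed and the covariates are not re-standardized within each left-out fold; looBrute can be used to confirm this for any single penalty value. Comparing lambdaPress and lambdaAIC on the simulated data gives a first impression of how the two criteria can disagree, which is the subject of part d).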
  1. Ross-Adams, H., Lamb, A., Dunning, M., Halim, S., Lindberg, J., Massie, C., Egevad, L., Russell, R., Ramos-Montoya, A., Vowler, S., et al. (2015). Integration of copy number and transcriptomics provides risk stratification in prostate cancer: a discovery and validation cohort study. EBioMedicine, 2(9), 1133-1144.