Loss-based Estimation Methodology with Cross-validation: Prediction of Clinical Outcomes Using Microarray Data

Tuesday, September 30, 2003 - 9:30am - 10:20am
Keller 3-180
Sandrine Dudoit (University of California, Berkeley)
We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable loss function, and candidate estimators are generated using this (or possibly another) loss function. Cross-validation is applied to select an optimal estimator among the candidates and to assess the overall performance of the resulting estimator. Finite-sample and asymptotic optimality results are derived for the cross-validation selector for general data-generating distributions, loss functions (possibly depending on a nuisance parameter), and estimators. This general estimation framework encompasses a number of problems that have traditionally been treated separately in the statistical literature, including multivariate outcome prediction and density estimation based on censored data. Applications to genomic data analysis include the prediction of biological and clinical outcomes (possibly censored) using microarray gene expression measures, the identification of regulatory motifs in DNA sequences, and genetic mapping with single nucleotide polymorphisms (SNPs). This talk will focus on tree-based estimation of patient survival with microarray expression measures.
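
The selection step described above can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes an uncensored, univariate outcome, squared-error loss, and two toy candidate estimators (a constant mean and a least-squares line) in place of the tree-based survival estimators the talk focuses on. All names (`cv_risk`, `cv_select`, `fit_mean`, `fit_line`) and the simulated data are hypothetical. The sketch covers only selection among candidates; assessing the performance of the selected estimator would need a further, outer layer of data splitting.

```python
import random


def squared_error(y, prediction):
    # Loss function; the parameter of interest is the minimizer of its expected value (risk).
    return (y - prediction) ** 2


def fit_mean(xs, ys):
    # Candidate 1 (hypothetical): ignore x and predict the training mean.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    # Candidate 2 (hypothetical): simple least-squares line.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def cv_risk(fit, xs, ys, loss, n_folds=5):
    # V-fold cross-validated risk: fit on the training folds,
    # average the loss over the held-out validation fold.
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        valid = [i for i in range(n) if i % n_folds == fold]
        predictor = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], predictor(xs[i])) for i in valid)
    return total / n


def cv_select(candidates, xs, ys, loss, n_folds=5):
    # Cross-validation selector: pick the candidate with the smallest CV risk.
    risks = {name: cv_risk(fit, xs, ys, loss, n_folds)
             for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks


# Simulated data (an assumption for illustration): linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
best, risks = cv_select(candidates, xs, ys, squared_error)
print(best, risks)
```

Swapping in a different loss (for example, one suited to censored outcomes) or a different set of candidates changes only the arguments passed to `cv_select`, which is the sense in which the framework is loss-based.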

Joint work with: Mark van der Laan, Sunduz Keles, and Annette Molinaro.