Classification of Microarray Gene-Expression Data

Thursday, October 2, 2003 - 9:30am - 10:20am
Keller 3-180
Geoff McLachlan (University of Queensland)
In the context of cancer diagnosis and treatment, we consider the problem of classifying a relatively small number of tumour tissue samples on the basis of their expression levels for very many genes (possibly thousands), as measured in microarray experiments. For the supervised problem, where tumour samples of known classification are available, we discuss the need to correct for selection bias when assessing the error rate of a prediction rule formed from a small subset of selected genes. We also consider the unsupervised problem, where the aim is to cluster the tumour samples on the basis of their gene expressions, and address the associated problem of assessing the number of clusters. Attention is concentrated on the mixture model-based approach called EMMIX-GENE, whose performance is demonstrated on various microarray data sets available in the bioinformatics literature.
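The selection bias mentioned above arises when genes are chosen using all the samples, and cross-validation is then applied only to fitting the rule on those pre-selected genes. The held-out sample has already influenced which genes were kept. The sketch below is a generic illustration under stated assumptions, not EMMIX-GENE or the speaker's procedure. It simulates data with no real class signal, ranks genes by a two-sample t statistic, classifies with a nearest-centroid rule, and compares leave-one-out error with gene selection done once on all samples ("internal") against selection repeated inside every fold ("external"). The data sizes, the t-statistic ranking and the nearest-centroid classifier are hypothetical choices made for brevity.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def var(xs):
    """Unbiased sample variance."""
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def t_stat(values, labels):
    """Welch-style two-sample t statistic for one gene (labels are 0/1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (var(a) / len(a) + var(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def select_genes(X, labels, rows, k):
    """Indices of the k genes with largest |t|, computed on the given rows only."""
    sub_labels = [labels[i] for i in rows]
    scores = []
    for g in range(len(X[0])):
        vals = [X[i][g] for i in rows]
        scores.append((abs(t_stat(vals, sub_labels)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid(X, labels, rows, genes, x):
    """Predict the class of x from the class centroids over the selected genes."""
    best, best_d = None, float("inf")
    for c in (0, 1):
        members = [i for i in rows if labels[i] == c]
        centroid = [mean([X[i][g] for i in members]) for g in genes]
        d = sum((x[g] - m) ** 2 for g, m in zip(genes, centroid))
        if d < best_d:
            best, best_d = c, d
    return best


def loo_error(X, labels, k, select_inside):
    """Leave-one-out error; gene selection inside each fold if select_inside."""
    n = len(X)
    all_rows = list(range(n))
    # Internal CV: genes chosen once, using every sample (including held-out ones).
    fixed = None if select_inside else select_genes(X, labels, all_rows, k)
    errors = 0
    for i in range(n):
        train = [j for j in all_rows if j != i]
        # External CV: genes re-chosen from the training fold only.
        genes = select_genes(X, labels, train, k) if select_inside else fixed
        if nearest_centroid(X, labels, train, genes, X[i]) != labels[i]:
            errors += 1
    return errors / n


# Simulated null data: 30 samples, 1000 genes, labels unrelated to expression.
random.seed(0)
n, p, k = 30, 1000, 10
labels = [0] * 15 + [1] * 15
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

biased = loo_error(X, labels, k, select_inside=False)
unbiased = loo_error(X, labels, k, select_inside=True)
print(f"internal (selection bias) LOO error: {biased:.2f}")
print(f"external (corrected) LOO error:      {unbiased:.2f}")
```

Because the labels carry no information, the true error rate of any rule here is 50%. The internal estimate typically comes out far lower, while the external estimate stays near chance, which is the gap the correction for selection bias is meant to close.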