Knowledge Discovery in Microarray Gene Expression Data

Data Mining Methodology is Critical!

Overview

Biology and Cells

DNA

Exons and Introns: Data and Logic?

Gene Expression

Molecular Biology Overview

Gene Expression Measurement

Gene Expression Microarrays

Affymetrix Microarrays

Microarray Potential Applications

Microarray Data Analysis Types

Microarray Data Mining Challenges

Data Preparation Issues (MAS-4)

Classification

False Positives Problem
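The false positives problem comes from testing every gene separately. If each of m genes is tested at significance level α, each truly unaffected gene has probability α of being called significant, so the expected number of false positives grows linearly with m. The numbers below are illustrative only and are not taken from the slides. The Bonferroni line shows one standard correction (testing each gene at α/m bounds the chance of any false positive by α); whether the talk used it is not stated here.

```latex
\mathbb{E}[\text{false positives}] = m_0\,\alpha \le m\,\alpha,
\qquad m_0 = \#\{\text{genes with no real change}\} \le m
\\[4pt]
\text{Illustrative: } m = 10{,}000,\ \alpha = 0.01 \;\Rightarrow\; m\,\alpha = 100 \text{ expected false positives (at most)}
\\[4pt]
\text{Bonferroni: test each gene at } \alpha/m \;\Rightarrow\; \Pr(\text{at least one false positive}) \le \alpha
```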

Controlling False Positives

Controlling False Positives with Randomization

Controlling False Positives with Randomization, II
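The randomization slides survive only as titles. A common form of randomization control, assumed here, is to shuffle the class labels many times, recompute the per-gene statistic after each shuffle, and record the largest value over all genes; a high quantile of those maxima gives a significance threshold that already accounts for testing many genes at once. This is a minimal sketch under that assumption: the Welch t statistic and the function names `t_stat`, `max_t` and `randomization_threshold` are hypothetical choices, not the talk's own method.

```python
import random
import statistics


def t_stat(a, b):
    """Absolute Welch two-sample t statistic for one gene (assumed score)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return abs(ma - mb) / se if se > 0 else 0.0


def max_t(genes, labels):
    """Largest |t| over all genes (rows) for one labelling of the samples."""
    best = 0.0
    for row in genes:
        a = [x for x, y in zip(row, labels) if y == 1]
        b = [x for x, y in zip(row, labels) if y == 0]
        best = max(best, t_stat(a, b))
    return best


def randomization_threshold(genes, labels, n_perm=500, alpha=0.05, seed=0):
    """(1 - alpha) quantile of max |t| under randomly shuffled class labels.

    Each class needs at least two samples so that variances are defined.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between labels and data
        null_max.append(max_t(genes, shuffled))
    null_max.sort()
    idx = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[idx]
```

Genes whose observed |t| (computed with the true labels) exceeds the returned threshold would be reported as significant. Using the maximum over genes, rather than each gene's own null distribution, is what makes the threshold hold for the whole gene set.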

Controlling False Positives: SAM (Significance Analysis of Microarrays)
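SAM scores each gene with a "relative difference": the difference in class means divided by a gene-specific scatter s plus a small constant s0, as defined in the original SAM paper (Tusher, Tibshirani and Chu, 2001). The constant s0 keeps genes with tiny variance from getting huge scores by chance. The full method also picks s0 from the data and estimates the false discovery rate by permuting labels; this minimal sketch omits both and simply takes s0 as an argument. The function name `sam_d` is hypothetical.

```python
import math
import statistics


def sam_d(group1, group2, s0):
    """SAM relative difference d = (mean1 - mean2) / (s + s0) for one gene."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Pooled gene-specific scatter s, from within-group sums of squares
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)
```

With s0 = 0 this reduces to an ordinary pooled-variance t statistic; a positive s0 damps genes whose mean difference is small but whose variance is even smaller.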

Feature Selection Approach

Gene Reduction Improves Classification

Wrapper Approach to Select the Best Gene Set
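A wrapper approach judges a gene set by how well a classifier actually performs with it, rather than by a filter score alone. One common scheme, assumed here, ranks genes by a score, tries nested sets of the top k genes for several candidate sizes k, and keeps the size with the best cross-validated accuracy. This sketch assumes a signal-to-noise ranking, a nearest-centroid classifier and leave-one-out cross-validation; the classifier actually used in the talk is not recoverable from the title, and all function names are hypothetical. Genes are re-ranked inside every fold so the held-out sample never influences which genes are chosen.

```python
import statistics


def s2n(row, labels):
    """Signal-to-noise score (mean1 - mean0) / (sd1 + sd0) for one gene."""
    a = [x for x, y in zip(row, labels) if y == 1]
    b = [x for x, y in zip(row, labels) if y == 0]
    denom = statistics.stdev(a) + statistics.stdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / denom if denom > 0 else 0.0


def top_genes(genes, labels, k):
    """Indices of the k genes with the largest |S2N|."""
    scores = [abs(s2n(row, labels)) for row in genes]
    return sorted(range(len(genes)), key=lambda g: -scores[g])[:k]


def nearest_centroid(train_cols, train_labels, test_col):
    """Predict the class whose mean profile is closest in Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for c in set(train_labels):
        members = [col for col, y in zip(train_cols, train_labels) if y == c]
        centroid = [statistics.mean(v) for v in zip(*members)]
        dist = sum((x - m) ** 2 for x, m in zip(test_col, centroid))
        if dist < best_dist:
            best_label, best_dist = c, dist
    return best_label


def loocv_accuracy(genes, labels, k):
    """Leave-one-out accuracy using the top k genes, re-ranked in every fold."""
    n = len(labels)
    correct = 0
    for i in range(n):
        train_idx = [j for j in range(n) if j != i]
        train_labels = [labels[j] for j in train_idx]
        train_genes = [[row[j] for j in train_idx] for row in genes]
        chosen = top_genes(train_genes, train_labels, k)
        train_cols = [[genes[g][j] for g in chosen] for j in train_idx]
        test_col = [genes[g][i] for g in chosen]
        correct += nearest_centroid(train_cols, train_labels, test_col) == labels[i]
    return correct / n


def wrapper_select(genes, labels, sizes=(1, 2, 5, 10)):
    """Return (best_k, accuracy); ties go to the smaller gene set."""
    scored = [(loocv_accuracy(genes, labels, k), -k) for k in sizes if k <= len(genes)]
    acc, neg_k = max(scored)
    return -neg_k, acc
```

Each class needs at least three samples, so that every training fold still has two per class for the standard deviations. Like most wrappers this is a heuristic search over a few candidate sizes, not a guaranteed optimum.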

Popular Classification Methods

Microarrays: An Example

Results on the test data

Multi-class Data Analysis

Modeling with TreeNet

TreeNet results for multi-class data

Yeast SOM Clusters

Yeast SOM Clusters, II

Discovery of Causal Processes

A Model of Galactose Utilization (manually discovered)

Bayesian Causal Network Structure

Bayesian Network Learned for Yeast

Future Directions for Microarray Analysis

GeneSpring Demo

Acknowledgements

Thank you!