# Joint Calibration and Fitting of Microarray Data

Friday, October 3, 2003 - 10:20am - 11:10am

Keller 3-180

Terry Therneau (Mayo Clinic)

Joint work with Karla Ballman and Ann Oberg.

In biological assays it is common to have a logistic-shaped dose-response curve, where the horizontal axis is the true level of the material being measured and the vertical axis is the value derived from the assay. In ELISA assays, known controls are commonly placed in several of the wells so that the calibration curve for a given plate can be estimated directly. The analysis issues have long been known; see, for instance, D. J. Finney's tutorial paper on radioligand assay (Biometrics, 1976). The non-linearity is most severe when an assay spans a wide range; with values from 20 to 20,000, we would expect microarrays to be particularly affected. Plots of log(dose) vs. log(response) from the Affymetrix and Gene Logic spike-in data sets show precisely this shape, in full agreement with Finney's observations.
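
As an illustration of the logistic shape (not code from the talk), the sketch below assumes a four-parameter logistic (4PL) curve, a common choice for assay calibration, with hypothetical parameter values. `log_log_slope` computes the local slope of log(response) against log(dose): a linear assay would have slope 1 everywhere, whereas the 4PL slope drifts across a 20 to 20,000 range. `invert_logistic` shows how a fitted calibration curve maps an observed signal back to an estimated true level. All function names are hypothetical.

```python
import math


def logistic_response(dose, bottom, top, midpoint, slope):
    """Four-parameter logistic (4PL) dose-response curve (assumed model).

    bottom/top are the lower/upper asymptotes of the assay signal,
    midpoint is the dose giving a half-maximal signal, and slope is
    the Hill slope.
    """
    return bottom + (top - bottom) / (1.0 + (midpoint / dose) ** slope)


def invert_logistic(signal, bottom, top, midpoint, slope):
    """Recover the dose that would produce `signal` under the 4PL curve."""
    if not bottom < signal < top:
        raise ValueError("signal outside the assay's dynamic range")
    return midpoint * ((signal - bottom) / (top - signal)) ** (1.0 / slope)


def log_log_slope(dose, params, rel_step=1e-4):
    """Central-difference estimate of d log(response) / d log(dose).

    A perfectly linear assay would give 1 at every dose.
    """
    lo, hi = dose * (1 - rel_step), dose * (1 + rel_step)
    d_log_resp = (math.log(logistic_response(hi, *params))
                  - math.log(logistic_response(lo, *params)))
    return d_log_resp / (math.log(hi) - math.log(lo))
```

With the hypothetical parameters bottom=50, top=50000, midpoint=2000, slope=1, the log-log slope falls from about 0.90 at a dose of 20 to about 0.50 at 2,000 and about 0.09 at 20,000, so a straight-line calibration on the log scale would be badly wrong at one end or the other.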

If the calibration curve for each chip were known, the appropriate normalization would be clear. We fit models that alternate between four steps: estimating the true level of each probe with a linear model that incorporates the experimental design of the study; estimating the per-chip calibration curves from a plot of true vs. observed values; normalizing the data based on those calibration curves; and refitting the linear model, repeating the cycle. When the linear model is particularly simple, containing only an intercept per probe, this turns out to be equivalent to the cyclic loess method of normalization (but computationally much faster).
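
The alternating scheme can be sketched in NumPy as below. This is a hypothetical illustration, not the authors' implementation: the function names are invented, a low-degree polynomial (`np.polyfit`) stands in for a loess smoother, and each chip's calibration curve is modeled as the true level plus a smooth distortion of it, which is removed by subtraction rather than by formally inverting the curve. With an intercept-only design (a single column of ones), the fitted true level is simply each probe's mean across chips.

```python
import numpy as np


def hat_matrix(design):
    """Projection matrix H = X (X'X)^-1 X' for a chips-by-p design matrix X."""
    X = np.asarray(design, dtype=float)
    return X @ np.linalg.pinv(X.T @ X) @ X.T


def joint_calibrate(log_y, design, degree=2, n_iter=10, tol=1e-6):
    """Alternate between fitting true probe levels and per-chip calibration.

    log_y  : probes x chips array of log intensities.
    design : chips x p design matrix, shared by every probe.
    A polynomial of the given degree stands in for a loess smoother.
    """
    y = np.array(log_y, dtype=float)  # working copy, normalized in place
    H = hat_matrix(design)
    for _ in range(n_iter):
        # Step 1: estimate each probe's true level from the linear model.
        # H is symmetric, so row i of y @ H equals H @ y[i].
        fitted = y @ H
        max_shift = 0.0
        for j in range(y.shape[1]):
            # Step 2: chip j's distortion, (observed - true) as a smooth
            # function of the estimated true level.
            resid = y[:, j] - fitted[:, j]
            coefs = np.polyfit(fitted[:, j], resid, degree)
            correction = np.polyval(coefs, fitted[:, j])
            # Step 3: normalize chip j by removing its estimated distortion.
            y[:, j] -= correction
            max_shift = max(max_shift, float(np.abs(correction).max()))
        # Step 4: loop back and refit the linear model on normalized data.
        if max_shift < tol:
            break
    return y
```

In this sketch, supplying a design matrix with group indicator columns rather than a single column of ones lets the true-level step respect the study design, so real group differences stay in the fitted values instead of being absorbed into the calibration curves.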

The exciting aspect of this formulation is that it provides a framework into which other aspects of the array can be incorporated, e.g., joint use of the PM and MM probes, or biochemical data on predicted background binding affinity.