Can Replication Save Noisy Microarray Data?

Microarray experiments are multi-step processes. At each step—the growth of cultures, extraction of mRNA, reverse transcription, labelling, hybridization, scanning, and image analysis—variation and error cannot be completely avoided. Estimating the amount of such noise and variation is essential, not only to test for differential expression but also to suggest at which level replication is most effective. Replication and averaging are the key to the estimation as well as the reduction of variability. Here I discuss the use of ANOVA mixed models and of analysis of variance components as a rigorous way to calculate the number of replicates necessary to detect a given target fold-change in expression levels. Procedures are available in the package YASMA (http://www.cryst.bbk.ac.uk/wernisch/yasma.html) for the statistical data analysis system R (http://www.R-project.org).


The power of averaging
In the following, I assume that the results of a replicated microarray experiment are log 2 ratios of fluorescent intensities in the two channels of a two-dye experiment comparing an experimental condition with a control condition. Any necessary background correction and normalization has been applied and each gene is associated with a series of log 2 ratio values from replicated experiments. The statistical assumption is that log 2 ratio values are normally distributed around their true mean. Genes with no differential expression have a true mean of 0. Any deviation of a mean log 2 ratio from this base line indicates over-or underexpression.
Averaging is the key to the estimation of the true mean. Figure 1 demonstrates the fact -wellknown from elementary statistics -that the average of a growing number of replicates approaches the true mean with decreasing variability. It shows 50 simulations from a normally distributed random variable with mean µ = 0.5 and variance σ 2 = 1. The sample means of increasingly larger samples are indicated by open circles connected by a line. Notice that even though the variability is large (if these were log 2 ratios of a real gene its fold-change would range from eight-fold under-to eight-fold overexpression), the true mean can be reliably estimated, provided enough replicates are considered. The reason for this improved accuracy with replication is that the variance σ 2 m of the sample mean is inversely proportional to the number n of replicates, σ 2 m = σ 2 /n.

How many arrays do I need?
The exact number of replicates necessary to detect overexpression of a gene reliably depends on several parameters. First, the underlying variance σ 2 must be obtained (in the next section I discuss variance estimates from microarray data). Next, a significance level α needs to be specified for the probability of a type I error, i.e. the probability that a neutral gene with zero mean is falsely called as overexpressed. A gene with sample mean m will be called, Finally, a minimum log 2 ratio value t (target change) for overexpression, as well as a level β for the type II error of not calling an overexpressed gene, must be provided. The probability p that a gene with true mean 0 has a sample mean of at least t 0 is: where σ m is the variance of the sample mean and is the standard normal distribution. Similarly, the probability q that a gene with true mean t has a sample mean smaller than t 0 is: From the requirement that p ≤ α and q ≤ β, after a few transformations we obtain an upper limit for the acceptable variance of the sample mean: With σ given, a minimum number of replications n is now easily derived from σ m = σ/ √ n. (If there are only a few degrees of freedom for the estimation of σ , then needs to be replaced by the t distribution. If n is part of the calculation of the degrees of freedom, the above equation needs to be solved in an iterated fashion.) In the above example, σ = 1. If the target t is a log 2 ratio value of 0.5 (1.41 fold-change), which should be detectable at a significance level of 0.05, then n ≥ 10.82, i.e. 11 replicates are needed if we accept that half of the overexpressed genes might go unnoticed (β = 0.5, although this is a very conservative estimate). If we want to detect at least 75% of overexpressed genes (β = 0.25), the number of necessary replicates rises to 22.

Hierarchical replication
Microarray experiments are done in stages, and at each stage replication can be used for averaging and reduction of variation. Examples are the growing of several cultures of the same mutant, multiple mRNA extractions, multiple spots on one array, or even multiple image analyses. Considering that replications on each level come at different costs, estimating their relative contributions to overall variation can help designing cost-efficient experiments.
To be more specific, let us assume that replication has been conducted in the following way. n C cultures are grown separately, each one hybridized on n A arrays, and each array scanned and analysed as image n I times. This results in a total of n C n A n I array data sets replicated in a hierarchical fashion (hybridizations nested in cultures, scannings nested in hybridizations). One difficulty with the analysis of such experiments is that replicates are no longer independent of each other, e.g. replicates from the same culture will show some correlation in their noise, due to the common underlying variation in the culture.
Assuming linearity of effects and a normal distribution of noise, an analysis of variance components allows us to calculate the variance contributions at different levels in such hierarchical designs (see Oehlert [1]) for an accessible introduction to this topic). Analysis of variance components is similar to a standard ANOVA analysis but recognizes that some effects are random. For example, if the same culture was used in all future microarray experiments, the culture effect would be a fixed factor in an ANOVA analysis. The likelier scenario is that new cultures are grown every time, and the culture effect enters the analysis as a random factor. Usually, we are interested in noise inherent in the process of growing new cultures or hybridizing to new arrays. Consequently, such factors are best analysed as random factors.
In the hierarchical experimental setting described above, the overall variance of cultures σ 2 C and 374 L. Wernisch the overall variance of arrays σ 2 A are usually very small and can be ignored once raw intensity data have been normalized. What remains are the variances related to genes, such as the gene-culture variance σ 2 GC the gene-array variance σ 2 GA and the residual variance σ 2 stemming from multiple image analyses. These variance components all contribute to the variance σ 2 m of the mean of log 2 ratios for a particular gene: If such variance components have been derived from preliminary studies, then this equation combined with equation 1 allows the calculation of the number of cultures, of arrays per culture, and of image analyses per array, necessary to achieve a desired resolution in differential gene expression.
Simply increasing the number n C of cultures is the best way to decrease variability. If different costs are involved in producing cultures and arrays or image analyses, then some balance between numbers of cultures, arrays and image analyses might be better. The best combination of such numbers can be obtained by optimizing σ 2 m under the additional constraints on costs.