Meta-analysis combines Affymetrix microarray results across laboratories

Conference Paper. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.460

With microarray technology becoming more prevalent in recent years, it is now common for several laboratories to employ the same microarray technology to identify differentially expressed genes that are related to the same phenomenon in the same species. Although experimental specifics may be similar, each laboratory will typically produce a slightly different list of statistically significant genes, which calls into question the validity of each gene list (i.e. which list is best). A statistically-based meta-analytic approach to microarray analysis systematically combines results from the different laboratories to provide a single estimate of the degree of differential expression for each gene. This approach provides a more precise view of genes that are of significant interest, while simultaneously allowing for differences between laboratories. The widely-used Affymetrix oligonucleotide array and its software are of particular interest because the results are naturally suited to a meta-analysis. A simulation model based on the Affymetrix platform is developed to examine the adaptive nature of the meta-analytic approach and to illustrate the utility of such an approach in combining microarray results across laboratories.


Introduction
When a laboratory employs microarray technology to identify genes related to a specific disease or condition of interest, the published result is most often a list of candidate genes determined to be statistically significant. This claim of significance is usually based on some measure of differential expression between a control (non-diseased) sample and a treatment (diseased) sample, and should rest on a statistical test that accounts for variability in the experimental data. Although numerous measures of differential expression and statistical tests are found in an ever-growing body of literature [7–9,16], the results are often vastly different. Even when multiple laboratories investigate the genetic basis of the same biological phenomenon in the same organism, use essentially the same experimental design and microarray platform, and report differential expression using the same measure and statistical test, the lists of candidate genes from each laboratory may differ considerably.
In an attempt to draw together the results from each laboratory while acknowledging their differences, we applied a meta-analytic approach [24], which considers each laboratory as a separate study investigating the same effect. The results from each laboratory are combined in a systematic manner to arrive at a more precise understanding of each gene's true relationship to the phenomenon of interest. Where the term 'analysis' describes the quantitative approaches used to draw useful information from raw data, the term 'meta-analysis' [13] refers to the approaches used to draw useful information from the results of previous analyses. Such approaches have particular utility with the results from Affymetrix GeneChip microarrays and other fabricated arrays, because these results are given in a uniform format and readily lend themselves to comparisons between labs. Of greatest interest are the meta-analytic approaches that yield not only a list of candidate genes based on statistical tests, but also an estimate (and associated standard error) of the degree or magnitude of differential expression. This information is useful not only in identifying which genes are differentially expressed, but also in making statements about the degree to which they are differentially expressed between the tested conditions.
Recently, several applications of meta-analysis to microarray data have appeared in the literature [5,12,17–21,26]. Most of these approaches have focused on combining significant results rather than on providing estimates of the magnitude of differential expression. Those that have provided quantitative estimates focused on combining results across platforms without fully accounting for the technological differences. An approach that combines the results from microarray experiments to produce a single estimate of the degree or magnitude of differential expression for each gene is presented here.

Differential expression with Affymetrix microarray technology
The Affymetrix GeneChip microarray [1] is currently one of the most widely used platforms for gene expression experiments. Affymetrix has developed statistical algorithms [23] that employ individual spot intensities on the microarray for the purpose of estimating the true expression levels of individual genes in single samples. Furthermore, the Affymetrix software MAS 5.0 [2] compares gene expression levels in two different tissues (samples or treatment conditions) and reports a 'signal log ratio' (SLR) with 95% confidence bounds. The signal log ratio is the signed base-2 logarithm of the signed fold change (FC) familiar to biologists [25], i.e. $\mathrm{FC} = 2^{\mathrm{SLR}}$ if $\mathrm{SLR} > 0$ and $\mathrm{FC} = (-1)\,2^{-\mathrm{SLR}}$ if $\mathrm{SLR} < 0$.
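The conversion between SLR and signed fold change can be sketched in a few lines of Python. The function names are hypothetical, and mapping an SLR of exactly zero to a fold change of 1 is an assumption made here, since the text defines only the strictly positive and strictly negative cases.

```python
import math


def slr_to_fold_change(slr: float) -> float:
    """Convert a signal log ratio (base 2) to a signed fold change.

    Assumption: SLR == 0 is mapped to FC = 1 (no change); the text
    only defines the SLR > 0 and SLR < 0 cases.
    """
    if slr >= 0:
        return 2.0 ** slr
    return -(2.0 ** (-slr))


def fold_change_to_slr(fc: float) -> float:
    """Inverse conversion; signed fold changes satisfy |FC| >= 1."""
    if fc >= 1:
        return math.log2(fc)
    if fc <= -1:
        return -math.log2(-fc)
    raise ValueError("a signed fold change must satisfy |FC| >= 1")
```

For example, an SLR of 1 corresponds to a two-fold increase and an SLR of -1 to a two-fold decrease, which is why SLR values are symmetric about zero while fold changes are not.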
If we let $\hat{\theta}_{i,k}$ denote the estimate of the SLR $\theta_{i,k}$ for gene $k$ in lab $i$, and $(\hat{\theta}^{L}_{i,k}, \hat{\theta}^{U}_{i,k})$ the corresponding 95% confidence bounds, then the standard error of $\hat{\theta}_{i,k}$ is

$$s_{i,k} = \frac{\hat{\theta}^{U}_{i,k} - \hat{\theta}^{L}_{i,k}}{2\, t_{0.025,\, df_{i,k}}},$$

where $t_{0.025,\, df_{i,k}}$ is the upper 0.025 critical value of the t distribution with $df_{i,k} = \max(0.7(n_{i,k} - 1), 1)$ degrees of freedom, and $n_{i,k}$ represents the number of probe pairs representing gene $k$ on each array in study $i$. Based on this, lab $i$ may declare gene $k$ significantly differentially expressed if zero is not within the confidence interval $\hat{\theta}_{i,k} \pm t_{\alpha/2,\, df_{i,k}}\, s_{i,k}$, with $\alpha$ appropriately selected to adjust for multiple comparisons. The estimated variance of the SLR estimate $\hat{\theta}_{i,k}$ is $v_{i,k} = s^2_{i,k}$.

In order to be combined across studies, quantitative estimates must address the same measure or quantity, be standardized to the same scale, and include some measure of variability [14]. The SLR satisfies these criteria. Specifically, the SLR for a given gene represents the degree of differential expression between two conditions and is directly comparable between labs, since it estimates the same physical quantity. The SLR from Affymetrix is standardized in the sense that an SLR of zero means no differential expression is observed, and the algorithms used to produce the SLR place all SLR estimates on the same scale. A variance for the SLR estimate is provided by the Affymetrix algorithms. In fact, the Affymetrix software MAS 5.0 [2,23] gives the results from each laboratory in a uniform format, providing the necessary components for a meta-analysis: a quantitative estimate ($\hat{\theta}_{i,k}$) and the associated measure of variability ($v_{i,k}$).
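As a minimal sketch of how the variance $v_{i,k}$ might be recovered from reported output, the Python below assumes that the standard error is the half-width of the 95% confidence interval divided by the t critical value. The function names are hypothetical. The critical value is passed in as an argument rather than computed, so that the example needs no statistics library.

```python
def mas5_degrees_of_freedom(n_probe_pairs: int) -> float:
    """Degrees of freedom df = max(0.7 * (n - 1), 1) for n probe pairs."""
    return max(0.7 * (n_probe_pairs - 1), 1.0)


def se_from_confidence_bounds(lower: float, upper: float, t_crit: float) -> float:
    """Standard error recovered from symmetric confidence bounds.

    Assumption: bounds are estimate +/- t_crit * se, so the standard
    error is the half-width divided by the critical value.
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    return (upper - lower) / (2.0 * t_crit)


def slr_variance(lower: float, upper: float, t_crit: float) -> float:
    """Estimated variance v = s^2 of the SLR estimate."""
    return se_from_confidence_bounds(lower, upper, t_crit) ** 2
```

The caller would look up `t_crit` as the upper 0.025 critical value of the t distribution with `mas5_degrees_of_freedom(n)` degrees of freedom, for instance from a statistics package.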

Random effects meta-analysis
Three main meta-analytic approaches exist [6]: fixed effects, random effects and hierarchical Bayes. The first two will be summarized here, focusing on the random effects model [10]; the third is beyond the scope of this review. The basic meta-analytic model can be represented as:

$$\hat{\theta}_{i,k} = \theta_{i,k} + \varepsilon_{i,k}, \qquad \theta_{i,k} = \theta_k + \delta_{i,k},$$

where $\hat{\theta}_{i,k}$ is the observed SLR estimate for gene $k$ from experiment $i$ ($i = 1, \ldots, n$), and $\varepsilon_{i,k}$ is the within-experiment sampling error. Here, $\theta_k$ is the quantity that experiment $i$ is expected to measure (the true underlying SLR for gene $k$), but due to the random deviation $\delta_{i,k}$, experiment $i$ is in fact measuring $\theta_{i,k}$. That is, there is a true SLR ($\theta_k$) for gene $k$ but, due to random deviation, the SLR in each experiment ($\theta_{i,k}$ in experiment $i$) is slightly different. It is typically assumed that $\delta_{i,k}$ is normally distributed with mean zero and variance $\tau^2_k$, and $\varepsilon_{i,k}$ is normally distributed with mean zero and variance $\sigma^2_{i,k}$. The variance $\sigma^2_{i,k}$ is estimated by $v_{i,k}$, the estimated variance of the observed SLR.
The fixed effects meta-analysis model carries the additional assumption that $\tau^2_k = 0$. This assumption, referred to as the homogeneity assumption, means that gene $k$ is assumed to be expressed the same in all experiments, and that observed differences are due to sampling error only. In practice, the fixed effects model tends to be overly simplistic and will not be considered further in this review.
The quantity of interest in this model is $\theta_k$, the underlying 'true' SLR value for gene $k$. The method of moments [6] can be used to estimate $\tau^2_k$ and then $\theta_k$ by following a generalized least squares approach [22]. To test the significance (i.e. the significant differential expression) of gene $k$, the test statistic $Z_k = \hat{\theta}_k / \sqrt{\widehat{\mathrm{Var}}(\hat{\theta}_k)}$ is employed. Under $H_{0,k}: \theta_k = 0$, $Z_k$ approximately follows the standard normal distribution, so gene $k$ can be assigned a P-value $P_k = 2[1 - \Phi(|Z_k|)]$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. The meta-analysis declares gene $k$ significantly differentially expressed at significance level $\alpha$ if $P_k < \alpha$, where $\alpha$ may be appropriately adjusted for multiple comparisons.
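The estimation and testing steps for a single gene can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the method-of-moments estimator is taken to be the widely used DerSimonian-Laird form, the combined estimate is the inverse-variance weighted mean with weights $1/(v_{i,k} + \hat{\tau}^2_k)$, and the function name and return keys are hypothetical.

```python
import math


def random_effects_meta(estimates, variances):
    """Random effects meta-analysis for one gene.

    estimates: per-lab SLR estimates; variances: their estimated variances.
    Assumes the DerSimonian-Laird method-of-moments estimator of tau^2.
    """
    n = len(estimates)
    if n != len(variances) or n < 2:
        raise ValueError("need matching inputs from at least two labs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Fixed effects (homogeneity) weights and weighted mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the moment estimate of between-lab variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (n - 1)) / c)

    # Random effects weights, combined estimate and its variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    theta = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    var_theta = 1.0 / sum_w_star

    # Two-sided P-value from the standard normal: 2 * (1 - Phi(|z|)).
    z = theta / math.sqrt(var_theta)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {"theta": theta, "var": var_theta, "tau2": tau2, "z": z, "p": p}
```

When the labs agree closely, `tau2` is truncated at zero and the result coincides with the fixed effects estimate; when they disagree, the larger `tau2` widens the variance of the combined estimate, which is how the random effects model allows for differences between laboratories.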

Simulation model
In order to evaluate the utility and power of the proposed meta-analytic approach, a simple simulation study was conducted. Because the true simulation settings (the 'truth') are known, the results of a simulation study can be compared against them to assess how well a method (in this case, the meta-analysis) works. Specifically, the adaptive nature of the meta-analytic approach can be illustrated by comparing the 'true' SLR values with the estimates from each simulated lab and from the meta-analysis. 'Raw' probe-level data were generated from a model assuming that mismatch (MM) intensities are random background noise, which is an underlying assumption of the Affymetrix approach [23]. Specifically, the model assumes that the MM intensity for probe $l$ of gene $k$ under treatment $j$ in experiment $i$ followed a gamma distribution: $MM_{ijkl} \sim \Gamma(\alpha, \beta)$. Once the background mismatch intensities were obtained, the perfect match (PM) intensities were generated via a model composed of the effects described below. Six experiments were considered, each using the same two treatments (control and experimental). The term $\rho_k \sim \mathrm{Bernoulli}(p)$ is 1 if gene $k$ is differentially expressed between conditions $j = 1$ and $j = 2$, and is 0 otherwise. The parameter $p$ corresponds to the expected proportion of genes that are differentially expressed, with higher values resulting in more differentially expressed genes. In this model, $L_i$ is the effect of lab $i$, $T_j$ is the effect of treatment $j$, $G_k$ is the effect of gene $k$, $P(G)_{(k)l}$ is the effect of probe $l$ of gene $k$, $\varepsilon_{(ijk)l}$ is a random error term, and the other terms are interaction effects. To introduce more between-lab variability, the error variance was allowed to be different in each lab. Various sources of variability in the 'observed' simulated data can be introduced by adjusting the parameters in this model.
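A simulation in this spirit can be sketched in Python. Only the gamma-distributed MM background, the Bernoulli indicator of differential expression, and the lab-specific error variance come from the description above. Everything else is a hypothetical stand-in: the additive log2-scale form of the PM signal, the omission of interaction terms, the function name and every numeric default.

```python
import random


def simulate_probe_data(n_genes=5, n_labs=6, n_probes=4, p=0.2,
                        alpha=2.0, beta=50.0, seed=1):
    """Simulate PM and MM intensities keyed by (lab i, treatment j, gene k, probe l).

    Hypothetical PM model (an assumption, not the paper's equation):
    PM = MM + 2 ** (G_k + L_i + rho_k * T_j + P_(k)l + error).
    """
    rng = random.Random(seed)
    lab_effect = [rng.gauss(0.0, 0.2) for _ in range(n_labs)]
    # A different error standard deviation in each lab adds between-lab variability.
    lab_error_sd = [0.10 + 0.05 * i for i in range(n_labs)]
    # rho_k = 1 marks gene k as truly differentially expressed.
    rho = [1 if rng.random() < p else 0 for _ in range(n_genes)]

    pm, mm = {}, {}
    for k in range(n_genes):
        gene_effect = rng.gauss(8.0, 1.0)
        probe_effect = [rng.gauss(0.0, 0.3) for _ in range(n_probes)]
        for i in range(n_labs):
            for j in (1, 2):
                # Treatment shifts the signal only for differentially expressed genes.
                treatment = 1.0 if (j == 2 and rho[k] == 1) else 0.0
                for l in range(n_probes):
                    key = (i, j, k, l)
                    # Background noise: gamma with shape alpha and scale beta.
                    mm[key] = rng.gammavariate(alpha, beta)
                    log_signal = (gene_effect + lab_effect[i] + treatment
                                  + probe_effect[l]
                                  + rng.gauss(0.0, lab_error_sd[i]))
                    pm[key] = mm[key] + 2.0 ** log_signal
    return rho, pm, mm
```

Because `rho` is returned alongside the intensities, any downstream analysis of the simulated data can be scored against the known truth.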

Results
Both the simulation and the meta-analysis were conducted using R software [15,25], as well as the affy package [11] from the Bioconductor project [4,25]. The simulation was performed based on the Affymetrix rat neuro-chip RN_U34, with model parameter settings selected to produce a distribution of MM intensities similar to that observed in real data (Figure 1a, b) and to produce signal log ratio (SLR) estimates within a reasonable range and with some variation between laboratories (Figure 1c, d).
A random effects meta-analysis combined the SLR estimates from the six simulated labs to arrive at a single SLR estimate for each gene. The test of significance $H_{0,k}: \theta_k = 0$ was performed for all 1322 genes, and the P-values for each of the genes were summarized in a histogram of significance P-values (Figure 2a). If there were no differentially expressed genes (i.e. if $H_{0,k}: \theta_k = 0$ were true for all genes $k$), then this histogram should be relatively flat. The abundance of smaller P-values is indicative of a larger number of significant genes. These smaller P-values tended to correspond to the largest meta-analysis SLR estimates (Figure 2b).

Figure 1. Summary of results from simulated labs. The simulation parameters were selected to force the simulated mismatch (MM) intensities (b) to have a distribution similar to that observed in real data (a). The SLR estimates from each lab (c) tended to be close to zero, with deviations from zero indicating differential expression. The simulated results from different labs (d) tended to be similar but also exhibited noticeable differences.

Figure 2. Summary of random effects meta-analysis. If there were no differentially expressed genes, the histogram of significance P-values (a) would be expected to be relatively flat. The excess of small P-values here indicates a large number of differentially expressed genes. Those genes with larger SLR estimates tended to have smaller significance P-values (b) because larger SLR values are indicative of differential expression. The horizontal reference line in (b) represents the P-value threshold to control the FDR at 0.05. The standard errors of the random effects meta-analysis SLR estimates (c) tend to be lower overall than the standard errors from any single lab (Figure 1c), particularly for genes with SLR values away from zero.
Similar to the results from a single lab (Figure 1c), most meta-analysis SLR estimates were close to zero (Figure 2c; indicative of non-differential expression), but the standard errors were slightly lower overall for the meta-analysis estimates, after combining the SLR estimates across labs. Under a random effects meta-analysis with the false discovery rate (FDR) [3] controlled at 0.05, 72 of the 1322 genes were declared significantly differentially expressed (i.e. $H_{0,k}: \theta_k = 0$ was rejected). Individually, the six labs identified between 44 and 58 significantly differentially expressed genes (controlling the FDR at 0.05 for each lab) (Table 1). For each lab, most (but not all) of its significant genes were declared significant by the random effects meta-analysis.
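Controlling the FDR across genes can be sketched as follows. The text does not spell out the procedure, so this example assumes the Benjamini-Hochberg step-up procedure, a common choice for FDR control. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Assumes the Benjamini-Hochberg step-up procedure at FDR level q:
    find the largest rank r with p_(r) <= q * r / m and reject the
    r smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Applied to the per-gene P-values from a meta-analysis (or from a single lab), `sum(benjamini_hochberg(p_values))` gives the number of genes declared significantly differentially expressed at the chosen FDR level.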
The results from this small simulation demonstrate how a meta-analysis handles discrepancies between labs. The meta-analytic approach proved useful in identifying genes that were statistically significantly differentially expressed, while taking into account the additional variation from the contributing labs. It did so without being overly influenced by any one lab that might declare significance due to random variation. Rather than taking the union of all genes declared significant by any of the labs, or the intersection of the lists of significant genes from each lab, the random effects meta-analysis combined information across all six labs in a well-structured manner and declared 72 genes significantly differentially expressed. To gain an understanding of the success rate and power of the meta-analysis, the results were compared with the 'truth', i.e. the simulation parameters. The numbers of correctly identified differentially expressed genes do not vary drastically between the individual labs (Table 1), but the meta-analysis tends to correctly identify a higher number of differentially expressed genes. In addition, the meta-analysis SLR estimates tend to be much closer to the true SLR values than do the estimates from individual labs (Figure 3). This demonstrates that the meta-analysis combines results from multiple labs to arrive at a more precise view of the 'true' degree of differential expression for each gene.
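Scoring a list of significance calls against the known simulation truth amounts to a simple tally. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the inputs are assumed to be parallel lists with one entry per gene.

```python
def score_against_truth(declared, truth):
    """Tally significance calls against the simulation truth.

    declared[k] is True if gene k was declared differentially expressed;
    truth[k] is True if gene k was simulated as differentially expressed.
    """
    if len(declared) != len(truth):
        raise ValueError("inputs must have one entry per gene")
    counts = {"correct_de": 0, "type_i": 0, "type_ii": 0, "correct_null": 0}
    for called, is_de in zip(declared, truth):
        if called and is_de:
            counts["correct_de"] += 1    # true discovery
        elif called and not is_de:
            counts["type_i"] += 1        # incorrectly declared significant
        elif not called and is_de:
            counts["type_ii"] += 1       # missed a truly changed gene
        else:
            counts["correct_null"] += 1  # correctly left undeclared
    return counts
```

Running the same tally on each lab's calls and on the meta-analysis calls gives directly comparable counts of correct identifications and of both error types.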

Discussion
Using information from different laboratories interested in the same biological event, we have combined Affymetrix oligonucleotide arrays and the results gained from Affymetrix software MAS 5.0 [2] for testing differential expression to demonstrate the utility and power of a meta-analytic approach. The signal log ratio (SLR), automatically reported by MAS 5.0 [2,23], is naturally suited to serve as an effect size estimate in a meta-analysis. The simulation example illustrated how the final SLR estimates from the meta-analysis models tend to be much closer to the 'true' SLR values than do the SLR estimates from any single lab. The application of meta-analytic approaches to microarray results provides a systematic method to combine results from different laboratories, with the purpose of gaining clearer insight into the true degree of differential expression for each gene. The random effects approach presented here can be extended to incorporate prior knowledge in a Bayesian framework, to account for known differences between experiments by use of covariates, and to adjust for possible dependencies between experiments. These extensions are the subject of ongoing and future work.

Figure 3. Comparison of SLR estimates with the 'true' SLR values. The SLR estimates from each lab (a) tend to approximate the true SLR values overall, but there is more deviation from the truth in each lab than there is in the random effects meta-analysis SLR estimates (b). Green squares represent type I errors, genes incorrectly claimed as differentially expressed. Red triangles represent type II errors, genes incorrectly claimed to be not significantly differentially expressed. There are fewer errors in the random effects meta-analysis results than there are in any given lab, and the random effects meta-analysis results tend to more closely approximate the true underlying degree of differential expression for each gene.