Heterologous Array Analysis in Pinaceae: Hybridization of Pinus taeda cDNA Arrays With cDNA From Needles and Embryogenic Cultures of P. taeda, P. sylvestris or Picea abies

Hybridization of labelled cDNA from various cell types with high-density arrays of expressed sequence tags is a powerful technique for investigating gene expression. Few conifer cDNA libraries have been sequenced. Because of the high level of sequence conservation between Pinus and Picea, we have investigated the use of arrays from one genus for studies of gene expression in the other. The partial cDNAs from 384 identifiable genes expressed in differentiating xylem of Pinus taeda were printed on nylon membranes in randomized replicates. These were hybridized with labelled cDNA from needles or embryogenic cultures of Pinus taeda, P. sylvestris and Picea abies, and with labelled cDNA from leaves of Nicotiana tabacum. The Spearman correlation of gene expression for pairs of conifer species was high for needles (r² = 0.78-0.86) and somewhat lower for embryogenic cultures (r² = 0.68-0.83). The correlation of gene expression between tobacco leaves and the needles of each of the three conifer species was lower, but sufficiently high (r² = 0.52-0.63) to suggest that many partial gene sequences are conserved between angiosperms and gymnosperms. Heterologous probing was further used to identify tissue-specific gene expression across species boundaries. To evaluate the significance of differences in gene expression, conventional parametric tests were compared with permutation tests after four methods of normalization. Permutation tests after Z-normalization provide the highest degree of discrimination but may increase the probability of type I errors. It is concluded that arrays of cDNA from loblolly pine are useful for studies of gene expression in other pines or spruces.


Introduction
Approaches to studying the genetics and physiology of conifers and other forest trees are being radically altered by the arrival of 'high-throughput' methods. Large-scale DNA sequencing (e.g. Allona et al., 1998) was followed by the introduction of methods for large-scale functional analysis, such as high-density arrays of hundreds or thousands of cDNAs printed on glass (Schena et al., 1995) or on membranes (e.g. Heller et al., 1997; Richmond et al., 1999; Cairney et al., 1999). These arrays are hybridized with suitably labelled cDNA derived from the material of interest. The intensity of the label over each spot provides an estimate of the steady-state relative abundance of mRNA for large numbers of genes, potentially all the genes active in the tissue. For a recent review of the methodology for microarrays on glass, see Hegde et al. (2000). Arrays on membranes and on glass give comparable results (Richmond et al., 1999), but arrays on glass are preferred when large numbers of genes are being surveyed.
DNA microarrays have already found wide use for plants. Studies of Arabidopsis were included in the earliest report (Schena et al., 1995), and more recent reports include those of Ruan et al. (1998), Schenk et al. (2000), Girke et al. (2000) and Maleck et al. (2000). For woody species, there are studies of genes regulating somatic embryogenesis in Pinus taeda (Cairney et al., 1999), of wood formation in Pinus taeda (Whetten et al., 2001) and of wood formation in Populus (Hertzberg et al., 2001a,b).
Microarray analysis will be more widely applicable if arrays constructed from Pinus taeda DNA, currently the only conifer for which extensive sequence information is publicly available, can be used to detect gene activity in other conifer species. Recently, a microarray of 2600 Arabidopsis genes expressed in seeds was hybridized with labelled cDNA from Arabidopsis or from another cruciferous species, Brassica napus; only a minor loss of sensitivity was noted with the heterologous probe (Girke et al., 2000). Presumably the stringency of hybridization does not resolve most of the sequence differences between these species. The present study compares the results obtained when arrays of Pinus taeda cDNA, printed on membranes, are hybridized with labelled cDNA from the same species, from another species of the same genus (P. sylvestris), from another genus of the family Pinaceae (Picea abies) or from an angiosperm (Nicotiana tabacum).

Plant material
The array was printed with clones from cDNA libraries constructed from RNA isolated from the differentiating xylem of compression or side wood of three 6-year-old Pinus taeda (loblolly pine) trees (Allona et al., 1998), from shoot tips 2 cm from the apex, or from immature pollen cones. The arrays were hybridized with labelled cDNA derived from:
1. Needles of seedlings of P. taeda raised in growth cabinets for 7 weeks in continuous light from fluorescent tubes supplemented with tungsten-filament lamps, at about 200 µmol m⁻² s⁻¹ and 25 °C. For disinfection and to induce germination, the seeds were treated with 1% hydrogen peroxide under fluorescent light (100 µmol m⁻² s⁻¹), with two changes over 3 days, before planting in pots containing a grit:sand:vermiculite:perlite mixture (4:2:2:8). From the cotyledonary stage, the seedlings were watered with a dilute nutrient solution (Ingestad, 1979).
2. Needles of seedlings of Pinus sylvestris (Scots pine) raised as above, except that after 4 weeks at 25 °C the seedlings were transferred to 20 °C for a further 2 weeks before harvesting. The seeds, from a seed orchard at Hultsfred, central Sweden (lot S21 A8210001), were soaked in water overnight before planting.
3. Needles of seedlings of Picea abies raised as for Pinus sylvestris. The seeds were from a seed orchard at Saleby, central Sweden.
4. An embryogenic culture of P. taeda, line 344, maintained on a proprietary medium at the Institute of Paper Science and Technology, Atlanta, GA, USA.
5. An embryogenic culture of P. sylvestris, line F2, maintained on DCR medium (Gupta and Durzan, 1986).
6. An embryogenic culture of P. abies, line 95:88:17, maintained on LP proliferation medium (Bozhkov and von Arnold, 1998).
7. Leaves of Nicotiana tabacum cv. Xanthi, raised in compost in the greenhouse.
In short, the material consisted of needles from conifer seedlings, proliferating somatic embryogenic cultures of the conifer species, or tobacco leaves.

Genes in the array
Genes from the cDNA libraries were selected to represent various functional categories, together with 10 negative control genes from organisms phylogenetically remote from plants (Table 1).

Isolation of RNA and synthesis and labelling of cDNA
The cDNA library from differentiating xylem of P. taeda was constructed as described previously (Allona et al., 1998). Total RNA was isolated from conifer seedlings according to Chang et al. (1993), and from conifer embryogenic cultures using Qiagen RNeasy Plant Mini Kit columns according to the manufacturer's instructions, except that the extraction buffer was that of Chang et al. (1993). Total RNA was extracted from Nicotiana leaves as described by Logemann et al. (1987). DNA was removed by treatment with DNase I (Sigma kit). For first-strand cDNA synthesis, 5 µg of denatured total RNA was added to a reaction mixture (final volume 20 µl) containing 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl₂, 10 mM DTT, 0.5 mM each of dATP, dCTP, dGTP and dTTP, 0.5 µg of a 17-mer oligo-dT primer mixture with A, C or G at the 5′ end to minimize the transcription of long poly-A tails, 40 units of RNase Out RNase inhibitor (Life Technologies) and 200 units of Superscript II reverse transcriptase (Life Technologies). The mixture was incubated for 1 h at 42 °C, and the reaction was terminated by heating to 70 °C for 15 min and cooling on ice. The first-strand reaction mixture was then added to a second-strand synthesis mixture containing second-strand buffer (Life Technologies), prepared according to the manufacturer's instructions, and incubated for 2 h at 16 °C. The cDNA was then precipitated with 0.3 M sodium acetate and 2 volumes of ethanol, centrifuged, washed in 70% alcohol, redissolved in 34 µl Tris-EDTA buffer (10 mM/1 mM), pH 8.0, and denatured for 5 min at 95 °C. The cDNA was labelled with fluorescein-11-dUTP by random priming with Klenow fragment in a reaction volume of 50 µl (ECF random prime labelling system, RPN 5751, Amersham Pharmacia).

Hybridization of the membranes with labelled target and detection and quantification of label
Membranes were prehybridized for 30 min at 60 °C in 15-20 ml of 5× SSC, 0.1% SDS, liquid block (Amersham Pharmacia RPN 3601) diluted 20-fold and 5% dextran sulphate (molecular weight 500 000) in hybridization bottles (Hybaid). Labelled target was then added to the prehybridization buffer (0.3-1.5 µl per ml of buffer), and the membranes were hybridized overnight at 60 °C. Membranes were washed for 15 min in preheated 1× SSC, 0.1% SDS at 60 °C and for 10 min in 0.5× SSC, 0.1% SDS at 60 °C. They were then briefly rinsed in freshly made or autoclaved 100 mM Tris-HCl, 300 mM NaCl, pH 7.5 (buffer A) and incubated in the blocking solutions of the kit (RPN 3601) according to the manufacturer's instructions. Membranes were then incubated for 1 h at room temperature with shaking in anti-fluorescein antibodies coupled to alkaline phosphatase (Roche), diluted 50 000-fold in detection buffer (0.1 M Tris-HCl, 0.1 M NaCl, pH 9.5), using 0.2 ml per cm² of membrane. Membranes were washed 3 × 10 min in 0.3% Tween 20 in buffer A at room temperature. The wash buffer was then drained, and 2 ml of CDP-Star (Roche), diluted 1:5 in detection buffer, was pipetted onto each membrane (15 × 10 cm) for 1.5 min in a room with low lighting. Excess reagent was drained off, and the blots were wrapped in plastic and exposed to Hyperfilm ECL (Amersham Pharmacia) in a cassette for 0.5-60 min. After development (Figure 1), the autoradiographs were scanned and the images were quantified with the Quantity One image analysis program (Biorad Ltd). Local background values were estimated with the program, after initial testing with global values.

Normalization of data
For each replicate, the pixel values (intensities, i.e. estimates of gene expression) for each gene were adjusted so that the lowest-scoring gene was zero. The data from the various replicates were then normalized by each of the following procedures:
• Method 1, normalization by mean: the intensity of each gene in a particular replicate was divided by the mean intensity of all genes in that replicate and multiplied by 100, so that values were expressed on a scale on which 100 was the mean intensity.
• Method 2, Student normalization (Richmond and Somerville, 2000): the intensity of each gene in a replicate was divided by the standard deviation of the intensities of all the genes in the replicate.
• Method 3, Z-score normalization (Richmond and Somerville, 2000): (a) the mean intensity $\mu$ of all genes in the replicate was subtracted from the intensity $\chi_n$ of the $n$th gene, and the difference was divided by the standard deviation $\sigma$ of the intensities of all the genes in the replicate, i.e. $Z_n = (\chi_n - \mu)/\sigma$; (b) Z-score normalization as in (a), except that the data were first transformed to logarithms to base 2, so that $\mu$ and $\sigma$ are the mean and standard deviation of the log-transformed values. Before taking logs, 1 was added to each gene intensity to avoid the logarithm of zero (the lowest-scoring gene) and of very small numbers, i.e. $Z_n = [\log_2(\chi_n + 1) - \mu]/\sigma$.
• Method 4, regression normalization (after Hegde et al., 2000): a linear regression line was fitted relating the intensities of all the genes in the second replicate to the intensities in the first replicate array of the series under comparison. The intensities of each gene in the second replicate were normalized (rescaled) by dividing by the slope of the regression line; the intensities of the genes in the third and subsequent replicate arrays were then similarly rescaled to those of the first replicate. Where different species or cell types were being compared, the intensities of each array were related to the first replicate of the first species or cell type.
The principle here is that for closely related samples, many of the genes should be expressed at nearly constant levels. Consequently, a scatterplot of the measured intensities in the nth array vs. those in the first array should have a slope of one (Hegde et al., 2000).
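The four normalization procedures can be sketched in Python with NumPy as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption (the text does not state which estimator was used), and the regression in method 4 is assumed to be an ordinary least-squares fit of the replicate on the reference array.

```python
import numpy as np


def shift_to_zero(x):
    """Subtract the minimum so that the lowest-scoring gene has intensity 0."""
    x = np.asarray(x, dtype=float)
    return x - x.min()


def norm_mean(x):
    """Method 1: rescale so that the mean intensity of the array is 100."""
    return 100.0 * x / x.mean()


def norm_student(x):
    """Method 2: divide by the standard deviation of all intensities in the array."""
    return x / x.std(ddof=1)


def norm_z(x):
    """Method 3a: Z-score, (x - mean) / sd, computed within one array."""
    return (x - x.mean()) / x.std(ddof=1)


def norm_log_z(x):
    """Method 3b: Z-score of log2(x + 1); mean and sd come from the log values."""
    return norm_z(np.log2(x + 1.0))


def norm_regression(reference, x):
    """Method 4: divide x by the slope of the least-squares line of x on reference."""
    slope, _intercept = np.polyfit(reference, x, deg=1)
    return x / slope
```

In use, each replicate would first pass through `shift_to_zero`; for method 4 every later replicate is rescaled against the same first replicate, so that a gene expressed at a constant level ends up with similar values on all arrays.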

Frequency distribution of the intensity of gene expression
The intensities of each gene in a replicate array were normalized by method 1 (see above), and the mean value for each gene over the three replicates was calculated. For one set of calculations, these mean values were transformed to logarithms to base 2. The proportion of genes whose intensity fell into each of 16-19 intervals ('bins') spanning the range of expression was graphed to show the frequency distribution of estimated gene expression.
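A minimal sketch of the binning step, assuming the replicates for one species and cell type are held in a NumPy array with one row per replicate, already normalized by method 1. The function name is hypothetical, equal-width bins are an assumption, and adding 1 before taking logs is borrowed from method 3b to keep genes with zero intensity finite.

```python
import numpy as np


def expression_histogram(replicates, n_bins=16, log2=False):
    """Proportion of genes falling in each intensity bin.

    replicates: array of shape (n_replicates, n_genes), each row already
    normalized by method 1 (array mean = 100).
    """
    gene_means = np.asarray(replicates, dtype=float).mean(axis=0)  # mean per gene
    if log2:
        # Assumption: +1 before the log, as in method 3b, so zeros stay finite.
        gene_means = np.log2(gene_means + 1.0)
    counts, edges = np.histogram(gene_means, bins=n_bins)  # equal-width bins
    return counts / counts.sum(), edges
```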

Testing the significance of changes in gene expression
The significance of apparent changes in gene expression from embryogenic callus to needles that were consistent across all three conifer species was tested by several methods, to assess the sensitivity and robustness of the system. Data were normalized by method 1, 3a, 3b or 4. After methods 1, 3a or 4, the data were transformed to logarithms to base 2; after method 3a, four (the smallest suitable integer) was added to the normalized values before log transformation to avoid logarithms of negative numbers. The significance of the difference in mean expression of each gene was assessed by t-tests assuming unequal variances.
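The direct t-tests can be sketched with SciPy's `scipy.stats.ttest_ind`, where `equal_var=False` selects Welch's test with Satterthwaite degrees of freedom. The helper name and the array layout (one row per array, one column per gene) are assumptions for illustration. The `offset` argument corresponds to the constant 4 added after method 3a; a positive offset would also be needed for method 1 or 4 data if any normalized value is zero, and how the study handled that case is not stated.

```python
import numpy as np
from scipy import stats


def welch_t_tests(callus, needles, offset=0.0, log_transform=True):
    """Per-gene Welch t-tests between two groups of arrays.

    callus, needles: arrays of shape (n_arrays, n_genes) of normalized values.
    offset: constant added before log2 (4 after method 3a); every shifted
    value must be positive for the logarithm to be finite.
    """
    a = np.asarray(callus, dtype=float)
    b = np.asarray(needles, dtype=float)
    if log_transform:
        a = np.log2(a + offset)
        b = np.log2(b + offset)
    # equal_var=False gives Welch's test (Satterthwaite degrees of freedom).
    result = stats.ttest_ind(a, b, axis=0, equal_var=False)
    return result.statistic, result.pvalue
```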
In addition, a permutation method, essentially as described by Good (1993), was applied as follows:
(a) The data were normalized according to method 3b.
(b) A t-value, $t_{obs}$, was calculated for each gene for the difference in means between the nine observations for embryogenic callus and the nine observations for needles. The variances of the two groups were assumed unequal (Satterthwaite method).
(c) The labels were rearranged (permuted) between the two groups in all 48 620 possible arrangements, or in a random sample of 1000 arrangements.
(d) The permuted t-value, $t^*$, was calculated for each arrangement.
(e) A p value was calculated for each gene as the proportion of $t^*$ values greater than $t_{obs}$.
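Steps (b)-(e) can be sketched for a single gene as below. Choosing which 9 of the 18 observations carry the 'callus' label gives C(18, 9) = 48 620 arrangements, matching the count in step (c). This is a hypothetical helper, not the authors' code: it compares absolute t values (a two-sided test) and counts the observed arrangement itself as extreme, whereas step (e) is stated with a strict, one-sided inequality, so the choice of sidedness here is an assumption.

```python
from itertools import combinations

import numpy as np


def welch_t(a, b):
    """Welch t statistic along the last axis (unequal variances)."""
    se2 = a.var(axis=-1, ddof=1) / a.shape[-1] + b.var(axis=-1, ddof=1) / b.shape[-1]
    return (a.mean(axis=-1) - b.mean(axis=-1)) / np.sqrt(se2)


def exact_permutation_p(callus, needles):
    """Exact permutation p value for one gene (two-sided variant, see lead-in).

    Returns (p, number_of_arrangements).
    """
    x = np.concatenate([np.asarray(callus, float), np.asarray(needles, float)])
    n, k = x.size, len(callus)
    # Every choice of which k observations are labelled "callus";
    # the first combination, (0, ..., k-1), is the observed labelling.
    idx = np.array(list(combinations(range(n), k)))
    m = len(idx)
    in_a = np.zeros((m, n), dtype=bool)
    in_a[np.arange(m)[:, None], idx] = True
    group_a = x[idx]                                            # shape (m, k)
    group_b = np.broadcast_to(x, (m, n))[~in_a].reshape(m, n - k)
    t_star = welch_t(group_a, group_b)
    t_obs = t_star[0]
    # Count arrangements at least as extreme as observed; the tiny relative
    # tolerance keeps the mirror-image labelling from being lost to rounding.
    extreme = np.abs(t_star) >= np.abs(t_obs) * (1.0 - 1e-9)
    return float(extreme.mean()), m
```

Computing all 48 620 statistics as one vectorized array keeps the full enumeration fast; for the whole array the function would simply be applied to each gene in turn.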

Frequency distribution of gene expression
Since many statistical tests are valid only if the data approximate a Gaussian distribution, the frequency distribution of expression intensities for all the genes in the array was examined. For untransformed gene expression data normalized by method 1, the frequency distribution was skewed, such that the median value was 75-90% of the mean value (Figure 2A and data not shown). This skew was corrected by transformation to logarithms to base 2 (Figure 2B), after which the frequency distributions did not differ significantly from Gaussian by the Kolmogorov-Smirnov test (data not shown).
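The skew check and the normality test can be sketched with NumPy and `scipy.stats.kstest`. The helper name is hypothetical, and standardizing with the sample mean and standard deviation before the test is an assumption. Because these parameters are estimated from the same data, the plain Kolmogorov-Smirnov p value is conservative (the Lilliefors correction addresses this).

```python
import numpy as np
from scipy import stats


def skew_and_normality(gene_means):
    """Median/mean ratio of raw values and a KS test of the log2 values.

    gene_means: mean method-1 intensity of each gene over the replicates.
    """
    x = np.asarray(gene_means, dtype=float)
    median_to_mean = float(np.median(x) / x.mean())   # < 1 for right-skewed data
    logs = np.log2(x + 1.0)                            # +1 as in method 3b (assumption)
    z = (logs - logs.mean()) / logs.std(ddof=1)        # parameters estimated from data
    ks = stats.kstest(z, "norm")                       # compare with standard normal
    return median_to_mean, float(ks.statistic), float(ks.pvalue)
```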

Negative controls
The intensities of the genes were normalized by method 1, and the average for each gene over the three replicates of each species and both cell types was calculated. The values were arranged in ascending order of magnitude (data not shown). The first five genes were globin, Sp4, Sp1, an open control and BAR; i.e. four of the 10 putative negative controls were excellent negative controls, showing only a low background similar to the blank and lower than that of any of the pine genes. Their intensities were all close to 20% of the average gene intensity. A group of negative control genes showing somewhat more hybridization consisted of GFP, Sp2, HPH, the BT toxin and Sp3, ranked 21st, 30th, 37th, 48th and 51st, respectively, with intensities 38-48% of the average gene intensity. The tenth gene, gusA, encoding β-glucuronidase, showed considerable hybridization, ranking 133rd with an intensity of 70% of the average gene. The four most highly expressed (conifer) genes, with intensities 300-330% of the average gene, included one related to the zinc-finger protein Spalt, one coding for a ribosomal protein and two of unknown function.

Pairwise comparisons of gene expression
To test the potential for hybridizing labelled cDNA from one conifer species with arrays of cDNA from another conifer species, the intensities of gene expression were compared after heterologous and homologous hybridization with the Pinus taeda array (Table 2). For each comparison, the Pearson correlation coefficients on log-transformed data agreed well with, but were slightly higher than, the Spearman (non-parametric) correlation coefficients. The Spearman correlation coefficients were high, 0.88-0.93, for needle comparisons, and nearly as high, 0.83-0.91, for the embryogenic comparisons. For comparisons of different cell types (embryogenic callus and needles) within the same species, the correlation coefficients were more variable (0.68-0.85). The correlations are graphed for two examples with P. abies material in Figure 3. Some of the calculations were repeated for the subset of 86 unidentified genes, to reduce possible bias towards genes of highly conserved sequence. For needles, the Spearman correlations for this subset were slightly higher than for the complete set (range 0.89-0.94). For embryogenic cultures, they were lower (range 0.67-0.86).
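One pairwise comparison can be sketched with `scipy.stats.spearmanr` and `scipy.stats.pearsonr`, assuming each input is the vector of mean normalized intensities for the same genes in two samples. The helper name is hypothetical, and the +1 offset before the log2 transformation is an assumption borrowed from method 3b. Squaring a coefficient gives the r² form used in the abstract.

```python
import numpy as np
from scipy import stats


def pairwise_correlations(expr_a, expr_b):
    """Spearman rho on mean intensities and Pearson r on their log2 values."""
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)
    rho, _ = stats.spearmanr(a, b)                               # rank-based
    r, _ = stats.pearsonr(np.log2(a + 1.0), np.log2(b + 1.0))    # parametric
    return {"spearman": float(rho), "spearman_r2": float(rho) ** 2,
            "pearson_log2": float(r)}
```

Because the Spearman coefficient uses only ranks, it is unaffected by any monotone rescaling of either sample, which is one reason it is a natural choice when arrays differ in overall signal.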
To test phylogenetically more remote comparisons, labelled cDNA from tobacco leaves was hybridized with the P. taeda arrays. The Spearman correlation coefficients for the comparisons of expression in angiosperm leaves and in conifer needles were still high (0.72-0.79) but were considerably lower than those for the comparisons for the different conifer species (Table 2). The values for the correlation coefficients were little affected by the method of normalization employed (data not shown).

Differences in gene expression between needles and embryogenic callus
The aim here was to determine whether heterologous probes could be used to identify tissue-specific gene expression across species boundaries. Genes differentially expressed between embryogenic callus and needles across the three conifer species were investigated. We also examined the sensitivity of the statistical tests to the method used for normalizing the data among replicates, both within and between cell types and species, and the level of agreement among alternative methods of statistical testing. For this comparison, data from all three species were combined (i.e. nine arrays from embryogenic callus were compared with nine arrays from needles). First, the ratio of the mean expression of each gene (relative to the mean expression of all genes in the array) in embryogenic callus to that in needles was calculated. The results after normalization by method 1 and method 4 were quite similar (Table 3). Of the 30 genes most downregulated in embryogenic callus, the first seven ranked the same by both methods, and 25 were common to both lists. Of the 30 genes most upregulated in embryogenic callus, only the most upregulated gene ranked the same in both lists, but 29 genes were common to both lists. In short, the method of normalization did not affect the classification of genes as up- or downregulated but affected the precise ordering of the genes. The significance of the difference in mean expression was then calculated by direct t-testing or by the permutation method. Genes showing significant (p < 0.01) two-fold changes between cell types by the direct t-test are listed in Table 4, with p values estimated after four methods of normalization. The total numbers of genes showing significant (p < 0.01) up- or downregulation from embryogenic callus to needles across the three conifer species, by the direct t-test assuming unequal variances or by permutation analysis, were also compared (Table 5).
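The E : N ratios and the comparison of two top-30 lists can be sketched as follows. The helper names are hypothetical; each input array is assumed to hold one row per array (nine per cell type here) and one column per gene, already normalized by the method under comparison. `compare_top_lists` counts genes at the same position (the bold genes of Table 3) and genes shared by both lists.

```python
import numpy as np


def en_ratios(callus, needles):
    """E : N ratio per gene, each array first scaled by its own mean.

    callus, needles: arrays of shape (n_arrays, n_genes).
    """
    c = np.asarray(callus, dtype=float)
    n = np.asarray(needles, dtype=float)
    c_rel = (c / c.mean(axis=1, keepdims=True)).mean(axis=0)
    n_rel = (n / n.mean(axis=1, keepdims=True)).mean(axis=0)
    return c_rel / n_rel      # genes with zero needle expression give inf


def compare_top_lists(ratio_x, ratio_y, top=30, descending=False):
    """Genes at the same rank, and genes shared, in two top-N lists."""
    order_x = np.argsort(ratio_x, kind="stable")
    order_y = np.argsort(ratio_y, kind="stable")
    if descending:                     # most upregulated in callus first
        order_x, order_y = order_x[::-1], order_y[::-1]
    top_x, top_y = order_x[:top], order_y[:top]
    same_rank = int(np.sum(top_x == top_y))
    shared = len(set(top_x.tolist()) & set(top_y.tolist()))
    return same_rank, shared
```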
The genes showing significant two-fold changes are the same by direct t-tests and by permutation analysis (Table 4), but the total number of genes showing significant changes, i.e. including changes smaller than two-fold, is 33-67% greater by the permutation analysis (Table 5). The overall correlation between the two methods for all genes was high (e.g. r = 0.9986 after normalization by method 3b). Significance levels are higher after Z-normalization than after normalization by the other two methods, both by the direct t-test and by the permutation analysis (Table 4).

Degree of sequence identity required for hybridization to genes in the arrays at various washing stringencies
Our results show that labelled cDNA from the three conifer species hybridized with the Pinus taeda arrays with comparable efficiency (Table 2). This is to be expected in view of the high sequence similarity, usually 90% or higher nucleotide identity, of those genes that have been sequenced in both Pinus and Picea (Table 6). Most of the genes on the array were chosen because they were functionally identifiable from amino acid similarity to known genes, usually angiosperm genes. To that extent, the genes in the array are biased toward conservation. To assess the effect of this possible bias, correlations were recalculated for the subset of 86 unidentified genes. For needles, the Spearman correlations were essentially unchanged, in fact slightly higher (see Results). For embryogenic cultures, the correlations were lower for the subset than for the complete set of 383 genes, particularly for pairwise comparisons across genera. This probably reflects reduced constraints on regular growth in tissue culture; i.e. it partly reflects chance differences in the degree to which genes are expressed, rather than differences between the Pinus and Picea genomes.
How similar must a labelled cDNA be to the DNA of the target spot in an array for detectable hybridization? This question is usually considered for cross-reactions on microarrays among genes that are different members (paralogues) of the same family, since such cross-reactions may seriously affect interpretation of the data. Here the question concerns cross-hybridization of the array with labelled cDNA from another species: how closely related must the species be for meaningful results? Recent estimates are that considerable cross-hybridization, even after standard high-stringency washing, occurs when the labelled cDNA shows more than 70% sequence identity over a length greater than 200 bp (Richmond et al., 1999; Richmond and Somerville, 2000). A single short region of 70-90% identity caused little hybridization, but shorter regions of identity spread over the length of the target resulted in significant hybridization (Heller et al., 1997). Table 6 shows similarities for some of the Pinus taeda genes in the present array. Only two of the five genes included in the table are regarded as highly conserved, but, as expected, the P. taeda cDNA on the arrays from all of them cross-hybridized with cDNA not just from the conifers but also from the angiosperm tobacco (data not shown). This degree of sequence identity is probably sufficient to explain the quite high Spearman correlation (0.72-0.79) between gene expression in tobacco leaves and that in conifer needles (Table 2), as measured in each case by hybridization with the conifer array. The degree of cross-reaction was reduced somewhat by increasing the stringency of the final wash from 0.5× SSC to 0.1× SSC at 60 °C (data not shown), but the standard washing stringency was comparable with that used by other workers using fluorescent labels. Hybridizing Arabidopsis cDNA with similar Pinus taeda arrays on glass confirmed the high degree of cross-reaction between angiosperms and gymnosperms (data not shown).

cDNA microarrays often do not distinguish closely related members of gene families; the results then reflect the response of the most abundant members. To prevent cross-reaction among members of a gene family, either the cDNAs used to print the arrays need to be restricted to non-conserved regions, or the labelled probe cDNA will have to be shortened to the 3′ ends (Hertzberg et al., 2001a).

While four of the putative negative control genes showed only a background level of expression equal to that of the open control, the other six showed detectable cross-reaction (see Results), particularly gusA, which appears unsuitable as a negative control. The gene gusA shows sequence similarity with other genes in the databanks.

The number of genes distinguishing pine and spruce is probably small. Humans and chimpanzees differ strikingly in anatomy and behaviour, yet the coding regions of genes studied to date are 98-99% identical, and only one human gene is known to be absent in chimpanzees (Gibbons, 1998; Gagneux and Varki, 2001).

Table 3. Comparison of the ratio of gene expression in embryogenic callus to that in needles (E : N), averaged over the three conifer species (three replicates per species), for two methods of normalization: by mean (method 1) or by the slope of the regression line (method 4). For details of the two methods, see Materials and methods. The ratios E : N of the first 30 genes are sorted in ascending order (columns 2-5) or descending order (columns 6-9). 'Gene nr' identifies the gene in the arrays. Both the size of the gene expression ratio and its position in the ordered sequence change somewhat according to the method of normalization. Bold genes have the same position after both normalization methods, roman genes change position and italic genes appear in only one column.

Table 4. Genes, identified by gi GenBank number (or sequencing project designation) and gene number in the array, showing significant two-fold up- or downregulation from embryogenic callus to needles across the three conifer species. The ratio of mean expression in embryogenic callus to that in needles after normalization by mean (method 1) is denoted E : N. Array results were normalized by method 3a (Z-score), method 3b (log transformation followed by Z-score normalization), method 4 (regression) or method 1 (mean). The significance of the difference in mean values between embryogenic callus and needles was estimated for each gene by direct t-test assuming unequal variances (after log transformation, except for method 3b) or by permutation; p values are entered in the last four columns, after direct t-test above and after permutation below (in parentheses).
[Table 4 columns: Gene description | Gene nr in array | E : N | p value by normalization method: Z, log Z, regression, mean]
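The rule of thumb quoted above (more than 70% identity over more than 200 bp; Richmond et al., 1999; Richmond and Somerville, 2000) can be turned into a rough screening check on a pre-computed pairwise alignment. The helpers below are hypothetical and are not a model of hybridization: they ignore the observation of Heller et al. (1997) that short regions of identity spread along the target can also hybridize, and they assume the two sequences are already aligned to equal length with '-' marking gaps.

```python
def max_window_identity(aligned_a, aligned_b, window=200):
    """Highest fraction of identical aligned positions in any window.

    aligned_a, aligned_b: equal-length aligned sequences, '-' marking gaps.
    Returns 0.0 if the alignment is shorter than the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    if len(matches) < window:
        return 0.0
    current = best = sum(matches[:window])
    for i in range(window, len(matches)):      # slide the window one base
        current += matches[i] - matches[i - window]
        best = max(best, current)
    return best / window


def likely_cross_hybridizes(aligned_a, aligned_b, window=200, min_identity=0.70):
    """Rough flag: more than 70% identity within some stretch of `window` bases."""
    return max_window_identity(aligned_a, aligned_b, window) > min_identity
```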

Use of fully randomized replicates
In much early work with microarrays, the replicate spots were printed side-by-side, thus introducing bias. This problem has been avoided here by using fully randomized replicates. The analysis can be performed essentially by regarding the replicates on membranes as blocks in a field trial and applying standard agricultural statistics (Kerr et al., 2000). Lee et al. (2000) considered that at least three replicates were required to distinguish expressed from non-expressed genes in their experiments.
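A fully randomized layout of the kind described can be generated by shuffling spot positions. The grid dimensions and the helper name below are hypothetical; the sketch only illustrates that each gene's replicate spots land at independent random positions rather than side by side.

```python
import numpy as np


def randomized_layout(n_genes, n_replicates, n_rows, n_cols, seed=None):
    """Place every replicate spot of every gene at a random grid position.

    Returns an (n_rows, n_cols) grid of gene indices; -1 marks empty spots.
    """
    n_spots = n_genes * n_replicates
    if n_spots > n_rows * n_cols:
        raise ValueError("grid has too few positions for all spots")
    rng = np.random.default_rng(seed)
    positions = rng.permutation(n_rows * n_cols)[:n_spots]   # distinct cells
    genes = np.repeat(np.arange(n_genes), n_replicates)      # 0,0,0,1,1,1,...
    grid = np.full((n_rows, n_cols), -1, dtype=int)
    grid.flat[positions] = genes
    return grid
```

Recording the seed makes a given layout reproducible, so that spot positions can be mapped back to genes when the membranes are quantified.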
The sensitivity of the procedures described was tested by estimating the statistical significance of small changes in gene expression between embryogenic callus and needles (Tables 4, 5). Here the results for the three species were pooled, so that nine arrays were compared for expression in each cell type. After Z-normalization of log-transformed data (method 3b), 22 genes showed significant changes in expression at p < 0.001 by direct t-tests, and 43 genes at p < 0.01; the corresponding figures after permutation analysis were 42 and 72 (Table 5). Some changes in expression of only 1.6-fold were significant at p < 0.01. When changes of at least two-fold are considered, the results are reasonably independent of the method of normalization (Table 4; data for the permutation test not shown). The significance levels are, however, higher after Z-normalization than after normalization by the two other methods. Normalization by the regression method, unlike the other methods tested, failed to equalize the mean expression of all genes in the array among replicates, among species or between tissues, which is a prerequisite for a meaningful analysis in the present study.
In short, Z-normalization of log-transformed data was the most effective method of normalizing the data. Permutation tests are preferable to direct t-tests on theoretical grounds as 'exact methods', but the more convenient and familiar direct t-tests performed well, albeit with less sensitivity. If 384 genes are studied and the critical probability is taken as p = 0.01, three to four genes are expected to show a significant change in expression by chance. The choice of critical probability depends on the relative undesirability of type I and type II errors in the particular circumstances, and on whether the results can be confirmed by alternative experimental methods.
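The expected number of chance 'significant' genes follows from the number of tests and the critical probability: 384 × 0.01 = 3.84, i.e. three to four genes. The sketch below also gives the binomial probability of at least k such genes, under the simplifying assumption (not made in the text) that the tests are independent; the function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_genes, alpha):
    """Expected number of genes called significant by chance alone."""
    return n_genes * alpha


def prob_at_least(k, n_genes, alpha):
    """P(k or more chance 'hits'), treating the tests as independent (assumption)."""
    below_k = sum(comb(n_genes, i) * alpha**i * (1 - alpha) ** (n_genes - i)
                  for i in range(k))
    return 1.0 - below_k
```

For example, `prob_at_least(8, 384, 0.01)` gives the chance that eight or more genes would pass p < 0.01 with no real change in expression.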
Exact methods for significance testing, such as permutation tests, are practical nowadays with modern computers (Good, 1993). The full permutation analysis, examining all 48 620 arrangements of the data, gave results in reasonably good agreement with the conventional t-test (Tables 4, 5). Reducing the number of permutations from the complete series of 48 620 to 1000 led to serious errors and cannot be recommended: after normalization by method 3b, 39 genes were wrongly classified for significance with p = 0.01 as the cutoff point. More genes showed significant differences in expression between embryogenic callus and needles by the complete permutation analysis than by a direct t-test (Tables 4, 5), but all genes selected by the direct t-test were also selected by the permutation method (data not shown). The permutation analysis is more sensitive in that it detects smaller changes in expression; changes in expression of less than 1.6- to 2-fold, however, are in general unlikely to have much biological significance, although they may be important in particular circumstances. Two of the genes showing very high significance (p < 0.00001) by the permutation test were not significant even at p < 0.05 by the direct t-test (data not shown).
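The number of arrangements and the precision of a sampled permutation test can be checked with simple arithmetic. With 9 callus and 9 needle arrays there are C(18, 9) = 48 620 labellings. If only 1000 random arrangements are used, an estimated p value near the 0.01 cutoff has a binomial standard error of about √(0.01 × 0.99 / 1000) ≈ 0.003, roughly a third of the cutoff itself, which is one plausible reason why genes near the cutoff were misclassified. The constant and function names are illustrative.

```python
from math import comb, sqrt

# 9 embryogenic-callus arrays + 9 needle arrays: ways to choose the 9 "callus" labels.
N_ARRANGEMENTS = comb(18, 9)


def sampled_p_standard_error(p, n_permutations):
    """Binomial standard error of a p value estimated from random permutations."""
    return sqrt(p * (1.0 - p) / n_permutations)


if __name__ == "__main__":
    print(N_ARRANGEMENTS)                                     # 48620
    print(round(sampled_p_standard_error(0.01, 1000), 4))     # 0.0031
```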

Differences in gene expression among species and tissues
In pairwise comparisons of gene expression for needles, correlations were slightly higher for species of the same genus than for species of different genera (Table 2). Correlations were higher for needles than for embryogenic callus (Table 2), presumably because callus cultures are more variable than needles, both in cell composition and over time. Part of the explanation for the more similar gene expression (at least when the complete set of genes is considered) in callus of P. abies and P. sylvestris than in the pairs P. taeda:P. sylvestris or P. taeda:Picea abies may be that the supply of growth regulators was the same for P. abies and P. sylvestris but different for P. taeda.
A detailed study of changes in gene expression from embryogenic cultures to needles was inappropriate in view of the great differences in culture conditions, but at least some of the genes related to photosynthesis were expected to be less expressed in the dark-grown embryogenic cultures than in needles. In the permutation test after Z-normalization (method 3b), of the 13 genes broadly related to photosynthesis, four (ferredoxin precursor, Rubisco, oxygen-evolving enhancer protein and sucrose phosphate synthase) were upregulated highly significantly (p < 0.001) in needles, and CAB was significantly upregulated (p = 0.0170); none was significantly downregulated.

Concluding remarks
The data presented here support the conclusion, expected from the high sequence identity of genes from related species, that arrays printed from one species of Pinus or Picea give useful information from hybridization with labelled cDNA from other species of Pinus or Picea. This will reduce the need for mass cDNA or genomic DNA sequencing projects and allow more forest tree laboratories to exploit the opportunities opened for studying simultaneously a large fraction of the genes expressed in a particular tissue or cell type.