Toward Coalescing Gene Expression and Function with QTLs of Water-Deficit Stress in Cotton

Cotton exhibits moderately high vegetative tolerance to water-deficit stress but lint production is restricted by the available rainfed and irrigation capacity. We have described the impact of water-deficit stress on the genetic and metabolic control of fiber quality and production. Here we examine the association of tentative consensus sequences (TCs) derived from various cotton tissues under irrigated and water-limited conditions with stress-responsive QTLs. Three thousand sixteen mapped sequence-tagged sites were used as anchored targets to examine sequence homology with 15,784 TCs to test the hypothesis that putative stress-responsive genes will map within QTLs associated with stress-related phenotypic variation more frequently than with other genomic regions not associated with these QTLs. Approximately 1,906 of 15,784 TCs were mapped to the consensus map. About 35% of the annotated TCs that mapped within QTL regions were genes involved in an abiotic stress response. By comparison, only 14.5% of the annotated TCs mapped outside these QTLs were classified as abiotic stress genes. A simple binomial probability calculation of this degree of bias being observed if QTL and non-QTL regions are equally likely to contain stress genes was P(x ≥ 85) = 7.99 × 10⁻¹⁵. These results suggest that the QTL regions have a higher propensity to contain stress genes.


Introduction
Although cotton (Gossypium spp.) exhibits moderately high tolerance during vegetative development, water-deficit stress is one of the major limiting factors in its production. Advancements in genome mapping and functional genomics provide a powerful resource for the genetic dissection of abiotic stress tolerance in crop plants [1]. Large-scale genome projects have generated a mass of knowledge regarding the genome organization and function of stress-responsive genes in plants [2]. Searchable databases and analytic tools available to the research community offer the capacity to query these data. These comparative tools from related fields enable the identification of genes and gene products and may reveal functional relationships between a genotype and observed phenotype [3,4]. Hence, there is an opportunity to make direct and meaningful comparisons from data generated by quantitative trait loci (QTLs) mapping and genome-wide expression analysis to provide solutions for crop improvement.
Several studies have identified QTLs responsible for drought stress-related traits in cotton. Saranga et al. [5,6] described a substantial number of QTLs that explained phenotypic variation in physiological variables such as osmotic potential, carbon isotope ratio, canopy temperature, and chlorophyll a and chlorophyll b content, and in measures of crop productivity such as dry matter, seed cotton, harvest index, boll weight, and boll number, under water-limited and/or well-watered conditions. Water-deficit stress during cotton boll and fiber development affects fiber quality characteristics. QTLs for fiber length, length uniformity, elongation, strength, fineness, and color detected under water-limited and well-watered conditions have also been reported [7]. QTL mapping alone does not provide knowledge regarding the mechanisms and pathways involved in water-deficit stress tolerance, or about the multitude of genes involved in a plant's response to water-deficit stress [8]. Linking information from QTL mapping and genome-wide expression experiments offers a powerful approach to identify and characterize the key pathways and the genetics underlying water-deficit stress tolerance [4,8]. Integration of QTL information, physiological knowledge, and gene expression data is a significant step towards understanding genes controlling physiological responses that affect production and quality under stressful conditions.
A community-wide effort produced 185,198 Expressed Sequence Tags (ESTs) from 30 cDNA libraries, sampling a variety of tissues and developmental stages, including several subjected to abiotic stresses such as chilling temperatures and water-deficit treatment (http://www.agcol.arizona.edu/cgibin/pave/Cotton/index.cgi/) [9]. Subsequently, this EST collection and others have provided a wealth of sequence data for microarray-based investigation into a number of key biological processes in cotton including fiber development, pathogen response, and water-deficit stress response [10][11][12][13][14]. Even though these arrays do not provide complete transcriptome coverage, particularly for stress-related genes, they have, together with the assembled EST database described by Udall et al. [9], provided a foundation for more robust functional analysis of a multitude of developmental responses in cotton. Additionally, the ability to integrate expressed sequence data with QTL data may allow the identification of functional regions on chromosomes that contribute to variability in quantitative traits.
A long-term goal is to explore the regulatory networks that control the expression of stress-responsive genes. The principal aim of this study was to identify cotton genes implicated in water-deficit stress by integrating information generated by QTL mapping and genome-wide expression analysis. This research examined the utility of genome sequence information as a means to link functional gene expression and QTL knowledge. To bridge this information, 3016 mapped sequence-tagged sites (STS) were used as anchored targets to examine sequence homology with 25,118 cotton ESTs derived from various tissues under irrigated and water-limited conditions. Our hypothesis is that putative stress-responsive genes will colocalize to QTLs associated with phenotypic variation for stress-related traits more frequently than with other genomic regions not associated with these QTLs. Forty-four genes that appear to be functional orthologs of genes associated with stress tolerance responses, differentially expressed in response to water-deficit treatment, and mapped within a QTL likelihood interval were identified as candidate genes. This approach provides a strategy for combinatorial genomic analyses to identify candidate genes that will be a useful tool for genetic optimization of fiber productivity and quality under water-limited conditions.

EST Assembly and Annotation.
Tentative consensus sequences (TCs) were generated from 25,118 cotton ESTs derived from 10 libraries: boll (irrigated and water-limited),  [16,17], was used to map the ESTs. Chromosomes 1 to 13, indicated in this study, are the consensus chromosomes on the Consensus Map. Sequence homology between the mapped sequence-tagged sites (STS) and the TCs was used as a basis to determine the putative genetic location of TCs on the Consensus Map. Each TC was aligned to 3016 genetically mapped STS [16,17]. These STS loci were derived from cDNAs (abscission tissue and drought-stressed tissue from G. hirsutum, 7-10-day fiber from G. arboreum, putative gene function, disease resistance gene analogs, and Arabidopsis ESTs) and genomic DNA. The BLASTN function was used to search for sequences homologous to STSs with known chromosomal locations using a threshold of 90% similarity and a minimum of 100 bp overlap. Our previous research [13] identified 2106 stress-responsive transcripts (ESTs): 879 classified as stress-induced, 1163 as stress-repressed, and 64 showing reciprocal expression patterns in leaf and root exposed to water-deficit stress. These transcripts were identified from the Cotton Oligonucleotide Microarray (v1) (http://www.cottonevolution.info/), composed of 12,006 microarrays derived from an assembly of more than 180,000 Gossypium ESTs sequenced from 30 unrelated libraries. The map position of these additional 2106 stress-responsive transcripts was also investigated as described previously.
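The homology thresholds described above can be illustrated with a minimal Python sketch. It assumes that "similarity" corresponds to the percent identity reported for a BLASTN alignment and that "overlap" corresponds to the alignment length; the record layout, the function names, and the example hits are hypothetical and are not taken from the study.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    tc_id: str            # query: tentative consensus sequence (hypothetical ID)
    sts_id: str           # subject: mapped sequence-tagged site (hypothetical ID)
    pct_identity: float   # percent identity of the alignment (assumed proxy for "similarity")
    aln_length: int       # alignment length in bp (assumed proxy for "overlap")

MIN_IDENTITY = 90.0  # threshold stated in the text
MIN_OVERLAP = 100    # minimum overlap in bp stated in the text

def passes(hit: Hit) -> bool:
    """True when a hit meets both thresholds."""
    return hit.pct_identity >= MIN_IDENTITY and hit.aln_length >= MIN_OVERLAP

def assign_positions(hits):
    """Map each TC to the set of STS loci it matches above threshold."""
    placed = {}
    for hit in hits:
        if passes(hit):
            placed.setdefault(hit.tc_id, set()).add(hit.sts_id)
    return placed

# Hypothetical hits, for illustration only.
example_hits = [
    Hit("TC_0001", "STS_A", 97.5, 240),
    Hit("TC_0001", "STS_B", 91.0, 130),
    Hit("TC_0002", "STS_A", 88.0, 300),  # fails the identity threshold
    Hit("TC_0003", "STS_C", 99.0, 80),   # fails the overlap threshold
]
placed = assign_positions(example_hits)
```

A TC that passes the thresholds against more than one STS keeps every placement, which is consistent with the later observation that many TCs map to multiple loci.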

Quantitative Trait Locus (QTL) Alignment.
The relationship among ESTs and QTLs was investigated based on knowledge from a meta-analysis of polyploid cotton QTLs [15]. Using conserved markers to align the different genetic maps, a total of 432 QTLs were integrated into the Consensus Map. Among them, 39 QTLs were detected only under water-limited conditions [5][6][7] (Table 2): 18 related to plant physiology (osmotic potential (OP), carbon isotope ratio (δ¹³C), chlorophyll a and chlorophyll b content (Chl-a and Chl-b), and canopy temperature (CT)), 17 to plant productivity (dry matter (DM), harvest index (HI), seed cotton (SC), and boll weight (BW)), and 4 to fiber quality (fiber strength (FS), fineness (FF), and length (FL)). The seventeen genomic regions that contain these QTLs were the focal point for comparison with the gene expression and gene ontology data that follow. Any TC or EST that mapped within the 99% (2-LOD) confidence interval of a QTL was deemed to colocalize with that QTL.
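The colocalization rule amounts to an interval test on the Consensus Map. The following Python sketch shows one way to express it, assuming that each QTL is stored with the map positions (in cM) of the markers flanking its 2-LOD interval; the class, the function name, and all coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QTL:
    name: str        # trait label, e.g. "OP"
    chrom: int       # consensus chromosome (1 to 13)
    start_cm: float  # left flank of the 2-LOD interval (hypothetical value)
    end_cm: float    # right flank of the 2-LOD interval (hypothetical value)

def colocalized_qtls(chrom, pos_cm, qtls):
    """Names of all QTLs whose interval contains the given map position."""
    return [q.name for q in qtls
            if q.chrom == chrom and q.start_cm <= pos_cm <= q.end_cm]

# Hypothetical intervals, for illustration only.
qtls = [
    QTL("OP", 10, 35.0, 60.0),
    QTL("SC", 10, 50.0, 72.0),
    QTL("HI", 12, 10.0, 25.0),
]
inside_two = colocalized_qtls(10, 55.0, qtls)   # falls within OP and SC
inside_none = colocalized_qtls(10, 80.0, qtls)  # outside every interval
```

Returning a list rather than a single match reflects the fact that several QTLs can share one genomic region.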

Cotton-Arabidopsis Synteny.
Comparative analysis linked to synteny-based and expression-based information may provide clues about specific genes and families involved in QTL networks that respond to abiotic stress. Comparative analysis was conducted on significant QTL regions to deduce the cotton-Arabidopsis synteny relationship and examine the correspondence between the 39 QTLs and Arabidopsis abiotic stress responsive genes. Each QTL region was aligned with the corresponding cotton consensus map (http://www.plantgenome.uga.edu/cotton/StartFrame.htm) based on conserved marker loci [15]. Markers that flanked the 99% confidence interval (2-LOD) of each QTL were used in this alignment. Consensus fragments were subjected to both FISH [18] and CrimeStatII [19] analysis to identify putative regions of synteny with Arabidopsis [16]. Gene ontologies were determined for the Arabidopsis genes showing correspondence with the cotton QTL regions.

EST Mapping.
Comparisons of expression data with mapped QTLs were carried out using a subset of stress responsive ESTs. This was done by integrating ESTs with gene expression and QTL data through a Consensus Map [17]. Initially, 25,118 ESTs from several diverse libraries (Table 1) were assembled into 15,784 TCs that represented 13,097 singletons and 2687 assemblies. Interestingly, only 539 unique cotton transcripts were found within these 10 stress-treated cDNA libraries.
The sequence of each TC was compared to the composite set of 3016 STSs on the Consensus Map [17]. The putative map location of 1,906 TCs was determined based on this homology. In all, 815 loci contain a significant level of homology to assign the putative map position of at least one TC. Because unique TCs showed homology to several regions of the same gene it was expected that mapped loci would contain multiple TCs and indeed this was observed at 462 (57%) loci.
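The two views of the mapping result (loci that carry several TCs, and TCs placed at several loci) come from the same table read in opposite directions. The sketch below, in Python, uses a hypothetical placement table to show the bookkeeping and then recomputes the 57% share from the counts reported above; none of the identifiers come from the study.

```python
from collections import defaultdict

def invert(tc_to_loci):
    """Turn a TC -> loci table into a locus -> TCs table."""
    locus_to_tcs = defaultdict(set)
    for tc, loci in tc_to_loci.items():
        for locus in loci:
            locus_to_tcs[locus].add(tc)
    return dict(locus_to_tcs)

def share_with_multiple(mapping):
    """(count, fraction) of keys whose value set has more than one member."""
    multi = sum(1 for members in mapping.values() if len(members) > 1)
    return multi, multi / len(mapping)

# Hypothetical placements, for illustration only.
tc_to_loci = {
    "TC_0001": {"STS_A", "STS_B"},
    "TC_0002": {"STS_A"},
    "TC_0003": {"STS_C"},
}
locus_to_tcs = invert(tc_to_loci)
loci_multi = share_with_multiple(locus_to_tcs)  # loci carrying several TCs
tcs_multi = share_with_multiple(tc_to_loci)     # TCs placed at several loci

# Consistency check on the counts reported in the text: 462 of 815 loci.
reported_share_pct = round(100 * 462 / 815)
```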

Association of TCs and Differentially Expressed Genes with QTLs.
The correspondence of mapped TCs and QTLs revealed that 349 of the 1906 mapped TCs colocalized within the 99% confidence interval of at least one QTL (Table 2). Putative functions (BLASTX) could be assigned to 243 of these, of which 85 (35.0%) were annotated as genes involved in plant responses to abiotic stress (Table 3). By comparison, only 14.5% (160 out of 1,104) of the annotated TCs mapped outside the QTL intervals were classified as abiotic stress genes (Table 3). A simple binomial probability calculation of this degree of bias being observed if QTL and non-QTL regions are equally likely to contain stress genes (that is, "85 or more stress genes from 243 annotated genes," where p = 0.145) yields a likelihood of only 7.99 × 10⁻¹⁵, suggesting that the QTL regions have a higher propensity to contain stress genes. The enrichment of stress-related TCs that map to stress-related QTLs is therefore unlikely to be explained by chance; this observation supports the hypothesis that stress-responsive genes map to stress-associated QTLs at higher frequency than to non-QTL regions. The 85 TCs mapped to 33 STS loci and 29 stress-related QTLs (Table 4(a)).
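The binomial calculation described above can be sketched in Python as an exact upper-tail sum. The counts are those reported in the text; the function name is hypothetical, and the null proportion is taken directly as 160/1104 rather than the rounded 0.145. Because the result depends on such rounding and on the exact tail convention, this sketch is not guaranteed to reproduce the reported figure digit for digit; it only shows the form of the test.

```python
from math import comb, fsum

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by exact summation of the mass function."""
    return fsum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Counts reported in the text.
within_stress, within_annotated = 85, 243      # annotated TCs inside QTL intervals
outside_stress, outside_annotated = 160, 1104  # annotated TCs outside QTL intervals

p_null = outside_stress / outside_annotated     # about 0.145, the non-QTL rate
frac_within = within_stress / within_annotated  # about 0.350, the observed QTL rate

# Probability of seeing 85 or more stress genes among 243 if the
# non-QTL rate applied inside the QTL intervals as well.
tail = binom_upper_tail(within_stress, within_annotated, p_null)
print(f"within = {frac_within:.3f}, outside = {p_null:.3f}, P(X >= 85) = {tail:.3e}")
```

Using the rate observed outside the QTL intervals as the null proportion treats that rate as fixed; a two-sample test would also account for its sampling error, which is a different design choice from the one described in the text.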
We have previously examined the drought stress transcriptome in cotton exposed to field capacity and water-limited conditions [13]. Transcript profiling experiments in leaf and root tissues revealed 2106 stress-responsive transcripts: 879 classified as stress-induced, 1163 as stress-repressed, and 64 showing reciprocal expression patterns. In this study, 158 genes (84 stress-induced and 74 stress-repressed) were mapped, of which 34 (14 induced, 17 repressed, and 3 reciprocal expression) colocalized within the 99% confidence interval of at least one QTL (Tables 4(a) and 4(b)). Thirteen (13) showed homology with at least one TC annotated as a plant stress gene and mapped within a QTL region. This number is lower than expected, likely because of the limited number of stress-specific genes represented on the microarray.

Candidate Gene Selection.
Candidate genes were identified based on the merger of mapping data, putative function, and expression (microarray or RT-PCR). Three levels of candidate genes have been considered in this study. A schematic presentation of the candidate gene selection/classification process is depicted by a Venn diagram (Figure 1). Three categories, namely unique transcripts in stress-treated cDNA libraries (539), differential expression (2106), and colocalization with a QTL (349), defined "Level I" candidate genes (Figure 1). The overlap of two or more Level I categories defines "Level II" and "Level III" candidate genes. Levels I and II candidates with putative stress-related gene ontology carry a prime (′) designation (Table 4). From these categories of possible candidate genes, those which showed homology to known stress-related genes, colocalized within stress-related QTLs, and/or were differentially expressed in response to drought stress were further selected. Based on these criteria, 44 genes were identified as possible candidates that may influence the associated QTLs (Tables 4(a) and 4(b)).
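The level scheme can be read as counting how many of the three evidence categories contain a gene. The Python sketch below encodes that reading, which is an interpretation of the Venn-diagram description rather than the authors' stated procedure; the function names, the set names, and the gene identifiers are hypothetical.

```python
ROMAN = {1: "I", 2: "II", 3: "III"}

def candidate_level(gene, library_unique, diff_expressed, qtl_colocalized):
    """Number of evidence categories (0 to 3) that contain the gene."""
    categories = (library_unique, diff_expressed, qtl_colocalized)
    return sum(gene in category for category in categories)

def candidate_label(gene, level, stress_annotated):
    """Roman-numeral label; Levels I and II gain a prime when stress-annotated."""
    if level == 0:
        return None  # not a candidate under this reading
    prime = "′" if level in (1, 2) and gene in stress_annotated else ""
    return ROMAN[level] + prime

# Hypothetical gene sets, for illustration only.
library_unique = {"g1", "g4"}
diff_expressed = {"g2", "g3", "g4"}
qtl_colocalized = {"g1", "g2", "g3", "g4"}
stress_annotated = {"g1", "g3"}

labels = {}
for gene in ["g1", "g2", "g3", "g4", "g5"]:
    level = candidate_level(gene, library_unique, diff_expressed, qtl_colocalized)
    labels[gene] = candidate_label(gene, level, stress_annotated)
```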

Cotton-Arabidopsis Synteny.
An appreciable degree of synteny and colinearity between cotton and Arabidopsis provides a means to employ genomics approaches to look for additional clues as to the identities of genes influencing the cotton plant's response to abiotic stress. Twenty-six cotton QTLs on Chrs. 2, 4, 5, 7, 8, 10, 12, and 13 were associated with 51 stress-related Arabidopsis genes (Table 5). Forty-eight (48) stress-related Arabidopsis genes could be putatively mapped within a QTL region. Four of these Arabidopsis genes were homologous to, and fell at the same map location as, four of the cotton candidate genes. These include genes that respond to drought, salt and cold stress, and abscisic acid stimulus and that function in the regulation of transcription.

Relationship among Functional and Structural Data.
Osmotic potential is an important indicator of plant water status. Two QTLs on Chrs. 1 and 11 influenced OP (Table 2). Four additional OP QTLs on Chrs. 4, 7, 10, and 13 mapped to regions that contained other physiological (Chl-a and δ¹³C) and productivity (SC, HI, and BW) QTLs. These QTLs contain 136 TCs, of which 35 have putative gene function and 9 were differentially expressed (Table 2). Thirty-six TCs representing 13 loci were classified as Level I′, II, or II′ candidate genes (Table 4(a)). Carbon isotope ratio (δ¹³C) has been used to assess differential responses to water-deficit stress. A total of 5 QTLs influenced δ¹³C (Table 2); of those, four (Chrs. 2, 3, 7, and 8) were associated with QTLs for physiological (Chl-a, Chl-b, and OP), productivity (HI), and fiber (FF) traits. A total of 114 TCs delineate these QTL regions. Twenty-one TCs representing 7 loci were classified as Level I or II candidate genes (Table 4(a)).
Plant productivity traits mapped to Chrs. 5, 6, 7, 10, 12, and 13. A total of 185 TCs mapped to these QTL regions. In many cases multiple productivity QTLs fell within the same genomic region. Three separate regions (Chrs. 7, 10, and 13) contain corresponding QTLs for SC, HI, and OP. Twenty-one loci, represented by 58 TCs, associated with these QTLs were classified as Level I′, II, or II′ candidates (Table 4(a)). Fiber quality QTLs on Chrs. 2, 7, and 12 contained 10 TCs classified as Level I′, II, or II′ candidate genes (Table 4(a)).

Discussion
We have mapped 1906 cotton TCs on a Consensus Map that represents the hypothetical ancestor diploid genome [16]. Forty-four candidate genes implicated in water-deficit stress response were identified by merging structural and functional data. The association of these candidate genes with QTLs that influence physiology, plant productivity, and fiber quality traits in cotton under drought stress conditions was investigated. We have used cotton-Arabidopsis comparative analysis to examine association of stress-related genes in Arabidopsis with the drought stress cotton QTLs. The Consensus Map depicts the inferred marker arrangement along the genome of the common ancestor that gave rise to the diploid progenitors of tetraploid cotton about 5-7 million years ago [16,20]. This resource sets the stage for exploring syntenic relationships and thus fosters study of correspondence between the cotton QTLs and genes from Arabidopsis.
Indeed, Gossypium and Arabidopsis are thought to have shared common ancestry about 83-86 million years ago [21], and cotton may be the best crop outside of the Brassicales in which to employ "translational genomics" from Arabidopsis.
The mapping of TCs revealed that 815 STS loci contain a significant level of homology to assign the putative map position of at least one TC. Because unique TCs showed homology to several regions of the same gene it was expected that mapped loci would contain multiple TCs and indeed this was observed at 462 (57%) loci. Gene duplication would compound this effect, considering that homoeologous loci in tetraploid cotton would map to a single locus on the Consensus Map, which was inferred to resemble the DNA marker arrangement of the hypothetical ancestor of the two subgenomes of tetraploid cotton [16]. Interestingly, 59% (1128) of the TCs mapped to multiple loci on the Consensus Map. This observation may support a growing body of evidence suggesting that an ancient gene duplication event (polyploidy) has shaped the genome organization of what is considered diploid cotton [15,16]. If this hypothesis is correct, then one would expect an elevated level of gene redundancy in the Consensus Map. However, we cannot exclude the possibility that multigene families may account for the association of the TCs to multiple loci.
This research provides a framework in which information can be combined to simplify a more complex problem. The network of genes implicated in stress is undoubtedly large and complex, and subsequent experiments will be required to validate the findings of this project. The mapping of TCs provides meaningful knowledge regarding this network of genes. Functional annotation could be assigned to 1347 (out of 1906) mapped TCs, including 243 (out of 349) that colocalized within the 99% confidence interval of at least one QTL. An abiotic stress response annotation could be assigned to 245 of the 1347 annotated TCs: 85 (out of 243) within a QTL interval and 160 (out of 1104) outside (non-QTL). Several key questions can be addressed with this knowledge, including whether putative stress-responsive genes map within QTLs associated with stress-related phenotypic variation more frequently than within other genomic regions not associated with these QTLs. Mapping stress-responsive QTLs in cotton [5][6][7] revealed a shared network of QTL intervals that control many different traits in response to stress. Because the QTL intervals examined represent a small fraction of the genome, the number of TCs (or percentage of the total) that mapped outside versus within a QTL was not expected to be equal. However, the percentage of TCs with putative function that map within (69.6%) or outside (70.9%) a QTL is important because it shows that annotated genes are equally represented in both classes of region (Table 2). The percentage with putative stress function that maps within (35.0%) or outside (14.5%) a QTL is used to test our hypothesis (Table 2). If these numbers were equal or skewed toward intervals outside the QTLs, the research hypothesis would be rejected. However, the observations support the hypothesis, and a simple binomial equation was used to calculate the probability of this degree of bias being observed if QTL and non-QTL regions are equally likely to contain stress genes.
The very low likelihood (7.99 × 10⁻¹⁵) of this observation strongly suggests that the QTL intervals have a higher propensity to contain stress genes. Thus, just as QTL mapping points to common regions of the genome that explain phenotypic variation in a variety of traits, those regions also appear to contain a higher number of putative stress genes. These results show that there is value in examining the structural position of putative stress genes relative to QTLs in order to study the network of stress genes. Fifteen candidate genes (Levels I and II′) map to a single genomic region on Chr. 10 that contains QTLs for OP, CT, SC, and HI in response to drought stress (Tables 4(a) and 4(b)). This region is interesting because it harbors QTLs for physiological traits that influence productivity traits and contains candidate genes known to have a significant role in stress responses. In addition, this genomic region showed synteny with four Arabidopsis regions. Five Arabidopsis genes in these syntenic regions include genes that respond to salt stress, cold stress, and abscisic acid stimuli and genes that are involved in the metabolic process of reactive oxygen species (Table 5). A strong relationship was found between QTLs for OP and those affecting SC and HI in multiple genomic regions [6]. Quantitative trait loci associated with lower OP and CT values were associated with increased productivity (SC and HI). These findings were further supported by significant phenotypic correlation. Osmotic adjustment has been shown to correlate with increased yield and dry matter production in various studies [22][23][24][25].
This suggests that OP plays a major role in influencing the productivity QTLs. The fifteen candidate genes associated with this region include genes that function in signal transduction (auxin-repressed protein, putative GTP-binding protein, and protein induced upon tuberization), transcription (transcription factor WRKY1, putative ethylene response factor, and homeobox-leucine zipper protein), and cell defense (putative thioredoxin and metallothionein-like protein) [26][27][28][29][30][31][32]. The gene expression analysis revealed that all candidate transcription factors were induced in cotton leaves under water-deficit stress. It is known that a complex network of transcription factors coordinates plant response to adverse environmental conditions [29]. The MYB proteins in Arabidopsis function as transcriptional activators in abscisic acid (ABA) inducible gene expression under water-deficit and salt stress [26]. Arabidopsis MYB protein (At4g09460) has synteny with this QTL region and is homologous to the cotton candidate gene TC 7534. Most of the drought-inducible genes studied are induced by ABA [26]. The other candidate genes, such as thioredoxin and metallothionein-like protein, play a central role in the regulation of reactive oxygen intermediates and abiotic stress signaling in Arabidopsis [29,31]. This group of candidate genes may be involved in osmotic adjustment by osmotic-stress signaling leading to the expression of early response transcriptional activators, which then activate downstream stress tolerance effector genes [33], suggesting that the candidate genes associated with QTLs in this genomic region largely influence OP. Additionally, Arabidopsis genes which respond to osmotic stress, ABA, and other abiotic stresses showed association with the OP QTL on Chrs. 11 and 13.
Seven candidate genes (Levels I and II′) are associated with QTLs for SC, HI, and BW on Chr. 12. Included in this group are transcription factors associated with ABA-mediated stomatal movement and plant water balance (AtGPA1), heat shock (Hsp20.1), and transcriptional activation (zinc finger-like protein) [29,34,35]. The Arabidopsis syntenic regions contain three genes involved in signal transduction and in the metabolic process of oxygen and reactive oxygen. Moreover, the productivity QTLs on Chr. 5 also have syntenic regions with Arabidopsis which contain a considerable number of known stress-responsive genes.
Five candidate genes (Levels I and II′) on Chr. 11 are associated with water use efficiency (δ¹³C). Transcription initiation factor TFIID is involved in the regulation of gene expression and adaptation to osmotic stress [36]. Two proline-rich proteins, known to be important in reducing stress injury in plants [37], were associated with this QTL. Saranga et al. [6] found an association between δ¹³C and chlorophyll content (Chl-a and Chl-b) in two genomic regions (Chr. 22 and LGD05 of the tetraploid genome) that correspond to Chr. 8 on the Consensus Map. They found that QTL alleles associated with higher δ¹³C under water-deficit conditions coincided with lower chlorophyll content. This QTL region is associated with ATP synthase, an enzyme that catalyzes the synthesis of ATP during photosynthesis and respiration [38]. In the chloroplast, ATP synthase utilizes the free energy released by electron transport and assumes an important role in regulating adjustment of the photosynthetic system to varying environmental conditions [38]. As chlorophyll content and δ¹³C are associated with photosynthesis, this candidate gene may have a role in influencing the QTLs for δ¹³C and chlorophyll content in this genomic region.
Candidate genes for fiber quality QTLs (Levels I′, II, and II′) on Chrs. 7 and 12 include epoxide hydrolase, a detoxification enzyme that removes reactive oxygen species during stress conditions [33], omega-3 fatty acid desaturase, a gene involved in cold stress tolerance [39], and a zinc finger-like protein, a gene suggested to play a role in reactive oxygen and abiotic stress signaling in Arabidopsis [29]. Signal transducer and transcription regulator genes in Arabidopsis have synteny with these QTLs. Two hypothetical proteins are also identified as candidate genes for these fiber traits, one induced in both root and leaf tissue and the other repressed in roots in our gene expression profiling. Since the microarray was developed mainly from fiber tissue, these genes may be important in fiber development under water-deficit stress conditions.
Combining gene expression data, genetic mapping information, and physiological data is an important step towards understanding the genetics controlling the physiological responses that affect fiber production and quality under arid conditions. This strategy combines a genome-wide approach, which assigns key candidate genes to specific regions of the genome, with the full benefits of a rich history of phenotypic data accumulated in several studies. In this study, candidate genes that may influence water stress-related QTLs in cotton have been identified using this strategy. Synteny between cotton and Arabidopsis made it possible to identify additional genes involved in stress response. These candidates, in addition to genes from other studies, represent putative functions that are critical during water-deficit stress response in cotton but warrant further functional testing to determine whether they or related pathways are directly responsible or could be employed as targets for the improvement of agronomically desired traits for cotton production.