Constructing Physical and Genomic Maps for Puccinia striiformis f. sp. tritici, the Wheat Stripe Rust Pathogen, by Comparing Its EST Sequences to the Genomic Sequence of P. graminis f. sp. tritici, the Wheat Stem Rust Pathogen

The wheat stripe rust fungus, Puccinia striiformis f. sp. tritici (Pst), does not have a known alternate host for sexual reproduction, which makes it impossible to study gene linkages through classic genetic and molecular mapping approaches. In this study, we compared 4,219 Pst expressed sequence tags (ESTs) to the genomic sequence of P. graminis f. sp. tritici (Pgt), the wheat stem rust fungus, using BLAST searches. The percentages of homologous genes varied greatly among the Pst libraries: 54.51%, 51.21%, and 13.61% for the urediniospore, germinated urediniospore, and haustorial libraries, respectively, with an average of 33.92%. The 1,432 Pst genes with significant homology to Pgt sequences were grouped into physical groups corresponding to 237 Pgt supercontigs. The physical relationship was confirmed for 12 (57%) of 21 selected Pst gene pairs through PCR screening of a Pst BAC library. The results indicate that the Pgt genome sequence is useful in constructing Pst physical maps.


Introduction
Puccinia striiformis f. sp. tritici (Pst) is the causal agent of stripe rust, one of the most important diseases of wheat in many countries of the world [1,2]. The disease is a major constraint to wheat production and a serious threat to global food security. Although the disease is economically important, only limited studies on the genome and functional genomics of the fungal pathogen have been reported [3][4][5][6]. This is an obstacle to our understanding of the pathogen's evolution, especially changes of virulence that often overcome resistance in wheat cultivars [1,2,7,8].
Pst is an obligate biotrophic fungus that completely depends upon its host plants for continuing growth and reproduction. Techniques for transformation, gene knockout, and transient expression are still to be developed. This excludes the use of molecular techniques, such as restriction enzyme-mediated insertional mutagenesis and gene transformation.
Unlike P. graminis f. sp. tritici (Pgt, the wheat stem rust pathogen) and P. triticina (Pt, the wheat leaf rust pathogen), Pst is a microcyclic rust fungus and has only three spore stages, urediniospore, teliospore, and basidiospore, and does not have known pycniospore and aeciospore stages [1,2]. Because of the lack of the pycnial sexual stage and alternate host for sexual reproduction, it is impossible to study Pst genes through a classic genetic approach and map-based cloning. Thus, gene organization and physical relationships could not be studied for Pst using the molecular mapping approach.
A physical map is useful for studying genome structures, determining gene organization, identifying important genes, and comparing related species for understanding evolutionary relationships. The discovery of conserved chromosomal segments between humans and animals in 1984 [9] led later to the construction of physical maps for human and mouse [10][11][12][13]. Interestingly, comparative gene mapping reveals that chicken, a nonmammalian vertebrate, has conserved genome sequence synteny with humans [14,15]. Comparative genomic approaches have also been widely used to study related species in plants [16][17][18][19] and fungi [20][21][22][23]. These studies demonstrate that comparative genomic analysis is a powerful approach for studying genomes and genes in organisms that are hard to study using traditional genetic approaches.
Recently, several genetic libraries for Pst have become available, including a BAC library [3], a full-length cDNA library from urediniospores [4], a germinated urediniospore (germ-tube) EST library [5], and a haustorial EST library [6]. More than 15,000 ESTs were sequenced, from which 4,219 unisequences were characterized and their putative functions were identified through sequence comparison with other fungal genes in GenBank databases. However, the physical and genetic relationships of these genes have not been determined. Since Pst genome sequencing has only just started, we used the available Pgt genome sequence (http://www.broadinstitute.org/annotation/genome/puccinia group/MultiHome.html) to construct physical maps for Pst genes. The study was based on the assumption that Pst and Pgt share considerable sequence homology and genome synteny. The specific objectives of this study were to (1) determine the homology of Pst EST unisequences to Pgt genomic sequences, (2) construct physical groups for the Pst genes using the Pgt sequences as references, and (3) verify the physical relationships of selected Pst genes using PCR screening of the Pst BAC library. Although much of the physical relationship needs to be verified by whole-genome sequence, the physical maps generated in this study should provide a basic framework for assisting Pst sequence assembly and gene annotation with Pgt sequences and should also be useful for localizing functional genes, positional cloning of full-length genes, and generating information about exons and introns for Pst genes.

Data.
Genome-based EST mapping requires a genome map and transcript sequences. The three Pst cDNA libraries were generated from three different growth stages: urediniospores (Ured), germinated urediniospores/germ tubes (GermUred), and haustoria (Haus). The Ured and Haus cDNA libraries were constructed from mRNA of PST-78, a typical US race [4,6], and the GermUred library was from mRNA of CYR32, a typical Chinese race [5]. A total of 4,219 unisequences, obtained from more than 15,000 clones sequenced from the three libraries after removing poor-quality sequences (<100 bp inserts) and repetitions and forming contigs ([4][5][6]; Chen and associates, unpublished), were used in this study for comparison with the Pgt genomic sequence. The Pgt genome sequence was downloaded from the NCBI Genome Project Puccinia graminis Database (http://www.broad.mit.edu/annotation/genome/puccinia graminis), consisting of 392 genome supercontigs and 4,775 contigs.

Mapping Pst EST Sequences against the Pgt Genome Sequence.
All Pst ESTs were mapped against the Pgt genome using the BLASTN program [24]. We used the high-speed service computer system of the Washington State University Bioinformatics Center for BLAST and homology searches. The Pgt genome and Pst EST sequences were transferred to a server computer as FASTA-format files using SSH (Secure Shell) software. Alignments of low homology were filtered out using an E value of 1.00E-5 as the cut point. The alignable ESTs were assembled according to the 4,775 contigs in the 392 supercontigs of the Pgt genome sequence. Detailed alignment information was compiled in an Excel file. To show the positions of the Pst ESTs relative to the Pgt genome, physical maps were constructed. The physical maps, one per Pgt supercontig, illustrate the physical order of the genes, the length of each EST, and the distances between genes. Genes localized in a single contig were marked with the sign "|", and the alignment start and end positions in the Pgt genome were given in parentheses.
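The filtering and grouping step described above can be sketched roughly as follows. This is only an illustration, not the pipeline used in the study: it assumes the BLASTN hits were exported in the standard 12-column tabular format (`-outfmt 6`), which the text does not state, and the sequence identifiers and sample hits are invented. Hits with an E value above the 1.00E-5 cut point are dropped, and the remaining hits are grouped by Pgt subject sequence and ordered by position.

```python
import csv
import io
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # cut point stated in the text

# Column order of standard BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def group_hits_by_subject(handle, cutoff=E_VALUE_CUTOFF):
    """Keep hits with E value <= cutoff, grouped by subject and sorted by position."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > cutoff:
            continue
        # Minus-strand hits report sstart > send, so normalize the order.
        start, end = sorted((int(hit["sstart"]), int(hit["send"])))
        groups[hit["sseqid"]].append((start, end, hit["qseqid"]))
    return {subject: sorted(hits) for subject, hits in groups.items()}


# Invented example hits (hypothetical identifiers and coordinates).
SAMPLE = (
    "PstEST_001\tsupercont_1\t88.50\t200\t23\t0\t1\t200\t5200\t5001\t1e-40\t250\n"
    "PstEST_002\tsupercont_1\t91.00\t150\t13\t0\t10\t159\t1000\t1149\t2e-30\t180\n"
    "PstEST_003\tsupercont_2\t80.00\t40\t8\t0\t1\t40\t300\t339\t0.5\t30\n"
)

groups = group_hits_by_subject(io.StringIO(SAMPLE))
for subject, hits in groups.items():
    print(subject, hits)
```

In this toy input the third hit is discarded because its E value exceeds the cut point, so only one physical group remains.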
Because ESTs are derived from transcripts from which the introns have been spliced out, they represent exon sequences. Therefore, it was important that the maps could provide information about gene splicing and intron number. If a Pst EST sequence aligned to a location in the Pgt genome as a series of fragments, the gene was likely to contain introns and may show alternative splicing, and the number of exons was marked after the parentheses on the map. All sketch maps of Pst genes are shown in file 1 in Supplementary Material available online at doi: 10.1155/2009/302620.
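One way to derive the exon count from fragmented alignments is sketched below. It is a simplified illustration under stated assumptions: fragments of the same EST on the same contig are merged into one locus when they lie within `MAX_GAP` base pairs of each other, and each fragment is counted as one exon. The `MAX_GAP` value, the identifiers, and the coordinates are hypothetical and do not come from the study.

```python
from collections import defaultdict

MAX_GAP = 5_000  # hypothetical: largest gap (bp) allowed between fragments of one locus


def summarize_loci(hits, max_gap=MAX_GAP):
    """Merge nearby fragments of one EST into loci and count fragments per locus.

    hits: iterable of (est_id, contig_id, start, end) alignment fragments.
    Returns a list of (est_id, contig_id, locus_start, locus_end, n_fragments).
    """
    by_key = defaultdict(list)
    for est, contig, start, end in hits:
        lo, hi = sorted((start, end))
        by_key[(est, contig)].append((lo, hi))

    loci = []
    for (est, contig), frags in sorted(by_key.items()):
        frags.sort()
        cur_start, cur_end = frags[0]
        count = 1
        for lo, hi in frags[1:]:
            if lo - cur_end <= max_gap:
                # Close enough: treat as another exon of the same locus.
                cur_end = max(cur_end, hi)
                count += 1
            else:
                loci.append((est, contig, cur_start, cur_end, count))
                cur_start, cur_end, count = lo, hi, 1
        loci.append((est, contig, cur_start, cur_end, count))
    return loci


# Invented fragments: one EST with three nearby fragments plus a distant copy.
HITS = [
    ("PstEST_010", "contig_7", 1000, 1120),
    ("PstEST_010", "contig_7", 1300, 1450),
    ("PstEST_010", "contig_7", 1600, 1700),
    ("PstEST_011", "contig_7", 9000, 9400),
    ("PstEST_010", "contig_7", 50000, 50150),
]

loci = summarize_loci(HITS)
for locus in loci:
    print(locus)
```

With these inputs, the first EST yields a three-fragment locus and a separate single-fragment locus, which mirrors how one EST can map to more than one location.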

Verification of Physical Relationships of Selected Pst Genes.
Although Pgt is the fungus most closely related to Pst among those whose whole genomes have been sequenced so far, gene sequences and locations could differ for some genes. To validate the alignments, we selected 42 genes as 21 pairs. The sequences of the 42 genes were used to design primers, and the 42 primer pairs (Table 1) were used to amplify BAC clones. If a single BAC clone was amplified by the primers of both genes in a pair, the two genes were concluded to be physically colocated. Because the BAC library has an average insert size of 50 Kb [3], the two genes in each pair were selected to be less than 50 Kb apart. For each pair of genes, the primers for one of the genes were used to screen the entire BAC library of 43,000 clones [3] using a three-dimensional approach as described by Ling and Chen [25]. To be more efficient, the primers for the second gene in the pair were used to amplify only the positive BAC clones from the screening. To speed up the PCR screening, two pairs of primers for two genes with similar annealing temperatures were used in a multiplex PCR amplification.
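The pair-selection criterion (two genes on the same Pgt supercontig less than 50 Kb apart) could be expressed as in the sketch below. How the distance was actually measured is not stated in the text; this sketch assumes it is the gap between the nearest ends of the two alignments, and the gene names and coordinates are invented for illustration.

```python
from itertools import combinations

BAC_INSERT_BP = 50_000  # average BAC insert size given in the text


def candidate_pairs(genes, max_distance=BAC_INSERT_BP):
    """Return gene pairs on the same supercontig closer than max_distance.

    genes: iterable of (gene_id, supercontig_id, start, end) with start <= end.
    Distance is the gap between the nearest ends (0 if the alignments overlap).
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for a, b in combinations(ordered, 2):
        if a[1] != b[1]:
            continue  # different supercontigs cannot be compared
        distance = max(0, b[2] - a[3])
        if distance < max_distance:
            pairs.append((a[0], b[0], distance))
    return pairs


# Invented gene positions (hypothetical identifiers and coordinates).
GENES = [
    ("geneA", "sc1", 1000, 1500),
    ("geneB", "sc1", 30000, 30400),
    ("geneC", "sc1", 120000, 120300),
    ("geneD", "sc2", 2000, 2600),
]

pairs = candidate_pairs(GENES)
print(pairs)
```

Only `geneA` and `geneB` qualify here; `geneC` is too far from both, and `geneD` lies on another supercontig.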
Multiplex PCR was performed in a GeneAmp PCR System 9700 thermo-cycler. A 20 μL reaction mixture contained

Physical Groups.
The 1,432 Pst genes were aligned to 237 physical groups corresponding to 237 Pgt supercontigs (Supplementary file 1). As an example, Figure 1 shows Pst genes aligned to Pgt supercontig 1. The number of genes for each supercontig from each Pst cDNA library is shown in Table 3. The 237 physical groups ranged from 2,878 to 3,081,398 bp, with most of the groups ranging from 5.0 Kb to 2.0 Mb (Figure 2(a)). Overall, the 1,432 genes matched 787,413 bp and spanned 86.55 Mb of the Pgt genomic sequences. Because the majority of the 1,432 unigenes were aligned to more than one sequence locus, a total of 4,604 gene loci were obtained (Table 3). The number of loci per unique gene was unbalanced among the three libraries: 1.30 for the GermUred library, 1.53 for the Ured library, and 10.58 for the Haus library. The number of genes per supercontig varied from 1 to 153, excluding "Supercontig 392", which contained unassembled sequences, with an average of 19 genes per supercontig (Table 3, Figure 2(b)). Over 70% of supercontigs contained 20 or fewer genes that showed homology to Pst EST sequences, while only 4 supercontigs (Supercontigs 1, 2, 3, and 17) had more than 100 genes. The genes from the three Pst libraries were unevenly aligned to the Pgt genome. A total of 712 unisequences were aligned to 134 supercontigs with an average of 5.3 genes per supercontig; 441 unisequences were aligned to 121 supercontigs with an average of 3.6 genes per supercontig; and 279 unisequences were aligned to 213 supercontigs with an average of 1.3 genes per supercontig. The gene density (the number of base pairs per gene) ranged from 1,020 to 209,493 bp with an average of 18,799 bp (Table 3, Figure 2(c)). The majority of the supercontigs had one gene per genomic region smaller than 30 Kb, which may be considered relatively gene-rich. In contrast, a few supercontigs had one gene per genomic region larger than 60 Kb, which may be considered relatively gene-poor.
These results indicated that genes expressed in different Pst growth stages tended to be clustered in different regions of the genome.
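The summary figures in this section appear to follow from simple ratios of the reported totals, which the short check below reproduces. It uses only numbers given in the text; that the averages were computed exactly this way is an inference, not something the text states.

```python
# Totals reported in the text.
TOTAL_LOCI = 4_604          # gene loci placed on the Pgt genome
TOTAL_SUPERCONTIGS = 237    # physical groups
SPANNED_BP = 86.55e6        # Pgt sequence spanned by the aligned genes

# (unisequences, supercontigs) for the three libraries, in the order listed.
LIBRARY_COUNTS = [(712, 134), (441, 121), (279, 213)]

avg_loci_per_supercontig = TOTAL_LOCI / TOTAL_SUPERCONTIGS
avg_bp_per_locus = SPANNED_BP / TOTAL_LOCI
genes_per_supercontig = [round(n / s, 1) for n, s in LIBRARY_COUNTS]

print(f"loci per supercontig: {avg_loci_per_supercontig:.1f}")
print(f"bp per locus: {avg_bp_per_locus:,.0f}")
print(f"genes per supercontig by library: {genes_per_supercontig}")
print(f"unisequences summed over libraries: {sum(n for n, _ in LIBRARY_COUNTS)}")
```

The ratios round to 19 loci per supercontig, 18,799 bp per locus, and 5.3, 3.6, and 1.3 genes per supercontig, and the three library counts sum to the 1,432 genes, in agreement with the values reported above.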

Validation of Physical Relationships of Selected Pst Genes.
To validate the physical relationships of Pst genes, a total of 84 forward and reverse primers were designed for 42 genes forming 21 pairs (Table 1). The genes in each pair were selected based on their proximity (within 50 Kb) in the physical map. Clones that were positively amplified with the first pair of primers in the three-dimensional pooling screening were then amplified with the second pair of primers, as illustrated in Figure 3. Of the 21 pairs of genes tested, 12 pairs (57%) were successfully identified in the same BAC clones. These results showed that the genes in these 12 pairs are colocated in the Pst genome.
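The decision rule used here (a pair is confirmed when at least one BAC clone is amplified by the primers of both genes) amounts to a set intersection, as the sketch below shows. The clone and pair identifiers are invented; only the 12-of-21 tally comes from the text.

```python
def confirmed_pairs(screen_results):
    """Return pairs that share at least one positive BAC clone.

    screen_results maps a pair name to two sets of clone ids:
    (clones positive for the first gene, clones positive for the second gene).
    """
    confirmed = {}
    for pair, (first_positive, second_positive) in screen_results.items():
        shared = first_positive & second_positive
        if shared:
            confirmed[pair] = sorted(shared)
    return confirmed


# Invented screening results (hypothetical clone ids).
SCREEN = {
    "pair_01": ({"BAC_0012", "BAC_0850"}, {"BAC_0850"}),
    "pair_02": ({"BAC_0307"}, set()),
}

result = confirmed_pairs(SCREEN)
print(result)

# Success rate reported in the text: 12 confirmed pairs out of 21 tested.
rate = 100 * 12 / 21
print(f"{rate:.0f}% of pairs confirmed")
```

Because the second gene is tested only on clones already positive for the first, the second set is always a subset of the first in practice; the intersection simply makes the rule explicit.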
In this study, we tested only 42 Pst genes in 21 pairs in the PCR screening of the BAC library. In contrast to the 12 pairs that were found in the same BAC clones, positive results were not obtained for 9 of the 21 pairs. However, the unsuccessful amplification of the second genes in the 9 pairs does not exclude the possibility of physical relationships for the genes in each of these pairs. As the inserts of the BAC clones were relatively short, 50 Kb on average [3], the clones might be too small to harbor both genes in a pair. It is also possible that the Pst genes in a pair are farther apart than the reference distance in the Pgt genome but are still linked to each other.
The Pst genes used in this study were from three libraries. The genes from the Ured library gave the highest percentage of genes homologous to Pgt and the genes from the Haus library gave the lowest percentage of homologous genes. The GermUred clones had a percentage of Pgt-homologous genes similar to that of the Ured library, although the two libraries were made from different isolates while the Haus library was made with the same isolate as the Ured library [4][5][6]. The low proportion of Pst genes from the Haus library similar to the Pgt sequences was surprising, as we expected two fungal species in the same genus to have higher homology than human and mouse, which are in very different taxa [9]. Although this phenomenon needs more study, we have learned from other rust fungi that genes expressed in haustoria tend to be more species specific [26,27]. Comparisons of Pst genes expressed in different growth stages with the Pgt sequences suggest that genes expressed in urediniospores are more conserved among different Puccinia species while those expressed in haustoria are more unique. Such genetic differences may be related to their different temperature requirements for infection of the same wheat host crop.
It is interesting that the smallest number of unique genes (279) from the Haus library produced the highest number (2,952) of genomic loci along the Pgt genome among the three libraries. The high fold (10.58x) of gene copies may compensate for the low number of homologous genes from haustoria, which may make the overall homology of Pst and Pgt genome sequences reasonably high. The genomic loci were aligned to more supercontigs than the genes from the Ured and GermUred libraries. These results indicate that haustorially expressed genes tend to have multiple copies and spread along the Puccinia genome. This phenomenon needs to be further studied using the whole genome sequence of Pst.
Although much of the physical relationship is still hypothetical and needs to be verified by the whole genome sequence of Pst, the physical groups constructed in this study can serve as references and starting points for sequence assembly and gene annotation. A more detailed dissection of gene sequences, organization, structures, and clusters may allow us to pick genome regions and gene clusters for functional studies and to develop molecular markers to tag virulence groups and characterize Pst populations.
In this study, we found that some ESTs could be matched to more than one location. Also, some alignments consisted of multiple exons while others had no introns. We included the intronless sequences in the physical maps. Intronless sequences may be pseudogenes, which share nucleotide sequences with protein-coding genes and exist ubiquitously in eukaryotic genomes [28,29]. Although pseudogenes may be functionless DNA fragments in the genome, they have evolved from reverse-transcribed mRNA that was reinserted into the genome. Such pseudogenes therefore lack introns and promoters but have poly(A) sequences. For full-scale gene mapping, they still represent real gene transcription and sequence existence. Most of our EST sequences are not full-length and contain only partial information about genes. This might explain why a considerable number of ESTs were aligned to regions of the Pgt genome without introns.
We found that many of the Pst ESTs that matched Pgt genomic sequences were shorter than 100 bp. These short sequences may be exons, whose lengths can vary greatly. Most vertebrate exons are between 50 and 400 bp long [30]. Using the complementary sequence feature method in humans, Arabidopsis, Cryptococcus, and Plasmodium, Saeys et al. [31] reported that one-third of all exons were smaller than 100 bp. Gudlaugsdottir et al. [32] reported significant variation in exon length for human and fission yeast, ranging from 1 to thousands of base pairs. Because exon sizes can vary from a few base pairs to thousands of base pairs, we retained even the segments smaller than 50 base pairs, which may preserve alignment information that would otherwise be lost and make it available for future Pst genome research. The number of exons in a gene may indicate its stability or variability, which may allow us to choose genes for studying various aspects of pathogen biology. Genes with only one exon may be chosen to study genetic relationships at a higher taxonomic level, such as species and formae speciales, and those with multiple exons may be used to study genetic differences among isolates within a forma specialis. Genes with multiple exons may be better candidates for studying traits like virulence and adaptation to different environments, as these traits are more variable.
In this study, we produced preliminary physical maps for Pst genes. The 4,604 genomic loci of the 1,432 genes placed on the physical map account for about 8% of potential genes, if we assume that Pst and Pgt have a similar number of genes. Because we used only unique genes, some genes belonging to large families could be located at multiple genome sites. In the future, this physical map will be verified and ultimately improved with the complete set of Pst genes and connected with nontranscribed sequences. The physical groups should provide insights into gene organization, identification of functionally related genes, positional cloning of full-length genes, and information on exons and introns, and should assist in sequence assembly and gene annotation for Pst whole-genome sequencing.