Mapping of Micro-Tom BAC-End Sequences to the Reference Tomato Genome Reveals Possible Genome Rearrangements and Polymorphisms

A total of 93,682 BAC-end sequences (BESs) were generated from a dwarf model tomato, cv. Micro-Tom. After repetitive sequences were removed, the BESs were searched for similarity against the reference tomato genome of a standard cultivar, “Heinz 1706.” By referring to the “Heinz 1706” physical map and eliminating redundant or nonsignificant hits, 28,804 “unique pair ends” and 8,263 “unique ends” were selected to construct hypothetical BAC contigs. The total physical length of the BAC contigs was 495,833,423 bp, covering 65.3% of the entire genome. The average coverage of euchromatin and heterochromatin was 58.9% and 67.3%, respectively. From this analysis, two possible genome rearrangements were identified: one on chromosome 2 (inversion) and the other on chromosome 3 (inversion and translocation). Polymorphisms (SNPs and Indels) between the two cultivars were identified from the BLAST alignments. As a result, 171,792 polymorphisms were mapped on the 12 chromosomes. Of these, 30,930 polymorphisms were found in euchromatin (1 per 3,565 bp) and 140,862 in heterochromatin (1 per 2,737 bp). The average polymorphism density in the genome was 1 polymorphism per 2,886 bp. To facilitate the use of these data in Micro-Tom research, the BAC contig and polymorphism information is available in the TOMATOMICS database.


Introduction
Tomato (Solanum lycopersicum) is one of the most important vegetable crops cultivated worldwide. Tomato has a diploid (2n = 2x = 24) and relatively compact genome of approximately 950 Mb [1]. Recently, its genome has been completely sequenced by the international genome sequencing consortium [2].
Genetic linkage maps of tomato have been created by crossing cultivated tomato (S. lycopersicum) with several wild relatives: S. pennellii, S. pimpinellifolium, S. cheesmaniae, S. neorickii, S. chmielewskii, S. habrochaites, and S. peruvianum [3]. Introgression lines generated from a cross between S. lycopersicum and S. pennellii have contributed to the isolation of important loci and quantitative trait loci (QTLs) related to fruit size by utilizing DNA markers on the Tomato-EXPEN 2000 genetic map [4][5][6][7][8][9]. Such interspecies genetic mapping is effective because the divergent genomes provide many polymorphic DNA markers. In contrast, intraspecies mapping is less popular in tomato because of the low genetic diversity within cultivated tomatoes, which resulted from the domestication process and subsequent modern breeding [10]. Recently, we developed SNP, simple sequence repeat (SSR), and intronic polymorphic markers using publicly available EST information and BAC-end sequences (BESs) derived from "Heinz 1706," a standard line for tomato genomics [11,12], and applied these markers to create linkage maps between Micro-Tom and either Ailsa Craig, a greenhouse tomato, or M82, a processing tomato, by mapping 1,137 markers [12]. Micro-Tom, a dwarf cultivar, is regarded as a model cultivar for functional genomics of tomato because of several characteristics: small size (20 cm plant height), short life cycle (3 months), established protocols for indoor cultivation under normal fluorescent light, and high-efficiency transformation methods developed for this line [13][14][15]. The dwarf phenotype of Micro-Tom results from mutations in at least two major recessive loci. The dwarf (d) locus encodes a cytochrome P450 protein that functions in the brassinosteroid biosynthesis pathway [16].
Another locus, miniature (mnt), has been suggested to be associated with gibberellin (GA) signaling without affecting GA metabolism, but the causal gene has not yet been identified [17]. In Japan, Micro-Tom genomics resources have been extensively accumulated, mainly in the framework of the National BioResource Project (NBRP) (http://tomato.nbrp.jp/indexEn.html). Large-scale ethyl methanesulfonate (EMS) and gamma-ray mutagenized populations have been created, and visible phenotype data have been accumulated [18][19][20]. The availability of Micro-Tom genome sequence data will accelerate the mapping of mutant alleles.
BAC-end sequencing was performed in the genome project of the standard tomato line "Heinz 1706" to order BAC clones along the chromosomes [21]. Currently, about 90,000 BESs are available at the Sol Genomics Network (SGN, http://solgenomics.net/). BAC-end sequencing has also been conducted for other crop species. In the rice indica cultivar "Kasalath," 78,427 BESs were generated from 47,194 clones and mapped onto the "Nipponbare" reference genome; as a result, 12,170 paired BESs were mapped, covering 80% of the rice genome [22]. More recently, BAC-end sequencing has been performed in crop plants with more complex genomes. BESs from a commercial sugarcane variety, an interspecific hybrid with complex ploidy, were generated to analyze microsynteny between sugarcane and sorghum [23]. In wheat, which has a complex hexaploid genome, the short arm of chromosome 3A was flow sorted to make a BAC library, and chromosome arm-specific BESs were generated for DNA marker development [24]. In switchgrass, more than 50,000 SSRs were identified from 330,000 BESs, enabling a detailed analysis of the evolution of this species [25]. In cultivated peanut, which shows a low level of genetic variation, polymorphic SSRs identified from BESs were successfully used to construct a genetic map [26]. BESs can therefore serve as a resource for comparative genomic studies, through mapping of the sequences to a reference genome, and for the development of polymorphic DNA markers.
In the present study, we generated 93,682 single-pass end sequences from a Micro-Tom BAC library. To compare the genome structure of Micro-Tom with that of the reference tomato "Heinz 1706," unique ends were mapped to the reference genome, and possible genome rearrangements and polymorphisms were identified.

Micro-Tom BAC Library Construction.
Micro-Tom (TOMJPF00001) seeds were obtained from the NBRP (MEXT, Japan) and sent to the Clemson University Genomics Institute (CUGI) for BAC library construction. The genomic DNA was partially digested, and the fragments were cloned into the HindIII site of pIndigoBAC536. A total of 55,296 clones in Escherichia coli DH10B cells were arrayed in 144 plates of 384 wells each.
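The library size follows directly from the plate format (144 × 384 = 55,296). The sketch below maps clone names to plate positions, assuming the naming pattern suggested by the clone names cited later in the text (e.g., MTBAC041L05 read as plate 041, row L, column 05, with rows A-P and columns 1-24 on a 384-well plate). The naming rule and the helper names (`parse_clone_name`, `clone_index`) are illustrative assumptions, not CUGI's documented convention.

```python
import re
from string import ascii_uppercase

# Assumed 384-well layout: 16 rows (A-P) x 24 columns.
ROWS = ascii_uppercase[:16]
COLS = 24
WELLS_PER_PLATE = len(ROWS) * COLS  # 384
N_PLATES = 144

# Hypothetical naming pattern: MTBAC + 3-digit plate + row letter + 2-digit column.
NAME_RE = re.compile(r"^MTBAC(\d{3})([A-P])(\d{2})$")


def parse_clone_name(name: str) -> tuple[int, str, int]:
    """Split a clone name into (plate, row letter, column)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected clone name: {name}")
    plate, row, col = int(m.group(1)), m.group(2), int(m.group(3))
    if not (1 <= plate <= N_PLATES and 1 <= col <= COLS):
        raise ValueError(f"position out of range: {name}")
    return plate, row, col


def clone_index(name: str) -> int:
    """0-based linear index of a clone across all plates (row-major within a plate)."""
    plate, row, col = parse_clone_name(name)
    return (plate - 1) * WELLS_PER_PLATE + ROWS.index(row) * COLS + (col - 1)


if __name__ == "__main__":
    print(N_PLATES * WELLS_PER_PLATE)        # 55296 clones in the library
    print(parse_clone_name("MTBAC041L05"))   # (41, 'L', 5)
    print(clone_index("MTBAC077O14"))        # 29533
```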

End Sequencing of Micro-Tom BAC Clones.
To analyze BESs, the BAC DNAs were amplified using a TempliPhi large-construct kit (GE Healthcare, UK), and the end sequences were determined by the Sanger method using a cycle sequencing kit (BigDye Terminator kit, Applied Biosystems, USA) on a 3730xl DNA sequencer (Applied Biosystems). The resulting sequence reads were quality checked with PHRED [27,28], and low-quality (QV < 20) sequences were identified and removed. The 93,682 reads that met the quality criteria were submitted to DDBJ/GenBank under accession numbers FT227487-FT321168.
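A Phred quality value QV corresponds to an error probability P = 10^(-QV/10), so the QV 20 threshold equals a 1% per-base error rate. The text does not say exactly how low-quality sequence was removed; the minimal sketch below assumes one common approach, keeping the longest contiguous run of bases with QV ≥ 20. The function names are hypothetical. It also confirms that the accession range holds exactly one entry per submitted read.

```python
def phred_error_probability(qv: int) -> float:
    """Phred definition: QV = -10 * log10(P_error), so P_error = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def longest_high_quality_run(quals: list[int], min_qv: int = 20) -> tuple[int, int]:
    """Return (start, end) of the longest run of bases with QV >= min_qv (end exclusive)."""
    best = (0, 0)
    start = None
    for i, q in enumerate(quals + [-1]):  # sentinel value closes a trailing run
        if q >= min_qv and start is None:
            start = i
        elif q < min_qv and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def trim_read(seq: str, quals: list[int], min_qv: int = 20) -> str:
    """Assumed trimming rule: keep only the longest high-quality stretch of the read."""
    start, end = longest_high_quality_run(quals, min_qv)
    return seq[start:end]


# Accession range FT227487-FT321168 (inclusive) should equal the number of reads.
n_accessions = 321168 - 227487 + 1

if __name__ == "__main__":
    print(phred_error_probability(20))                                  # 0.01
    print(trim_read("ACGTACGT", [10, 25, 30, 30, 5, 40, 40, 12]))       # CGT
    print(n_accessions)                                                 # 93682
```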

Mapping to the Reference Genome and Analyses.
BES reads were subjected to similarity searches using the BLASTN program [29,30]. To separate unique sequences from repetitive ones, the 93,682 BESs were searched against the repeat database in ITAG2.3 (http://solgenomics.net/) with an E-value cutoff of 10⁻⁵⁰. The remaining sequences were searched against the published version of the "Heinz 1706" genome (SL2.40), accessed from the SGN database (http://solgenomics.net/). From all of the BLAST alignments, BESs were extracted according to the following criteria, as suggested in a previous report [22]: (1) sequence identity > 90% and alignment coverage > 50%; (2) mapped positions of the two ends of a pair < 200 kb apart on the same chromosome; (3) correct orientation of the paired ends; (4) BLASTN E < 10⁻¹⁰⁰; (5) at least one hit for one of the paired ends; (6) no redundant chromosomal locations. Sequence polymorphisms (SNPs and Indels) between Micro-Tom and "Heinz 1706" were predicted from the BLASTN alignments. Because gaps longer than 26 bases were not allowed, only Indels of up to 26 bases were counted.
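The extraction criteria can be expressed as simple predicates over BLAST hits. The minimal sketch below covers criteria (1), (2), (3), (4), and (6); the `Hit` record layout is hypothetical, and "correct orientation" is assumed to mean inward-facing ends (one end on the plus strand, upstream of the other end on the minus strand), which is the usual expectation for the two ends of one insert but is not spelled out in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTN hit of a BES against SL2.40 (hypothetical record layout)."""
    chrom: str
    start: int       # leftmost reference coordinate (1-based)
    end: int         # rightmost reference coordinate (inclusive)
    strand: str      # "+" or "-"
    identity: float  # percent identity
    coverage: float  # percent of the read covered by the alignment
    evalue: float


def hit_passes(h: Hit) -> bool:
    """Criteria (1) and (4): identity > 90%, coverage > 50%, E < 1e-100."""
    return h.identity > 90 and h.coverage > 50 and h.evalue < 1e-100


def unique_hit(hits: list[Hit]) -> Hit | None:
    """Criterion (6): keep an end only if exactly one hit passes the thresholds."""
    passing = [h for h in hits if hit_passes(h)]
    return passing[0] if len(passing) == 1 else None


def pair_is_concordant(end_a: Hit, end_b: Hit, max_span: int = 200_000) -> bool:
    """Criteria (2) and (3) for the two ends of one BAC clone (assumed inward-facing)."""
    if end_a.chrom != end_b.chrom or end_a.strand == end_b.strand:
        return False
    plus, minus = (end_a, end_b) if end_a.strand == "+" else (end_b, end_a)
    if plus.start > minus.end:  # ends face outward: not a valid insert
        return False
    return minus.end - plus.start + 1 < max_span
```

Because `pair_is_concordant` sorts the two ends by strand itself, the caller does not need to know which end was sequenced with which primer.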

Possible Genome Rearrangements.
To assess the occurrence of genome rearrangements, Micro-Tom and the reference tomato "Heinz 1706" were compared. Possible inversions, translocations, and insertions were considered. To eliminate artifacts (e.g., chimeric BAC clones), only regions covered by more than two BAC clones were selected. After removing regions that had passed the extraction criteria (see Section 2) but were either shown to be multicopy by manual evaluation of the BLAST results or displayed similarity to transposable elements, we obtained two cases of possible rearrangement between Micro-Tom and "Heinz 1706" (Table 2). On chromosome 2, a possible inversion was detected; its size could be 20-220 kb, depending on which end of the BAC clone is inverted. On chromosome 3, both a translocation and an inversion were observed. For each of two BAC clones (MTBAC041L05 and MTBAC077O14), one end mapped to 6,601 kb on chromosome 3, while the other end mapped to 55,665 kb, more than 49 Mb away. In addition, both ends mapped to the minus strand.
International Journal of Plant Genomics
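The two signals described above, ends on the same strand and ends mapped much farther apart than a BAC insert allows, can be turned into a simple classifier for discordant pairs. The sketch below is a simplification under stated assumptions: the labels and the 200 kb threshold are borrowed from the extraction criteria, the function name is hypothetical, and outward-facing pairs on opposite strands are not flagged. The chromosome 3 example uses the kb positions given in the text.

```python
def classify_pair(chrom_a: str, pos_a: int, strand_a: str,
                  chrom_b: str, pos_b: int, strand_b: str,
                  max_span: int = 200_000) -> set[str]:
    """Return rearrangement labels suggested by one mapped BAC-end pair.

    An empty set means the pair looks concordant under these simplified rules.
    """
    labels: set[str] = set()
    if chrom_a != chrom_b:
        labels.add("inter-chromosomal translocation")
        return labels
    if strand_a == strand_b:          # paired ends should map to opposite strands
        labels.add("inversion")
    if abs(pos_b - pos_a) >= max_span:  # farther apart than a BAC insert
        labels.add("translocation")
    return labels


if __name__ == "__main__":
    # Chromosome 3 case: ends at about 6,601 kb and 55,665 kb, both on the minus strand.
    print(sorted(classify_pair("ch03", 6_601_000, "-", "ch03", 55_665_000, "-")))
    # ['inversion', 'translocation']
```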

Polymorphisms between Micro-Tom and the Reference Tomato.
SNPs and Indels between Micro-Tom and "Heinz 1706" were identified. Of these, 171,792 were mapped on the 12 chromosomes, and 2,635 were mapped on pseudomolecules with no chromosomal information (SL2.40ch00 of the tomato whole-genome shotgun chromosomes) (Table 3 and Supplementary Table 2; see details at TOMATOMICS). Among the polymorphisms mapped on the chromosomes, 30,930 were found in euchromatin (1 per 3,565 bp) and 140,862 in heterochromatin (1 per 2,737 bp). The average polymorphism density in the genome was 1 polymorphism per 2,886 bp. Transversion-type SNPs were observed in 83,262 cases, and transition-type SNPs in 60,631 cases. Among the 30,534 Indels, single-base insertions (on the SL2.40 version of the tomato whole-genome shotgun chromosomes) were observed in 10,740 cases, and single-base deletions in 17,064 cases. The remainder were larger Indels ranging from 2 to 26 bp (Supplementary Table 2). The classification of polymorphisms into genic and intergenic regions is shown in Table 4.
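SNPs and Indels can be read directly off a gapped BLASTN alignment. The minimal sketch below scans a reference/BES alignment pair, labels substitutions as transitions (A↔G or C↔T) or transversions, and collapses gap runs into Indels. The direction convention is an assumption: here "insertion" means bases present in the BES but absent from SL2.40, which may differ from the convention used in the tables. The last lines check that the reported counts add up and that the genome-wide density follows from the total contig length.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def snp_type(ref_base: str, alt_base: str) -> str:
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    pair = {ref_base.upper(), alt_base.upper()}
    if len(pair) == 1:
        raise ValueError("not a substitution")
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def call_polymorphisms(ref_aln: str, bes_aln: str) -> list[tuple[str, int]]:
    """Return (type, length) for each SNP and gap run in an equal-length alignment."""
    if len(ref_aln) != len(bes_aln):
        raise ValueError("aligned strings must have equal length")
    calls = []
    i = 0
    while i < len(ref_aln):
        r, q = ref_aln[i], bes_aln[i]
        if r == "-" or q == "-":
            kind = "insertion" if r == "-" else "deletion"  # assumed convention
            j = i
            # extend the run while the same sequence stays gapped
            while (j < len(ref_aln) and (ref_aln[j] == "-") == (r == "-")
                   and (bes_aln[j] == "-") == (q == "-")):
                j += 1
            calls.append((kind, j - i))
            i = j
            continue
        if r.upper() != q.upper():
            calls.append((snp_type(r, q), 1))
        i += 1
    return calls


# Reported totals: transversions + transitions + Indels = chromosomal + ch00 polymorphisms.
total_reported = 83_262 + 60_631 + 30_534
total_mapped = 171_792 + 2_635
# Genome-wide density: total BAC contig length divided by polymorphisms on the 12 chromosomes.
bp_per_polymorphism = round(495_833_423 / 171_792)

if __name__ == "__main__":
    print(call_polymorphisms("ACGTA--CGTAC", "ATGTAGGCG-AC"))
    # [('transition', 1), ('insertion', 2), ('deletion', 1)]
    print(total_reported == total_mapped, bp_per_polymorphism)  # True 2886
```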

Discussion
By selecting unique end sequences from the 93,682 reads, we obtained 28,804 paired ends (14,402 pairs) and 8,263 unpaired ends. Most of the nonselected sequences (43,598) were derived from repetitive regions. Of the rest, 10,943 had redundant hits to the "Heinz 1706" genome, possibly including repetitive sequences not represented in the ITAG2.3 repeat database (http://solgenomics.net/); 2,015 showed weak similarity; and 59 showed no similarity (Figure 2). Considering that the tomato genome has previously been estimated to consist of about 25% gene-rich euchromatin [31,32], the proportion of BESs selected in this study (39.6%; (28,804 + 8,263)/93,682) suggests that our selection eliminated repetitive regions to a moderate degree. The 59 reads showing no significant similarity to the "Heinz 1706" genome are of particular interest. Micro-Tom was bred by crossing the home-gardening cultivars Florida Basket and Ohio 4013-3, and the pedigree of Ohio 4013-3 suggests that a wild relative species was used in its breeding history [18,33]. Such introgressed segments may have introduced genomic regions not present in "Heinz 1706." The Micro-Tom genome is now being sequenced (draft sequence data available at DDBJ under accession number DRA000311), and mapping these orphan BESs to a de novo assembly of the Micro-Tom genome will help clarify their origin.
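The fate of every read can be tallied to confirm that the categories above account for all 93,682 BESs and to reproduce the 39.6% selection rate. The category labels in this sketch are paraphrased from the text; the counts are those reported.

```python
# Read counts per category, as reported in the text (labels paraphrased).
bes_fates = {
    "paired unique ends": 28_804,
    "unpaired unique ends": 8_263,
    "repetitive (ITAG2.3 repeat database)": 43_598,
    "redundant hits to Heinz 1706": 10_943,
    "weak similarity": 2_015,
    "no similarity": 59,
}

total = sum(bes_fates.values())
selected = bes_fates["paired unique ends"] + bes_fates["unpaired unique ends"]
selected_pct = round(100 * selected / total, 1)
n_pairs = bes_fates["paired unique ends"] // 2  # two ends per clone

if __name__ == "__main__":
    print(total, selected, selected_pct, n_pairs)  # 93682 37067 39.6 14402
```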
The total physical length of the Micro-Tom BAC contigs was 495,833,423 bp, covering approximately 65.3% of the DNA of all 12 chromosomes. In the Kasalath rice BES analysis, chromosomal coverage relative to the reference Nipponbare pseudomolecules was about 80%, despite the smaller number (78,427) of BESs analyzed [22]. Because we used the same criterion for repetitive sequence removal (E < 10⁻⁵⁰), the discrepancy between the two studies might be due to the larger genome size of tomato (950 Mb) compared with rice (430 Mb) [34]. Our Micro-Tom BAC coverage is reasonable, taking into account the scale of the BAC library used.
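Two calculations make this comparison concrete. First, the physical length of hypothetical contigs can be computed as the union of the reference intervals spanned by mapped BAC clones; the text does not describe the exact procedure, so the interval-merging function below is an assumed, illustrative approach. Second, normalizing the number of BESs by genome size shows that the rice data set was roughly twice as dense per megabase, which is consistent with the genome-size explanation given above.

```python
def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Total length of the union of closed intervals [start, end] on one chromosome."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:  # gap: close the current contig
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:  # overlapping or adjacent clone: extend the current contig
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# BESs per Mb of genome, using the counts and genome sizes given in the text.
tomato_bes_per_mb = round(93_682 / 950, 1)
rice_bes_per_mb = round(78_427 / 430, 1)

if __name__ == "__main__":
    print(merged_length([(1, 100), (50, 150), (200, 250)]))  # 201
    print(tomato_bes_per_mb, rice_bes_per_mb)                # 98.6 182.4
```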
Micro-Tom has been considered a model cultivar for promoting functional genomics studies of tomato because of its characteristics. Many tools and platforms have been developed, and some are already available to the research community. The present study characterized the overall polymorphisms between Micro-Tom BESs and the reference tomato "Heinz 1706" genome. In addition, two possible genome rearrangement events, on chromosome 2 and chromosome 3, were observed (Table 2). In the case of the translocation and inversion on chromosome 3, a gene annotated as a reverse transcriptase (Solyc03g104840.1) was found in the flanking region. We speculate that this region was translocated by the activity of a retrotransposon, as in the case of SUN, where enhanced expression caused by a gene duplication event mediated by the retrotransposon Rider led to an elongated fruit shape [35]. We plan to sequence these BAC clones in their entirety and expect that this will help characterize these events in more detail. For the other possible rearrangement, on chromosome 2, we found no trace of a retrotransposon. Because both rearrangements occurred in gene-rich euchromatin, these regions are interesting targets for investigating possible effects on phenotypic variation between Micro-Tom and the reference tomato.
We mapped the polymorphisms and depicted them in Figure 3, alongside maps showing covered regions and gaps. On chromosomes 2, 5, and 11, polymorphisms appeared to be concentrated in the heterochromatic regions; however, this tendency was not clearly observed on the other chromosomes. Elsewhere, the polymorphism discovery rate appeared to correlate with BAC coverage. Although our analysis indicated little likelihood of large-scale genome rearrangement between Micro-Tom and "Heinz 1706" (Table 2), this uneven polymorphism distribution suggests the existence of highly divergent chromosomal regions. The gaps in the hypothetical Micro-Tom BAC contigs could have resulted from low coverage of the BAC library, but chromosomal segments specific to either Micro-Tom or "Heinz 1706" could also be responsible. The ongoing sequencing and de novo assembly of the Micro-Tom genome will clarify its structure in detail, enabling a more solid assessment of the differences between Micro-Tom and "Heinz 1706." We previously developed SNP markers among several cultivated tomatoes [12]. By selecting SNPs through in silico analysis of public EST information and adding previously developed SSR markers, we obtained 1,137 markers and successfully mapped them on linkage maps between Micro-Tom and either Ailsa Craig or M82 [12]. Previously, large-scale Micro-Tom full-length cDNA analysis and comparison of exon regions with those of the "Heinz 1706" genome revealed a mean sequence mismatch of 0.061% (1/1,640 bp) [36]. One possible explanation for the difference is the quality of the reference "Heinz 1706" genome sequence used in the two studies. We used the published version of the "Heinz 1706" genome sequence, which has higher coverage and therefore greater accuracy, although our data set may still contain sequence errors because BESs are single-pass sequences.
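Putting the two estimates on the same scale shows the size of the difference discussed above: the cDNA-based exon mismatch rate of 1/1,640 bp is about 1.76 times the genome-wide BES-based density of 1/2,886 bp reported in this study. The helper name in this sketch is illustrative.

```python
def percent_mismatch(bp_per_variant: float) -> float:
    """Convert '1 variant per N bp' into a percentage of compared bases."""
    return 100.0 / bp_per_variant


cdna_exon_rate = percent_mismatch(1_640)   # full-length cDNA exons vs Heinz 1706 [36]
bes_rate = percent_mismatch(2_886)         # genome-wide BES estimate, this study
fold_difference = cdna_exon_rate / bes_rate  # equals 2886 / 1640

if __name__ == "__main__":
    print(round(cdna_exon_rate, 3), round(bes_rate, 3), round(fold_difference, 2))
    # 0.061 0.035 1.76
```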
The information provided in this study will be useful for developing DNA markers between Micro-Tom and other cultivated tomatoes, which will facilitate a better understanding of the physiological and metabolic differences between them. It will also be useful for the genetic mapping of Micro-Tom mutants through the generation of F2 segregating populations.