The Korea Brassica Genome Project: a Glimpse of the Brassica Genome Based on Comparative Genome Analysis With Arabidopsis

A complete genome sequence provides a wealth of information about the sequenced organism as well as about related taxa. Under the guidance of the Multinational Brassica Genome Project (MBGP), the Korea Brassica Genome Project (KBGP) is sequencing chromosome 1 (cytogenetically oriented chromosome #1) of Brassica rapa. We have selected 48 seed BACs on chromosome 1 using EST genetic markers and FISH analyses. Of these, 30 BAC clones have been sequenced and 18 are in progress. Comparative genome analyses of the EST sequences and sequenced BAC clones from Brassica chromosome 1 revealed their homeologous partner regions on the Arabidopsis genome and yielded a syntenic comparative map between Brassica chromosome 1 and Arabidopsis chromosomes. In silico chromosome walking and clone validation, based on the comparative map and BAC end sequences, have been successfully applied to extend sequence contigs. In addition, we have defined the (peri)centromeric heterochromatin blocks, which contain centromeric tandem repeats, rDNA and centromeric retrotransposons. In-depth sequence analyses of five homeologous BAC clones and an Arabidopsis chromosomal region reveal overall co-linearity, with 82% sequence similarity. The data indicate that the Brassica genome has undergone triplication and subsequent gene losses after the divergence of Arabidopsis and Brassica. Based on in-depth comparative genome analyses, we propose a comparative genomics approach for conquering the Brassica genome. In 2005 we intend to construct an integrated physical map that combines sequence information from 500 BAC clones with fingerprinting data and end sequence data from more than 100 000 BAC clones.
The sequences have been submitted to GenBank with accession numbers: 10 204 BAC ends of the KBrH library (CW978640–CW988843); KBrH138P04, AC155338; KBrH117N09, AC155337; KBrH097M21, AC155348; KBrH093K03, AC155347; KBrH081N08, AC155346; KBrH080L24, AC155345; KBrH077A05, AC155343; KBrH020D15, AC155340; KBrH015H17, AC155339; KBrH001H24, AC155335; KBrH080A08, AC155344; KBrH004D11, AC155341; KBrH117M18, AC146875; KBrH052O08, AC155342.


Introduction
The Arabidopsis genome has been sequenced completely by an international consortium (the Arabidopsis Genome Initiative, 2000). Arabidopsis and Brassica diverged 14.5-20.4 million years ago from a common ancestor (Bowers et al., 2003). Comparative genetic mapping has revealed co-linear chromosome segments (Kowalski et al., 1994; Lagercrantz et al., 1996; Paterson et al., 2000, 2001; Schmidt et al., 2001) in the family Brassicaceae and linkage arrangements between Arabidopsis and B. oleracea (Lukens et al., 2003). The genomes of Brassica species have duplicated, perhaps triplicated, counterparts of the corresponding homeologous segments of Arabidopsis (O'Neill and Bancroft, 2000; Rana et al., 2004).
Brassica is one of the core genera in the family Brassicaceae. Six Brassica species are cultivated worldwide: three diploids, B. rapa (AA, 2n = 20), B. nigra (BB, 2n = 16) and B. oleracea (CC, 2n = 18), and three amphidiploids (allotetraploids), B. juncea (AABB, 2n = 36), B. napus (AACC, 2n = 38) and B. carinata (BBCC, 2n = 34) (U, 1935). The species B. rapa (syn. campestris), with 529 Mb per haploid genome equivalent (Johnston et al., 2005), was prioritized for sequencing by a multinational collaboration. The Multinational Brassica Genome Project (MBGP) and the Brassica rapa Genome Sequencing Project (BrGSP) aim to completely sequence the genome of the Brassica rapa inbred line 'Chiifu' (http://www.brassicagenome.org; http://www.brassica-rapa.org). Korea launched the Korea Brassica Genome Project (KBGP) for complete sequencing of cytogenetic chromosome 1 using BAC-by-BAC shotgun sequencing. In-depth comparative sequence analyses of the sequenced B. rapa BAC clones revealed overall co-linearity with a homeologous region of the Arabidopsis genome. These analyses suggest that the Arabidopsis genome can be used as a backbone for in silico clone validation of seed BAC clones and for physical mapping, as reported by Love et al. (2004). Here we propose an efficient clone validation method for selecting chromosome-specific seed BACs using comparative physical mapping and BAC end sequences. In 2005, KBGP aims to sequence 500 BAC clones that correspond to the majority of Arabidopsis euchromatin regions. The 500 BACs will be distributed and mapped on B. rapa chromosomes through sequence tagged site (STS) or simple sequence repeat (SSR) markers. BAC end sequences of 100 000 BACs (STC) and fingerprinting polymorphism-based BAC contigs (FPC) will be available soon. Hence, the sequence and map information of the 500 BACs can be integrated with STCs and FPCs, resulting in an integrated physical map.
The integrated physical map will provide a high-resolution, genome-wide comparative map with Arabidopsis and will be supplied to MBGP to accelerate Brassica genome sequencing.

DNA sequencing
Shotgun sequencing libraries were constructed in pCUGIblu31 with an average insert size of 3 kb (Kim et al., 2004; Yang et al., 2004; Yang et al., 2005). BigDye terminator chemistry v3.0 (ABI) was used for the sequencing reactions. The reactions were analysed using ABI3730 automatic DNA sequencers (ABI). Base-calling was performed automatically using phred, and vector sequences were removed with cross_match. The resulting high quality, vector-trimmed sequences were used for the sequence assembly of each BAC clone, using phrap and consed (Gordon et al., 1998).

Fluorescence in situ hybridization
Our FISH protocol was adapted from Lim et al. (2001, 2005a) with minor modifications. FISH signals were pseudo-coloured and further improved for optimal brightness and contrast with Adobe Photoshop image processing software.

Results and discussion
Overview of Brassica rapa genome structure
A genetic map of Brassica rapa, using segregating doubled haploid lines of Chiifu and Kenshin, covering 1046 cM with 494 markers on 10 linkage groups, was constructed with 895 DNA markers, AFLP, PCR-RFLP, ESTP, CAPS and SSR (http://www.brassicagenome.org). We have constructed another EST-RFLP genetic map of B. rapa using 478 tissue-specific cDNA clones consisting of 176 cDNAs from immature flowers, 252 cDNAs from anthers and 50 from dark-grown seedlings of B. rapa ssp. pekinensis cv. Jangwon. This molecular map covered 3412 cM on 10 linkage groups. Aligning RFLP marker sequences on the counterpart Arabidopsis chromosomes shows syntenic co-linearity, resulting in a highly informative comparative genetic map (Kim, 2001). The karyotypes of B. rapa chromosomes were studied previously (Fukui et al., 1998; Snowdon et al., 2002; Koo et al., 2004). We further characterized the chromosomes in detail by fluorescence in situ hybridization (FISH) with repetitive DNAs, such as 45S rDNA, 5S rDNA, centromeric repeats (CentBr) and centromere-specific retrotransposons (Lim et al., 2005a). The cytogenetic chromosomes were integrated with genetic maps by painting with chromosome-specific BAC clones identified by unique EST clones from each linkage group (LG1-LG10) (Lim et al., 2005b). The cytogenetic chromosome numbers, our linkage groups (LG1-LG10) and the international standard linkage numbers (R1-R10) (Lombard and Delourme, 2001) will be integrated soon.
We have sequenced four BAC clones that form the counterpart of an Arabidopsis chromosomal region (chromosome 5: 3.1-3.2 Mb) containing flowering locus C (FLC). Comparisons of the sequenced Brassica BAC clones with the homeologous regions of Arabidopsis showed overall co-linearity with 81% sequence similarity. The average sequence similarity between Brassica BACs is 82%, with exceptionally high similarity (97%) between two clones, 117M18 and 52O08, which represent two recently duplicated regions. The co-linear 125 kb Arabidopsis sequence was reduced by up to 40% by deletions of DNA segments in the Brassica BAC clones (Table 1). Of the 36 genes in the 125 kb of Arabidopsis sequence, only 24, 17, 13 and 13 homologues remained in the common sequence of BAC clones 80A08, 4D11, 52O08 and 117M18, respectively. Only four genes remain in all four BAC clones, with 77-96% similarity in amino acid sequences. Newly emerged (or inserted) genes, including transposons, were detected six, three, two and one times in the respective BAC clones. The data support previous reports (O'Neill and Bancroft, 2000; Rana et al., 2004) and provide in-depth information about how triplicated Brassica genome sequences were modified after divergence from Arabidopsis around 20 million years ago (Bowers et al., 2003).
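The gene-loss figures above can be restated as retention fractions. The following is a minimal sketch that uses only the counts quoted in the text; the names `RETAINED`, `retention_fraction` and `retention_table` are illustrative and are not part of the original analysis.

```python
from typing import Dict

# Counts quoted in the text: 36 genes in the 125 kb Arabidopsis region and
# the number of homologues remaining in each sequenced Brassica BAC clone.
ARABIDOPSIS_GENES = 36
RETAINED: Dict[str, int] = {"80A08": 24, "4D11": 17, "52O08": 13, "117M18": 13}


def retention_fraction(retained: int, total: int = ARABIDOPSIS_GENES) -> float:
    """Fraction of the Arabidopsis genes that still have a homologue in one BAC."""
    return retained / total


def retention_table(counts: Dict[str, int]) -> Dict[str, float]:
    """Retention fraction per BAC clone, rounded to two decimals."""
    return {clone: round(retention_fraction(n), 2) for clone, n in counts.items()}


if __name__ == "__main__":
    for clone, frac in retention_table(RETAINED).items():
        print(f"{clone}: {frac:.0%} of genes retained, {1 - frac:.0%} lost")
```

Under these counts, each homeologous copy retains roughly one third to two thirds of the Arabidopsis gene complement, which is the pattern expected if gene loss followed triplication.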

Pericentromeric heterochromatin blocks in the Brassica rapa genome
The centromeric region of Brassica is occupied by 176 bp tandem repeats (Harrison and Heslop-Harrison, 1995). The 176 bp centromeric repeat of Brassica (named CentBr) occurred as tandem arrays in 30% of our BAC end sequences (10 204 BAC ends of the KBrH library; GenBank accession numbers CW978640–CW988843), indicating that CentBr is a major component of the B. rapa centromere. The CentBr sequences are subdivided into two classes, named CentBr1 and CentBr2, based on sequence similarity (82-84% between the two classes and over 92% between members of each class). CentBr1 and CentBr2 occupy the centromeres of eight and two chromosomes, respectively (Lim et al., 2005a). We have sequenced two centromeric BAC clones, KBrH015B20 (102 kb) and KBrH001P13 (17 kb), containing centromeric tandem repeats, to improve our understanding of the major elements in the (peri)centromeric region of the Brassica genome. Careful sequence analysis revealed several families of centromere-specific retrotransposons of Brassica (CRB). Among these, two long terminal [...]
[...] comparative map (Figure 2). Based on the comparative physical map and micro-co-linearity between the Brassica and Arabidopsis sequences, we have proposed an efficient and novel clone validation method for sequencing in advance of the complete physical map. The Brassica BAC clones were allocated to Arabidopsis chromosomes by in silico allocation based on unique, significant (<1E-6) and directional matches: one BAC end matches in the forward orientation and the other as the reverse complement, within a 30-500 kb interval. BAC-FISH and STS mapping using BAC end sequences on the counterpart Arabidopsis chromosomal region showed the real locations of the BAC clones on the chromosomes. At least one in three BAC clones is mapped onto the expected region of chromosome 1, owing to the triplicated nature of the Brassica genome. All the sequenced BAC clones provide a further starting point for selection of seed BAC clones for extending the sequence.
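The allocation criteria described above (unique, significant and directional matches of both BAC ends within a 30-500 kb interval) can be sketched as a simple filter. This is an illustration under stated assumptions rather than the pipeline actually used: the `EndHit` record, its field names and the `allocate` function are hypothetical, the `<1E-6` threshold is interpreted as a BLAST E-value cutoff, and "directional" is simplified to mean that the two ends hit opposite strands of the same Arabidopsis chromosome.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

E_VALUE_CUTOFF = 1e-6   # assumed to be a BLAST E-value threshold
MIN_SPAN_BP = 30_000    # lower bound of the 30-500 kb interval
MAX_SPAN_BP = 500_000   # upper bound of the 30-500 kb interval


@dataclass(frozen=True)
class EndHit:
    """Best Arabidopsis match of one BAC end (hypothetical record layout)."""
    chromosome: str
    position: int   # coordinate of the match on the Arabidopsis chromosome (bp)
    strand: str     # "+" for forward, "-" for reverse complement
    e_value: float
    unique: bool    # True if the end has only one significant match


def allocate(end_a: EndHit, end_b: EndHit) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, end) if the BAC can be allocated, else None."""
    for hit in (end_a, end_b):
        if not hit.unique or hit.e_value >= E_VALUE_CUTOFF:
            return None                      # not unique or not significant
    if end_a.chromosome != end_b.chromosome:
        return None                          # ends land on different chromosomes
    if end_a.strand == end_b.strand:
        return None                          # not one forward and one reverse
    start = min(end_a.position, end_b.position)
    stop = max(end_a.position, end_b.position)
    if not MIN_SPAN_BP <= stop - start <= MAX_SPAN_BP:
        return None                          # outside the 30-500 kb interval
    return (end_a.chromosome, start, stop)


if __name__ == "__main__":
    forward = EndHit("5", 3_100_000, "+", 1e-20, True)
    reverse = EndHit("5", 3_290_000, "-", 1e-15, True)
    print(allocate(forward, reverse))
```

A BAC that passes this filter is only a candidate: as noted above, BAC-FISH or STS mapping is still needed to establish on which of the triplicated Brassica regions the clone actually lies.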

Integrated physical mapping
Successful clone validation based on in silico allocation to counterparts of chromosome 1 suggests a novel strategy for integrated physical mapping, using comparative mapping of BAC ends onto Arabidopsis chromosomes. The integrated physical mapping strategy encompasses in silico allocation of B. rapa BAC clones to the counterpart locations of Arabidopsis chromosomes, based on significant BLAST matches. A Brassica BAC clone (average size 120 kb) covers an average of 190 kb of Arabidopsis sequence, based on a co-linearity index of 1.6 (= co-linear Arabidopsis sequence/Brassica BAC nucleotide) (Table 2). We have analysed 91 000 BAC end sequences (Table 3). Among them, a total of 45 232 BAC end sequences (50%) show significant sequence similarity with unique Arabidopsis sequences, and a total of 4317 BAC clones (9.5%) are allocated on Arabidopsis chromosomes by significant matching of both ends within a 30-500 kb interval (Figure 3). A total of 500 BACs with an average insert of 120 kb will cover around 80 Mb of the euchromatin regions of the Arabidopsis genome (almost all of the euchromatin). The 500 BACs will be scattered into the triplicated regions on Brassica chromosomes (e.g. Figure 4). The actual chromosomal location of a sequenced BAC can be mapped on the genetic map through SSR or STS-PCR using its sequence information. Recently, we have selected a minimum tiling set of 629 Brassica BAC clones spanning 86 Mb of Arabidopsis from the in silico allocation (data are available at our website: www.brassica-rapa.org).
Figure 4. Schematic representation of the in silico landing on Arabidopsis chromosome and estimated real position on Brassica chromosomes. Minimum tiled BACs on in silico comparative allocation will be scattered onto three Brassica chromosomes. If the minimum tiled BACs are sequenced and mapped, physical gaps of less than 240 kb will remain between BACs in each chromosome.
Each BAC clone will be mapped on the Brassica chromosomes by STS mapping and FISH analyses. About 75 Mb from gene-rich euchromatin regions of Brassica will be obtained from sequencing of the 629 BACs (average insert 120 kb), which may be distributed across the 10 B. rapa chromosomes (average 60 BACs for each chromosome) with an average 240 kb gap (Figure 4). All the sequenced BAC clones will be provided to MBGP and used as a starting point for the selection of seed BAC clones extending to the flanking sides with minimum overlap, based on sequence tagged connectors (STC). The results will provide in-depth information about the comparative genomics between Brassica and Arabidopsis.
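The coverage figures in this section follow from simple arithmetic on the quoted values, as the short sketch below shows. The function names are illustrative. One assumption is flagged in the code: the 9.5% of allocated clones is taken to be relative to 91 000 / 2 clones (two ends per clone), which is consistent with the quoted percentage but is not stated explicitly in the text.

```python
BAC_INSERT_KB = 120        # average Brassica BAC insert size (from the text)
COLINEARITY_INDEX = 1.6    # co-linear Arabidopsis sequence per Brassica BAC nucleotide
BAC_ENDS_ANALYSED = 91_000
ENDS_WITH_UNIQUE_HIT = 45_232
CLONES_ALLOCATED = 4_317


def arabidopsis_span_kb(bac_kb: float = BAC_INSERT_KB,
                        index: float = COLINEARITY_INDEX) -> float:
    """Arabidopsis sequence (kb) covered by one average Brassica BAC."""
    return bac_kb * index


def total_sequence_mb(n_bacs: int, bac_kb: float = BAC_INSERT_KB) -> float:
    """Brassica sequence (Mb) obtained by sequencing n_bacs average BACs."""
    return n_bacs * bac_kb / 1000


def fraction(part: float, whole: float) -> float:
    """Plain ratio, kept as a helper for readability."""
    return part / whole


if __name__ == "__main__":
    print(f"One BAC spans about {arabidopsis_span_kb():.0f} kb of Arabidopsis")
    print(f"629 BACs yield about {total_sequence_mb(629):.1f} Mb of Brassica sequence")
    print(f"BACs per chromosome: {629 / 10:.0f}")
    print(f"Ends with a unique hit: {fraction(ENDS_WITH_UNIQUE_HIT, BAC_ENDS_ANALYSED):.1%}")
    # Assumption: two end sequences per clone, so 91 000 ends ~ 45 500 clones.
    print(f"Clones allocated: {fraction(CLONES_ALLOCATED, BAC_ENDS_ANALYSED / 2):.1%}")
```

These values agree with the rounded figures in the text: about 190 kb of Arabidopsis sequence per BAC, about 75 Mb from 629 BACs, and roughly 60 BACs per chromosome.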
Complete sequencing of Brassica rapa will give great opportunities to increase our understanding of the evolution of the polyploidized genome and of agricultural aspects, especially for breeding and molecular farming, through finding novel or useful genes, not only in B. rapa but also in other important crops in the genus Brassica.