Construction and Characterization of a Bacterial Artificial Chromosome Library for the A-Genome of Cotton (G. arboreum L.)

A bacterial artificial chromosome (BAC) library for the A-genome of cotton has been constructed from the leaves of G. arboreum L. cv. Jianglinzhongmian. This cultivar is used as an elite A-genome germplasm resource in current cotton breeding programs and has been used to build a genetic reference map of cotton. The BAC library consists of 123,648 clones stored in 322 384-well plates. Statistical analysis of 103 randomly selected BAC clones indicated an average insert length of 100.2 kb, with a range of 30 to 190 kb. Theoretically, this represents 7.2 haploid genome equivalents based on an A-genome size of 1697 Mb. The BAC library has been arranged in column pools and super pools, allowing screening with various PCR-based markers. In the future, the A-genome cotton BAC library will serve as both a large gene resource and a valuable tool for map-based gene isolation, physical mapping, and comparative genome analysis.


Introduction
Cotton, one of the most important economic crops in the world, provides most of the natural textile fiber and part of the oil resources used by people. Cultivated cottons include diploids and tetraploids, the latter derived from natural hybridization between an African/Asian A-genome species and an American D-genome species [1]. As one of the ancestors of the tetraploids, Asiatic cotton (G. arboreum L.) has been domesticated and cultivated in China for almost 2000 years since it was first introduced from India. It is still used as a germplasm resource in current cotton breeding programs because of its early maturity, strong tolerance to stress, resistance to disease and insects, high fiber strength, and excellent plasticity. Decoding the A-genome will therefore help in mining potentially valuable genes in Asian cotton, understanding the origin and evolution of allopolyploids, studying the genome structure underlying important agronomic traits and the relationship between gene structure and function, and elucidating how interactions between genes influence traits, all of which are important for crop improvement.
The bacterial artificial chromosome (BAC) cloning system has become an invaluable tool in many areas of genome research because of its ability to stably maintain large DNA fragments and its ease of manipulation [2][3][4]. BAC libraries have been developed for a number of major crops including rice [5][6][7], soybean [8,9], wheat [10][11][12][13], maize [14,15], sorghum [16], and tomato [17]. For cotton, at least eight BAC/BIBAC libraries have been reported and made available to the public. These libraries were all made from different genotypes of AD-genome allotetraploid Gossypium species. Seven were made from upland cotton genotypes, namely Tamcot HQ95, Auburn 623 (http://hbz7.tamu.edu/homelinks/bac est/bac.htm), TM-1 [18,19], Maxxa [20], Suyun7235 [21], Zhongmiansuo12 [22], and 0-613-2R [23], and one from G. barbadense L. cv. Pima90-53 [24]. Herein, we report the development of a deep-coverage BAC library for the diploid A-genome cotton species, which served as the maternal parent during polyploidization to generate the precursor of the commercially important allotetraploid species. G. arboreum L. cv. Jianglinzhongmian was selected for this purpose. It is a diploid A-genome cultivar, widely used in breeding programs because of its high level of resistance to Fusarium wilt disease and its superior fiber strength. These BAC and BIBAC libraries will provide resources essential for advanced genomics and genetics research of cotton.

BAC Library Construction.
Cotton plants were grown in the dark for about 7 days. Etiolated cotyledons were used as the source for high-molecular-weight (HMW) DNA preparation. Nuclei were isolated and lysed, and megabase DNA was purified as described by Yin et al. [25]. Agarose plugs were prerun to eliminate small fragments from the megabase genomic DNA. Megabase nuclear DNA was then partially digested with HindIII to identify appropriate partial-digestion conditions. Six plugs were prerun and used for large-scale digestion with the optimal amount of HindIII. The DNA separation was performed in two stages. First, partially digested DNA was loaded into the center wells of a 1% agarose gel and separated by PFGE (6 V/cm, 50 s switch time, 18 hours run time, 12.5 °C). The region of the compression zone containing DNA fragments in the size range from 150 to 450 kb was excised from the unstained gel and divided into three equal sections. In the second stage, the three excised gel slices were embedded in a second gel and compressed by PFGE (6 V/cm, 3-5 s switch time, 16 hours run time, 12.5 °C). The compressed DNA band was excised and recovered from the agarose gel slices using an Electroeluter model 422 (Bio-Rad, CA, USA). Eluted DNA was ligated to the vector pIndigoBAC-5 (HindIII-Cloning Ready, Epicentre Technologies, Madison, WI, USA) and incubated under temperature-cycle conditions [26]. A 2 μL aliquot of the ligation mixture was used to transform 18 μL of ElectroMAX DH10B competent cells (Invitrogen, CA, USA) by electroporation at 17 kV/cm, 100 Ω, and 25 μF. Clones were picked by hand into 384-well plates containing LB freezing medium. Plates were incubated overnight, replicated, and then frozen at −80 °C.

BAC Clones Characterization.
The BAC clones were picked from the library, inoculated into 3 mL of 2 × TY medium containing chloramphenicol (12.5 μg/mL), and incubated at 37 °C for 24 hours. BAC DNA was miniprepared by the method of Sambrook [27] with some modifications. To estimate insert size and determine the distribution of clone sizes, a total of 103 BAC clones were selected at random throughout the library. The BAC DNA was digested with 5 U of NotI (3 hours at 37 °C). The digestion products were separated by PFGE (6 V/cm, 5-15 s switch time, 14 hours run time, 12.5 °C). The insert size was estimated by comparison with the midrange PFGE marker II (New England BioLabs, MA, USA).
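The insert-size estimate described above amounts to summing the NotI bands of each clone after setting aside the vector band. The following is a minimal sketch in Python, assuming a vector band of about 7.5 kb (the value given in the Results); the band sizes, the sizing tolerance, and the function name are hypothetical and serve only as illustration.

```python
from statistics import mean

VECTOR_KB = 7.5      # vector band size given in the Results
TOLERANCE_KB = 0.5   # assumed tolerance for recognising the vector band


def insert_size_kb(bands_kb):
    """Sum all NotI bands of one clone except a single vector band."""
    bands = sorted(bands_kb)
    for i, band in enumerate(bands):
        if abs(band - VECTOR_KB) <= TOLERANCE_KB:
            del bands[i]  # drop exactly one vector band, then stop
            break
    else:
        raise ValueError("no vector band found; lane cannot be sized")
    return sum(bands)


# Hypothetical band sizes (kb) read against the PFGE marker, one list per clone.
example_clones = {
    "clone_1": [7.5, 95.0],
    "clone_2": [7.5, 60.0, 45.0],
    "clone_3": [7.5, 130.0],
}

sizes = {name: insert_size_kb(bands) for name, bands in example_clones.items()}
print(sizes)
print("mean insert size (kb):", mean(sizes.values()))
print("range (kb):", min(sizes.values()), "to", max(sizes.values()))
```

Applied to the real band sizes of all sampled clones, the same summary would give the mean and range of the insert-size distribution.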
Southern blots of size-separated BAC inserts were performed by standard protocols after UV nicking of the DNA (Gene Linker, UVP, CA, USA). Total cotton genomic DNA digested with HindIII and HaeIII was used as the probe and labeled with DIG by standard random-priming techniques (High Prime DNA Labeling and Detection Starter Kit I, Roche, Mannheim, Germany).
Six BACs were selected from the library at random for stability testing. A 5 μL inoculum of each was cultured in 3 mL of 2 × TY medium with antibiotic at 37 °C for 24 hours. Then 5 μL of this culture was used to inoculate a fresh 3 mL of 2 × TY medium. This procedure was continued for five cycles. Each 24-hour period was considered to represent about 20 generations [28]. DNA samples isolated from the 1st- and 5th-day cultures (0-generation cells and 100-generation cells) were digested with HindIII and run on a 0.8% agarose gel for 16 hours at 2.0 V/cm. The gel was stained with ethidium bromide for 30 minutes, destained in water for 30 minutes, and then photographed.

BAC Library Pooling.
Following the pooling strategy of Yin et al. [25], the BAC library was arranged in two levels of pools (column pools and super pools), allowing screening with various PCR-based markers (Figure 3). The strategy consists of two steps. In the first step, for every 384-well plate, the clones (5 μL of culture from each clone) in rows A to P of the same column were combined with a 12-channel transferpettor into a column pool containing 16 individual clones. Column pools were prepared in this way for each set of 16 sequential 384-well plates. In the second step, the column pools in the same row were mixed with an 8-channel transferpettor. The entire BAC library of 123,648 clones was thus organized into 322 super pools, each consisting of 384 unique clones.
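The two pooling levels can be read as a simple mapping from a clone address to the pools that contain it. The sketch below is in Python and rests on one interpretation of the text: each super pool corresponds to one 384-well plate (322 plates, 322 super pools of 384 clones), and each column pool to one column of one plate. The function name and the address format are hypothetical.

```python
ROWS = "ABCDEFGHIJKLMNOP"   # 16 rows of a 384-well plate
N_COLUMNS = 24              # 24 columns of a 384-well plate
N_PLATES = 322              # plates in the library


def pools_for_clone(plate, row, column):
    """Return the super pool and column pool that contain a given clone.

    plate: 1..322, row: 'A'..'P', column: 1..24.
    Assumption: a super pool is identified by its plate number, and a
    column pool by (plate, column), because rows A to P are merged.
    """
    if not 1 <= plate <= N_PLATES:
        raise ValueError("plate out of range")
    if len(row) != 1 or row not in ROWS:
        raise ValueError("row out of range")
    if not 1 <= column <= N_COLUMNS:
        raise ValueError("column out of range")
    return {"super_pool": plate, "column_pool": (plate, column)}


clones_per_column_pool = len(ROWS)                         # 16
clones_per_super_pool = clones_per_column_pool * N_COLUMNS  # 384
total_clones = clones_per_super_pool * N_PLATES             # 123,648

print(pools_for_clone(17, "C", 5))
print(clones_per_super_pool, total_clones)
```

Under this reading, the pool sizes reproduce the figures stated in the text: 16 clones per column pool, 384 per super pool, and 123,648 clones in total.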

Results
As cotton leaves are particularly rich in polyphenols and polysaccharides, we modified the general library construction method to meet the needs of cotton BAC library construction. Modifications included using etiolated cotyledons as the DNA preparation source, adding PVP40, and increasing the β-mercaptoethanol concentration in the extraction-washing buffer. The prepared DNA, about 1 Mb in size and nearly free of protein and organelle DNA, was suitable for BAC library construction. HMW DNA embedded in LMP agarose plugs was partially digested with HindIII. The optimal amount of HindIII for producing the maximum number of 150-450 kb DNA fragments was 8-12 U/plug. The partially digested DNA was gel separated and size selected twice. The DNA fragments from the second size selection were electroeluted from the gel, and the DNA concentration was at least 3 ng/μL. In total, twelve separate ligation reactions gave rise to the BAC library of 123,648 individual clones stored in 322 384-well plates.
To analyze the distribution of insert sizes and estimate the average insert size in the BAC library, DNA samples were isolated from 103 clones. BAC DNA was digested with NotI to release the insert and fractionated by PFGE (Figure 1(a)). Statistical analysis indicated that the insert size of the clones ranged from 30 to 190 kb, with an average of 100.2 kb (Figure 1(c)). No clone was found without an insert. Based on an A-genome size of 1697 Mb [29], the coverage of the library is approximately 7.2 haploid genome equivalents. This corresponds to a greater than 99% probability of recovering at least one BAC clone containing any given sequence in the genome. NotI is an 8-base cutter with a GC-rich recognition site, while the cotton genome is relatively AT-rich, so digestion with NotI should generate one or two insert bands plus a vector band (7.5 kb), which is consistent with our data. A Southern blot of the gel shown in Figure 1 probed with total cotton genomic DNA (Figure 1(b)) confirmed that the cloned DNA originated from cotton.
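The coverage and probability figures can be checked with a short calculation. The sketch below is in Python and uses the Clarke-Carbon relation P = 1 − (1 − L/G)^N, which is a standard choice for this estimate; the text does not state which formula the authors used, so this is an assumption. With the raw numbers from the text the product comes out at about 7.3 genome equivalents, slightly above the reported 7.2; the difference may reflect rounding or an adjustment not described here.

```python
import math

N_CLONES = 123_648        # clones in the library
MEAN_INSERT_KB = 100.2    # average insert size from the 103 sampled clones
GENOME_KB = 1697 * 1000   # A-genome size of 1697 Mb, expressed in kb

# Genome equivalents: total cloned DNA divided by the haploid genome size.
coverage = N_CLONES * MEAN_INSERT_KB / GENOME_KB

# Clarke-Carbon relation (assumed): probability that a given sequence
# is present in at least one clone of the library.
fraction_per_clone = MEAN_INSERT_KB / GENOME_KB
p_hit = 1 - (1 - fraction_per_clone) ** N_CLONES

# Number of clones that would be needed for a 99% probability.
n_for_99 = math.ceil(math.log(1 - 0.99) / math.log(1 - fraction_per_clone))

print(f"coverage: {coverage:.2f} genome equivalents")
print(f"P(at least one clone contains a given sequence): {p_hit:.4f}")
print(f"clones needed for 99%: {n_for_99}")
```

The computed probability is above 0.999, in line with the "over 99%" stated in the text.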
To test the stability of BAC clones in E. coli, we compared the HindIII restriction patterns of six BAC clones at generations 0 and 100. No visible changes in the fingerprints were seen between the two time points (Figure 2), indicating that the BACs are stable.

Discussion
Cotton has a relatively large genome, with about 30%-60% of the genome consisting of highly repetitive sequences. Moreover, many homologous regions exist between the At and Dt genomes of the tetraploids. These characteristics hamper standard analysis of the cotton genome, so a BAC library provides an important alternative tool for genomic research. Here, we described the development and characterization of a high-quality BAC library from the A-genome diploid species G. arboreum L. cv. Jianglinzhongmian, an elite cultivar. G. arboreum L. is an old cultivated cotton species that was planted extensively in China. To date, this species is still used in tetraploid cotton breeding programs as an elite germplasm line because of unique properties that include early maturity, environmental resistance, biotic resistance, high fiber strength, and excellent plasticity. Additionally, G. arboreum L. has historically been chosen as a model system to study fiber development. The A-genome produces spinnable fiber, whereas the D-genome alone is of no value for fiber production [30]. Mei et al. [31] found that, among seven QTLs detected for six fiber-related traits, five were located on A-subgenome chromosomes, suggesting that the A-subgenome of allotetraploid cotton contributes to its superior fiber production. Therefore, a BAC library established from an A-genome species is useful in cotton genome studies.
Efficient library screening is crucial for all applications of the library. Screening can be performed either by hybridization on high-density filters or by PCR. Both methods are feasible, but the main advantage of hybridization is the ability to combine probes for screening the entire BAC library and identifying clones in a single experiment. PCR screening, however, is more reliable, faster, and more efficient, with higher specificity, because it avoids the false-positive clones that repeat sequences in probes produce in hybridization. Here, we described a three-step PCR screening procedure based on the BAC library pool system. The BAC pool strategy was sensitive enough to identify single positive clones among super pools containing 384 BAC clones. BAC clones cultured overnight served directly as PCR template, rather than prepared BAC DNA. This modification considerably simplifies the procedure and shortens the time required for library screening. With this change, the BAC library can be screened using SSRs, RAPDs, and other PCR-based molecular markers, which will facilitate the development of whole-genome integrated physical/genetic maps of cotton. In addition, the G. arboreum L. cv. Jianglinzhongmian BAC library will provide a valuable tool for a diverse range of other studies, including comparative genomics, physical mapping, map-based cloning of genes or QTLs, and marker development based on BAC-end sequencing and genome sequencing. Recently, we developed a BAC library from TM-1 [19], which is a tetraploid genetic standard line for upland cotton. In the future, the availability of these two cotton BAC libraries will allow us to perform further comparative studies between A-genome and AD-genome species, and these comparisons will further reveal the evolution of the cotton genome.
Figure 3: Schematic representation of the BAC pooling and PCR-based screening strategy. "→" indicates the direction of BAC pooling. Step I: within each 384-well plate, the 16 clones in the same column (gray dots) were combined into a column pool. Step II: the column pools in the same row (black dots) were mixed into one super pool. The entire BAC library of 123,648 clones on 322 384-well plates was organized into 322 super pools. A second arrow type indicates the direction of the three rounds of PCR-based screening. Round I: screening against the 322 super pools. Round II: screening against the individual row of column pools. Round III: screening against the individual column of the particular plate(s).
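The three screening rounds can be illustrated with a small simulation that counts PCR reactions. This Python sketch assumes that a pool tests positive whenever it contains at least one positive clone, that each super pool corresponds to one plate, and that Round II tests the 24 column pools of each positive plate; the function name and the example address are hypothetical.

```python
N_PLATES = 322
ROWS = "ABCDEFGHIJKLMNOP"   # 16 rows per 384-well plate
N_COLUMNS = 24


def three_round_screen(positives):
    """Recover positive clones and count PCRs per round.

    positives: set of (plate, row, column) addresses carrying the marker.
    """
    reactions = {"round_1": N_PLATES, "round_2": 0, "round_3": 0}
    recovered = set()

    # Round I: one PCR per super pool; plates with a positive clone light up.
    hit_plates = sorted({plate for plate, _, _ in positives})

    for plate in hit_plates:
        # Round II: the 24 column pools of each positive plate.
        reactions["round_2"] += N_COLUMNS
        hit_columns = sorted({c for p, _, c in positives if p == plate})

        for column in hit_columns:
            # Round III: the 16 individual clones of each positive column.
            reactions["round_3"] += len(ROWS)
            recovered |= {
                (p, r, c) for (p, r, c) in positives
                if p == plate and c == column
            }
    return recovered, reactions


single = {(17, "C", 5)}   # hypothetical address of one positive clone
clones, counts = three_round_screen(single)
print(clones, counts, "total PCRs:", sum(counts.values()))
```

For a single positive clone this sketch counts 322 + 24 + 16 = 362 reactions, compared with 123,648 if every clone were tested individually.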