Utilization of Super BAC Pools and Fluidigm Access Array Platform for High-Throughput BAC Clone Identification: Proof of Concept

Bacterial artificial chromosome (BAC) libraries are critical for identifying full-length genomic sequences, correlating genetic and physical maps, and comparative genomics. Here we describe the use of the Fluidigm Access Array genotyping system in conjunction with KASPar genotyping technology to identify individual BAC clones corresponding to specific single-nucleotide polymorphisms (SNPs) from an Amplicon Express seven-plate super pooled Amaranthus hypochondriacus BAC library. Ninety-six SNP loci, spanning the length of A. hypochondriacus linkage groups 1, 2, and 15, were simultaneously tested for clone identification from four BAC super pools, corresponding to 28 384-well plates, using a single Fluidigm integrated fluidic chip (IFC). Forty-six percent of the SNPs were associated with a single, unambiguously identified BAC clone. PCR amplification and next-generation sequencing of individual BAC clones confirmed the IFC clone identification. The Fluidigm Dynamic Array platform allowed for the simultaneous PCR screening of 10,752 BAC clones for 96 SNP tag sites in less than three hours at a cost of ~$0.05 per reaction.


Introduction
Sequence-based molecular markers (e.g., SNPs), genetic linkage maps, and expressed sequence tag (EST) libraries are important molecular tools for advanced genomic studies. BAC libraries (large-insert DNA libraries) are a genomic tool of particular importance. They are critical for identifying full-length genomic sequences, correlating genetic and physical maps [1], and comparative genomics [2], and traditionally were the first step towards whole genome sequencing projects [3]. BAC libraries have been successfully developed for numerous species, including economically important crop species, secondary/emerging crop species, and model organisms [4][5][6][7][8].
Traditionally, the identification of specific BAC clones corresponding to specific DNA sequences (i.e., BAC library screening) was accomplished by probing high-density nylon membranes (single or double) spotted with individual BAC clones representing all or part of the BAC library. This hybridization method, while reliable, requires the problematic use of radioactively labeled probes. If many probes must be screened, for example, to establish a connection between a linkage map and a physical map, either multiple copies of the spotted library are required for simultaneous screening (increased cost) or the library must be probed serially (significant time requirement), taking care that the blots are not exhausted before all probes have been screened, which normally occurs after 3-5 hybridization events. Moreover, the presence of repeat elements in the labeled probes themselves often confounds the hybridization results [9].
PCR-based screening of BAC libraries is an attractive alternative to hybridization-based screening, since PCR screening can be significantly cheaper, and its efficiency can be significantly enhanced by dimensional pooling of the BAC library [1]. In the dimensional pooling scheme, clones are often pooled by plate, then by row, and lastly by column. PCR screening of the different pools allows for the identification of specific BAC addresses (plate, row, and column) containing the corresponding sequence tag site (STS) [10].
The genus Amaranthus (Caryophyllales: Amaranthaceae) encompasses about 60 species of worldwide distribution [11]. The grain amaranths (A. hypochondriacus L., A. cruentus L., and A. caudatus L.) produce edible seeds and are an important food crop in several areas of Latin America and Africa [12]. Amaranth seed protein has an exceptional balance of amino acids and an average seed protein content (15% on a dry matter basis) that is notably higher than that of most cereal grains [13,14]. Despite the relatively minor status of the grain amaranths as an alternative crop, important genomic tools are being developed that should aid in the genetic improvement of the species. These tools include the development of (i) molecular markers (RAPDs, AFLPs, microsatellites, and SNPs) used to resolve taxonomic questions [15][16][17][18][19][20], (ii) a 10X BAC library utilized for genomic sequencing of herbicide target genes [21], (iii) a densely populated SNP-based linkage map needed to facilitate quantitative trait loci (QTL) discovery experiments [22], and (iv) a deeply sequenced transcriptome generated from stress response leaf and stem tissues [23].
In this study, we describe a proof-of-concept approach that utilizes the Fluidigm IFC technology (96.96 Dynamic Array) with an Amplicon Express seven-plate super pooled dimensional library of an A. hypochondriacus BAC library [21] to accomplish a high-throughput screening of SNP loci from the recently released A. hypochondriacus linkage map [22].

BAC Library.
The amaranth BAC library utilized consisted of 36,864 clones and was constructed with a HindIII partial digestion of the A. hypochondriacus cultivar "Plainsman" [21]. The average insert size of the library was 125 kb and the genome coverage was estimated at 10.6X. In this proof-of-concept experiment, a subset consisting of the first 10,752 clones (28 384-well plates) was used in the pooling strategy, providing coverage of 3.4X genome equivalents.

BAC DNA Isolation.
BAC DNA isolation and pooling were performed by Amplicon Express (Pullman, WA). Each BAC clone was grown independently in 2X YT broth for 16 h at 37 °C with 12.5 µg/mL chloramphenicol. Following the growth period, equal quantities of culture from each of 7 plates were pooled as described below. BAC plasmid DNA was isolated using an optimized alkaline lysis method, resuspended in TE, and then pooled to create the super pools as described by Bouzidi et al. [24], with the minor modification that the DNA was suspended at high concentration (20 ng/µL).

Pooling Strategy.
Twenty-eight 384-well plates were pooled into four distinct super pools (SP1-4), each covering seven plates. For each super pool, BAC DNA was pooled into 5 pools corresponding to the seven 384-well plates, 8 pools corresponding to the 384-well plate rows of all seven plates, and 10 pools corresponding to the 384-well plate columns of all seven plates (Figure 1), for a total of 23 pools per super pool. Within each super pool, each of the seven plates is found in two wells (pools), each row of the seven plates is found in two wells, and each column of the seven plates is found in two wells. The final 96-well plate contained the pools of all four super pools (Figure 1).
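The pooling logic can be illustrated with a minimal sketch. It assumes that every plate, row, or column is assigned to a unique pair of pools within its dimension, which is one way to satisfy the "found in two wells" design with 5, 8, and 10 pools; the actual Amplicon Express assignment is not given here, so the lexicographic pairing below, and the names `assign_pool_pairs` and `decode`, are hypothetical.

```python
from itertools import combinations


def assign_pool_pairs(n_items, n_pools):
    """Give each item (plate, row, or column) a unique pair of pools.

    Hypothetical assignment: pairs are taken in lexicographic order.
    """
    pairs = list(combinations(range(n_pools), 2))
    if len(pairs) < n_items:
        raise ValueError("not enough distinct pool pairs for the items")
    return {item: frozenset(pairs[item]) for item in range(n_items)}


# One super pool: 7 plates in 5 pools, 16 rows in 8 pools, 24 columns in 10 pools.
PLATE_POOLS = assign_pool_pairs(7, 5)
ROW_POOLS = assign_pool_pairs(16, 8)
COLUMN_POOLS = assign_pool_pairs(24, 10)
POOLS_PER_SUPER_POOL = 5 + 8 + 10


def decode(positive_pools, layout):
    """Return the one item whose pool pair equals the positive pools.

    Returns None when the pattern does not match a single item, e.g. when
    two positive clones light up three or four pools in one dimension.
    """
    positives = frozenset(positive_pools)
    matches = [item for item, pair in layout.items() if pair == positives]
    return matches[0] if len(matches) == 1 else None
```

With a single positive clone, `decode` applied separately to the plate, row, and column pools yields a full address. With two positive clones in one super pool, at least one dimension usually shows more than two positive pools and `decode` returns `None` for that dimension.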

Screening of the BAC Library.
The pooled BAC library was screened simultaneously with 96 PCR-based SNP assays. The 96 SNP assays, including primer sequences and genetic locations, were described by Maughan et al. [22]. The assays are based on competitive allele-specific PCR KASPar chemistry (KBioscience Ltd., Hoddesdon, UK) and were performed on a Fluidigm (Fluidigm Corp., South San Francisco, CA) nanofluidic 96.96 dynamic array [25]. For PCR on the 96.96 IFC using the KASPar chemistry, a 5 µL sample mix, consisting of 2.25 µL BAC DNA (20 ng/µL), 2.5 µL of 2X KASP Reagent Mix (KBioscience Ltd.), and 0.25 µL of 20X GT Sample Loading Reagent (Fluidigm Corp., South San Francisco, CA), was prepared for each BAC DNA sample. Similarly, a 4 µL 10X KASP assay mix, containing 0.56 µL of the KASP assay primer mix (allele-specific primers 12 µM, common reverse primer 30 µM), 2 µL of 2X Assay Loading Reagent (Fluidigm Corp., South San Francisco, CA), and 1.44 µL DNase-free water, was prepared for each SNP assay. The assay mix and sample mix were then loaded onto a 96.96 dynamic array chip, mixed, and thermal cycled using an IFC Controller HX and FC1 thermal cycler (Fluidigm Corp., South San Francisco, CA) according to the manufacturer's protocol. Thermal cycling consisted of an initial thermal mix cycle (70 °C for 30 min; 25 °C for 10 min), a hot-start Taq polymerase activation step (94 °C for 15 min), followed by a touchdown amplification protocol as follows: 10 cycles of 94 °C for 20 sec, 65 °C for 1 min (decreasing 0.8 °C per cycle); 26 cycles of 94 °C for 20 sec, 57 °C for 1 min; hold at 20 °C for 30 sec.
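As a worked illustration of the touchdown protocol, the following sketch lists the annealing temperature of each amplification cycle. It assumes that the first touchdown cycle anneals at 65 °C and that each subsequent touchdown cycle is 0.8 °C lower; the function name `touchdown_schedule` is ours and is not part of any instrument software.

```python
def touchdown_schedule(start_c=65.0, step_c=0.8, touchdown_cycles=10,
                       plateau_c=57.0, plateau_cycles=26):
    """Return the annealing temperature (in °C) of every amplification cycle.

    Assumption: the first touchdown cycle runs at start_c and each later
    touchdown cycle is step_c lower than the previous one.
    """
    touchdown = [round(start_c - step_c * i, 1) for i in range(touchdown_cycles)]
    return touchdown + [plateau_c] * plateau_cycles


schedule = touchdown_schedule()
for cycle, temperature in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal/extend at {temperature:.1f} °C")
```

Under this reading the last touchdown cycle anneals just above the 57 °C used for the remaining 26 cycles, giving 36 amplification cycles in total.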

BAC Clone Deconvolution and Verification.
End-point fluorescent images of the 96.96 IFC were acquired on an EP-1 imager (Fluidigm Corp., South San Francisco, CA). The data were analyzed with Fluidigm SNP Genotyping Analysis Software. Amplification patterns were used to identify candidate BAC clones from the positive amplification data. Candidate BAC clones were verified by standard PCR and by 454-pyrosequencing, performed as a service at the Brigham Young University DNASC (Provo, UT) using a Roche-454 GS FLX instrument, MID barcoding for each BAC clone, and Titanium reagents (Branford, CT). DNA for PCR and 454-pyrosequencing was obtained using a Sigma PhasePrep BAC DNA kit (Sigma-Aldrich, St. Louis, MO) according to the manufacturer's protocol.

We chose a seven-plate super pooling strategy (as opposed to a deeper pooling strategy of 10 or 12 plates) based on the need to maximize genome coverage while minimizing the complexity of the pools to facilitate deconvolution of candidate BAC clone addresses. The A. hypochondriacus BAC library used was described by Maughan et al. [21] and consists of 36,864 clones with an average insert size of 125 kb, 6.9% contamination with extranuclear DNA (chloroplast and mitochondrial), and approximately 1.8% of the clones containing empty vectors. Taking this information into account, we calculated that a 3.4X genome equivalent sublibrary, consisting of 10,752 clones, would result in a 92.8% probability that any specific DNA target sequence screened would be present in the sublibrary ($P = 1 - e^{N \ln(1 - i/GS)}$, where P is the probability, N is the number of clones, i is the average insert size of the clones, and GS is the haploid genome size). The 3.4X genome equivalency also suggested that many of the positive clones could be deconvoluted to specific BAC clone addresses.
We note that if a single BAC super pool contains more than one positive clone for the target sequence, the exact address of the positive candidate BAC clone cannot be unambiguously ascertained.
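The probability calculation, P = 1 − exp(N · ln(1 − i/GS)), can be sketched as follows. The haploid genome size is not stated in this section, so the 466 Mb used here is an assumed value, and discounting the clone count for extranuclear contamination and empty vectors is our reading of "taking this information into account". Under these assumptions the result lands close to the reported 92.8%.

```python
import math


def probability_of_recovery(n_clones, insert_bp, genome_bp):
    """P = 1 - exp(N * ln(1 - i/GS)): chance that a target is in the library."""
    return 1.0 - math.exp(n_clones * math.log(1.0 - insert_bp / genome_bp))


# Assumption: haploid genome size of about 466 Mb (not given in this section).
ASSUMED_GENOME_BP = 466_000_000

# Assumption: only clones with nuclear inserts count toward N, i.e. the 6.9%
# extranuclear and 1.8% empty-vector clones are removed from the 10,752 clones.
usable_clones = 10_752 * (1 - 0.069 - 0.018)

p = probability_of_recovery(usable_clones, 125_000, ASSUMED_GENOME_BP)
print(f"P(target present in sublibrary) = {p:.3f}")
```

Keeping the genome size and the clone discount as explicit inputs makes it easy to see how sensitive the estimate is to either assumption.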

PCR Screening.
The high-throughput screening of the pooled library employed a Fluidigm 96.96 Dynamic Array genotyping platform using KASPar genotyping chemistry. The Fluidigm platform uses an integrated fluidic chip to create 9,216 PCR reactions of 9.6 nL each. Thus four super pool libraries, each consisting of 23 pooled samples (Figure 1), could be screened simultaneously with 96 STS targets. We also included two positive controls (genomic DNA from the cultivar "Plainsman") and two no-template controls (NTCs). Setup of the IFC can be accomplished with an 8-channel pipettor in less than 1 hour. The use of the fluorescence-based KASPar genotyping chemistry eliminates the need to detect the amplified PCR product by radiography or electrophoresis, since successful amplification is detected by a fluorescent signal. Thus in our proof-of-concept experiment we screened 10,752 BAC clones, arrayed in four super pools, with 96 SNP assays using a single IFC. The SNP assays utilized represented SNP loci that were distributed across three linkage groups of a recently published A. hypochondriacus linkage map, specifically linkage groups 1, 2, and 15 (Figure 2). The complete list of SNP markers utilized, along with their GenBank accession number, SNP type, and primer sequences, can be found in the Supplemental Material.
Figure 2: Distribution of the SNP markers screened on linkage groups 1, 2, and 15 [22]. Map distances are in cM, corrected with the Kosambi function. SNP markers in the grey boxes were screened on the BAC library (n = 96). BAC clones corresponding to SNP markers from LG2, indicated with a black arrow, were sequenced using 454-pyrosequencing.
[...] [26]) or that the BAC library, developed through partial HindIII digestion, may be underrepresented in some genomic regions. Sixty-eight (70.8%) of the SNP assays amplified in one or more of the BAC pool samples, of which 44 (64.7%) could be unambiguously assigned to specific BAC clone library addresses. The remaining 24 (35.3%) could not be assigned to an unambiguous BAC clone address, since two or more positive clones were present in a single super pool. We note that while specific BAC clone addresses could not be identified for these 24 SNP assays, specific plates, and often specific columns and/or rows, could be determined. Representative images of the BAC pool screening results are shown in Figure 3.

BAC Clone Verification.
Standard PCR amplification and next-generation sequencing using 454-pyrosequencing were used to verify the Fluidigm IFC screening of the BAC sublibrary. Of the 44 clones that could be unambiguously assigned to specific BAC clone addresses, all produced specific amplification products of predicted lengths using standard PCR and gel electrophoresis (data not shown), indicating that BAC clones with the correct target sequence had been identified. Furthermore, we 454-pyrosequenced eight BAC clones targeted by eight SNP loci distributed across linkage group 2 as further evidence of the successful identification of targeted BAC clones. Based on the BAC library average insert size, the selected eight BAC clones represented an estimated 1.0 Mb of the A. hypochondriacus genome. The BAC clones, together with their corresponding SNP assays, can be found in Table 1. DNA from each clone was MID-barcoded, pooled in equimolar amounts, and sequenced on a quarter portion of a 454-pyrosequencing picotiter plate. A total of 218,786 reads were obtained, producing 82 Mb of total sequence with an average read length of 376 bp. Newbler assembler (v. 2.6) was used to partition reads into their respective barcode pools, to remove the MID barcodes, and to trim the sequences of any contaminating BAC vector sequences (pAGIBAC1). Contigs specific to each BAC were constructed using Newbler, and only large contigs (≥500 bp; 81% of all contigs) were used in all subsequent analyses. The number of contigs assembled for each BAC clone ranged from a low of 4 to a high of 15, with an average N50 contig size across all BACs of 60,364 bp. The largest contig assembled was for the BAC clone detected with SNP AM25953, which spanned greater than 147 kb. The final assembly statistics for each BAC clone are given in Table 1.
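For readers unfamiliar with the N50 statistic reported above, the following minimal sketch computes it for one BAC after applying the ≥500 bp large-contig filter. The contig lengths are made-up illustrative values, not data from Table 1, and the function is a generic N50 definition rather than the Newbler implementation.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (bp) for a single BAC assembly.
example_contigs = [80_000, 30_000, 10_000, 5_000, 350]

# Keep only "large" contigs, mirroring the >= 500 bp cutoff used in the text.
large_contigs = [length for length in example_contigs if length >= 500]

print(f"{len(large_contigs)} large contigs, N50 = {n50(large_contigs)} bp")
```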
Verification of the SNP target sequence in the targeted BAC clone was determined using the Basic Local Alignment Search Tool (BLASTn), where the SNP marker and its flanking sequence (200 bp) were used as the query sequence and the large contigs from all clones as the search database. For each SNP sequence query, the only significant (E-value < 1E-10) hit was with a contig corresponding to the specific BAC clone identified via the IFC-based PCR screen of the super pools (Table 1), further verifying the screening methodology. All significant hits spanned the SNP itself, as well as the entire flanking sequence, with 100% nucleotide identity. All hits had E-values < 1E-100 (Table 1).
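The E-value filtering step can be sketched as below, assuming the BLASTn results were exported in the standard 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column. The text does not say which output format was used, and the query and contig names in the example rows are invented for illustration only.

```python
def significant_hits(tabular_text, max_evalue=1e-10):
    """Return (query, subject, evalue) for tabular BLAST hits below the cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
EXAMPLE_ROWS = "\n".join([
    "snp_A\tbac1_contig1\t100.00\t401\t0\t0\t1\t401\t5000\t5400\t0.0\t741",
    "snp_A\tbac2_contig3\t85.00\t40\t6\t0\t10\t49\t100\t139\t0.002\t30.1",
])

hits = significant_hits(EXAMPLE_ROWS)
for query, subject, evalue in hits:
    print(query, subject, evalue)
```

A query that returns exactly one significant hit, on a contig from the clone called by the IFC screen, corresponds to the verification outcome described above.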

Conclusions
We report the development and verification of a high-throughput method for screening BAC libraries for specific DNA sequence tag sites (STSs). In this method we utilize a Fluidigm Access Array platform combined with KBioscience KASPar genotyping chemistry to screen a super pooled BAC library. A single Fluidigm 96.96 IFC is capable of producing 9,216 PCR reactions in a single run (∼3 hours) with little technical expertise, and since each PCR reaction is done on a nanoliter scale (9.6 nL), the consumable reagent costs (i.e., Taq polymerase, primers, and IFC chip) are only ∼$0.05 per PCR reaction. The seven-plate pooling strategy of the BAC library allowed for the simultaneous screening of 10,752 BAC clones (four super pools, representing 28 384-well plates from the BAC library) with 96 DNA sequence tag sites (here we used SNP tag sites), with 46% of the STSs unambiguously identifying specific BAC clone addresses. A significantly greater proportion of unambiguously identified addresses could be obtained by simply increasing the number of seven-plate pools screened; in this proof-of-concept experiment we screened only 4 of the 14 seven-plate pooled libraries possible from our 10.6X (36,864 clone) BAC library. The approach presented here provides a simple and cost-effective method to rapidly and reliably screen a BAC library with multiple (up to 96 simultaneously) sequence tag sites.