Whole-Genome Profiling of a Novel Mutagenesis Technique Using Proofreading-Deficient DNA Polymerase δ

A novel mutagenesis technique using error-prone DNA polymerase δ (polδ), based on the disparity mutagenesis model of evolution, has been successfully employed to generate novel microorganism strains with desired traits. However, little is known about the spectra of mutagenic effects caused by disparity mutagenesis. We evaluated the performance of the polδ MKII mutator, which expresses a proofreading-deficient and low-fidelity polδ, in a haploid Saccharomyces cerevisiae strain and compared it with that of the commonly used chemical mutagen ethyl methanesulfonate (EMS). This mutator strain carries the mutant polδ on a plasmid, thereby leaving the genomic copy intact. We measured the mutation rate achieved by each mutagen and performed high-throughput next-generation sequencing to analyze the genome-wide mutation spectra produced by the 2 mutagenesis methods. The mutation frequency of the mutator was approximately 7 times higher than that of EMS. Our analysis confirmed the strong G/C to A/T transition bias of EMS, whereas the mutator mainly produced transversions, giving rise to more diverse amino acid substitution patterns. Our study demonstrates that the polδ MKII mutator is a useful and efficient method for rapid strain improvement based on in vivo mutagenesis.


Introduction
Random mutagenesis is a powerful tool for generating enzymes, proteins, metabolic pathways, or even entire genomes with desired or improved properties [1]. Because of their technical simplicity and applicability to almost any organism, chemical and radiation mutagenesis are frequently used to generate genetic variability in microorganisms. However, these methods tend to be inefficient because they can cause substantial cell damage when performed in vivo [2].
A novel mutagenesis technique using error-prone DNA polymerase δ (polδ), based on the disparity mutagenesis model of evolution [3], has been successfully employed to generate novel microorganism strains with desired traits [4][5][6][7][8][9][10][11]. In the disparity model, mutations occur preferentially on the lagging strand because DNA replication there is more complex and discontinuous. Computer simulations show that the disparity model accumulates more mutations than the parity model, in which mutations occur stochastically and evenly on both strands [3]. In addition, the disparity model produces greater diversity because some offspring inherit mutated DNA while others retain nonmutated, wild-type DNA.
Several studies have shown that the disparity mutagenesis method often achieves more satisfactory results (i.e., a higher mutation rate and faster attainment of the desired phenotype) than conventional methods such as the chemical mutagen ethyl methanesulfonate (EMS) [5,10], which is known to produce mainly G/C to A/T transitions [12]. However, little is known about the spectra of mutagenic effects caused by disparity mutagenesis.
With the recent advent of next-generation sequencing technologies, accurate characterization of a mutant genome relative to its parental reference strain is now achievable. For example, Flibotte et al. analyzed the mutation spectra induced by various mutagens, such as EMS, ENU, and UV/TMP, across the whole genome of Caenorhabditis elegans [12]. Another group used these sequencing technologies to analyze the genetic variation between a parental yeast strain and an EMS-mutagenized derivative [19].
In this study, we evaluated the performance of the polδMKII mutator, which expresses a proofreading-deficient and low-fidelity polδ in a haploid S. cerevisiae strain, in comparison with the commonly used chemical mutagen EMS. This mutator strain carries the mutant polδ on a plasmid, thereby leaving the genomic copy intact. We measured the mutation rate of this mutator strain and found that the mutation frequency of polδMKII was approximately 7 times higher than that of EMS. We also performed high-throughput next-generation sequencing with the Illumina GAII to analyze the genome-wide mutation spectra produced by the 2 mutagenesis methods and found that the mutator strain exhibited more pleiotropy and gave rise to more diverse amino acid substitution patterns. Our study demonstrates that the proofreading-deficient and low-fidelity polδMKII mutator is a useful and efficient method for rapid strain improvement based on in vivo mutagenesis. This mutator may also be useful for studying the acceleration of evolution.

Mutator Mutagenesis.
YCplac33/polδMKII vector (and the empty YCplac33 vector as a nonmutator control) was introduced into S. cerevisiae BY2961 cells using the LiCl method, and the transformants (mutator strains) were selected on synthetic complete (SC)-agar plates without uracil. Five mutator strains were picked and independently cultivated in 1 mL of SC medium at 30 °C for 24 h (about 30 generations) to introduce mutations into their chromosomes. To determine the mutation frequencies of the 5 mutator strains, aliquots were spread on SC-agar plates containing L-canavanine sulfate salt (0.06 mg/mL) (Sigma, St. Louis, MO, USA) to identify CAN1 mutants and incubated until resistant colonies formed. Mutation frequencies were calculated as the number of drug-resistant colonies divided by the number of colonies on SC-agar plates without drug. Forward mutation rates at CAN1 were determined by fluctuation analysis of these 5 independent cultures [21]. To fix the mutations, another aliquot of each mutator culture was spread on SC-agar plates containing 5-fluoroorotic acid monohydrate (Wako) to obtain demutatorized cells cured of the YCplac33/polδMKII vector. Genomic DNA was prepared from the demutatorized cells using the procedure described in the following section.
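The text does not state which fluctuation-analysis estimator reference [21] applies. As a stand-in, the minimal Python sketch below, assuming the Lea–Coulson method of the median, computes a mutation frequency from dilution-corrected cell densities and a forward mutation rate from the CAN1 mutant counts of independent cultures. The function names and example numbers are hypothetical illustrations, not the authors' procedure.

```python
import math
import statistics
from typing import Sequence


def mutation_frequency(resistant_per_ml: float, viable_per_ml: float) -> float:
    """Canavanine-resistant (CAN1 mutant) cells per viable cell.

    Both densities are assumed to be already corrected for dilution and
    plated volume.
    """
    if viable_per_ml <= 0:
        raise ValueError("viable cell density must be positive")
    return resistant_per_ml / viable_per_ml


def lea_coulson_m(median_mutants: float) -> float:
    """Expected number of mutations per culture (m), Lea-Coulson median method.

    Solves r/m - ln(m) = 1.24 for m by bisection, where r is the median
    number of mutants per culture.
    """
    r = median_mutants
    if r < 0:
        raise ValueError("mutant count cannot be negative")

    def f(m: float) -> float:
        return r / m - math.log(m) - 1.24  # strictly decreasing for m > 0

    lo, hi = 1e-12, 1.0  # f(lo) > 0 for any r >= 0
    while f(hi) > 0:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def forward_mutation_rate(mutants_per_culture: Sequence[int],
                          cells_per_culture: float) -> float:
    """Mutation rate per cell division: m divided by final cells per culture."""
    m = lea_coulson_m(statistics.median(mutants_per_culture))
    return m / cells_per_culture
```

For example, `forward_mutation_rate([8, 10, 12, 9, 15], 1e7)` (hypothetical counts from 5 cultures of 10⁷ cells each) takes the median of 10 mutants, solves for m of about 3.9, and returns roughly 3.9 × 10⁻⁷ per cell division. The median estimator is simple but reliable only over a limited range of m; maximum-likelihood estimators are generally preferred when many cultures are available.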

EMS Mutagenesis.
S. cerevisiae BY2961 strain cells were suspended in 0.1 M phosphate-buffered saline (PBS) (pH 7.0) containing 1.5, 2.0, 2.5, or 3.0% ethyl methanesulfonate (EMS) and incubated at 30 °C for 1 h to introduce chromosomal mutations. The cells were washed 3 times with 5% sodium thiosulfate, suspended in sterilized water, and spread on SC-agar plates containing L-canavanine sulfate salt (0.06 mg/mL) (Sigma) to identify CAN1 mutants. Mutation frequencies were calculated as described above. Another aliquot of the EMS-treated cell suspension was spread on a YPD-agar plate to isolate single clones. Genomic DNA was prepared from 5 single clones derived from cells treated with 1.5% EMS using the procedure described in the following section.

Library Preparation for Illumina Sequencing.
Genomic DNA from S. cerevisiae was extracted using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA, USA). Each sequenced sample was prepared according to the Illumina protocols. Briefly, 3 μg of genomic DNA was fragmented to an average length of 200 bp using the Covaris S2 system (Covaris, Woburn, MA, USA). The fragmented DNA was end-repaired, a single "A" nucleotide was added to the 3′ ends, Illumina Index PE adapters (Illumina, San Diego, CA, USA) were ligated to the fragments, and the sample was size-selected for a 300 bp product using E-Gel SizeSelect 2% (Invitrogen, Grand Island, NY, USA). The size-selected product was amplified by 18 cycles of PCR with the primers InPE1.0 and InPE2.0 and an Index primer containing a 6-nt barcode (Illumina). The final product was validated using the Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA).

International Journal of Evolutionary Biology 3
Sequencing and Data Analysis.
The 11 barcoded libraries (the parental strain BY2961, 5 colonies from the mutator strain, and 5 colonies from the EMS-treated strain) were used for cluster generation in several multiplexed flow cell lanes of the Illumina Genome Analyzer II system. Ninety-one cycles of multiplexed paired-end sequencing were performed, with PhiX 174 genomic DNA run as a control in a separate lane of the flow cell. After the sequencing reactions were complete, the Illumina analysis pipeline (CASAVA 1.6.0) was used for image analysis, base calling, and quality score calibration. Reads were sorted by barcode and exported in FASTQ format. The quality of each sequencing library was assessed from the quality score chart and the nucleotide distribution plot produced by the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/).
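The FASTX-Toolkit quality chart and nucleotide distribution plot summarize, for each sequencing cycle, the mean base quality and the base composition across reads. The minimal Python sketch below computes the same two per-cycle summaries; it assumes Phred+64 quality encoding (the scheme generally used by CASAVA 1.x-era Illumina output) and is an illustration, not the FASTX-Toolkit implementation. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def per_cycle_stats(lines: Iterable[str],
                    offset: int = 64) -> Tuple[List[float], List[Counter]]:
    """Mean Phred quality and base counts at each sequencing cycle.

    `offset` is the ASCII offset of the quality encoding: 64 for the
    Phred+64 scheme assumed here, 33 for Phred+33 data.
    """
    qual_sums: List[int] = []
    read_counts: List[int] = []
    base_counts: List[Counter] = []
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header)
        for cycle, (base, q_char) in enumerate(zip(seq, qual)):
            if cycle == len(qual_sums):  # first read reaching this cycle
                qual_sums.append(0)
                read_counts.append(0)
                base_counts.append(Counter())
            qual_sums[cycle] += ord(q_char) - offset
            read_counts[cycle] += 1
            base_counts[cycle][base] += 1
    means = [total / n for total, n in zip(qual_sums, read_counts)]
    return means, base_counts
```

With two toy reads, `per_cycle_stats("@r1\nACGT\n+\nhhhh\n@r2\nACGA\n+\nhhBB\n".splitlines())` returns per-cycle mean qualities of `[40.0, 40.0, 21.0, 21.0]`, because `h` encodes Q40 and `B` encodes Q2 under Phred+64. A drop in mean quality at late cycles, or a skewed base composition at early cycles, is what the plots are used to spot.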
Once the raw sequence data were curated, the reads of each sample were aligned to the S288c reference genome (http://www.yeastgenome.org/) using BWA (version 0.5.1) with default parameters [22]. To avoid false positives arising from repetitive regions, we removed reads that mapped to multiple locations from the alignment files. We then used SAMtools (version 0.1.9) [23] to generate lists of candidate mutations. To identify mutations produced by mutagenesis, we applied the following filtering criteria: (a) the coverage at the mismatch position is at least 10; (b) the variant is not present in the sequenced parental strain; (c) indels have a SNP quality of at least 50 and substitutions a SNP quality of at least 20 (SAMtools reports SNP quality as the Phred-scaled probability that the consensus is identical to the reference); (d) the mapping quality is at least 30 (mapping quality is the Phred-scaled probability that the read alignment is wrong); and (e) more than 90% of the reads show the variant allele.
A variant had to pass all of these filters to be considered a mutation. The alignments at all mutations were inspected in the Integrative Genomics Viewer (IGV) [24]. The lists of mutations were then annotated using COVA (comparison of variants and functional annotation; http://sourceforge.net/projects/cova), which was specifically designed to annotate large numbers of identified mutations using GenBank annotation files.
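The filtering criteria (a) to (e) can be expressed as a single predicate. The minimal Python sketch below applies them to an already-parsed variant record; the `Variant` fields are a hypothetical representation chosen for illustration and do not correspond to SAMtools output columns, and parsing the caller's output is left out. The thresholds are the ones stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    is_indel: bool
    depth: int              # read coverage at the position
    snp_quality: float      # Phred-scaled SNP quality from the caller
    mapping_quality: float  # Phred-scaled mapping quality
    alt_fraction: float     # fraction of reads showing the variant allele

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_filters(v: Variant, parental_keys: Set[VariantKey]) -> bool:
    """True if the variant satisfies criteria (a)-(e)."""
    if v.depth < 10:                      # (a) coverage of at least 10
        return False
    if v.key in parental_keys:            # (b) absent from the parental strain
        return False
    min_snp_q = 50 if v.is_indel else 20  # (c) indels 50, substitutions 20
    if v.snp_quality < min_snp_q:
        return False
    if v.mapping_quality < 30:            # (d) mapping quality of at least 30
        return False
    if v.alt_fraction <= 0.90:            # (e) variant reads must exceed 90%
        return False
    return True


def call_mutations(variants: Iterable[Variant],
                   parental_keys: Set[VariantKey]) -> List[Variant]:
    """Keep only candidates that pass every filter."""
    return [v for v in variants if passes_filters(v, parental_keys)]
```

Keeping each criterion as a separate early return makes it easy to count how many candidates each filter removes, which is useful when tuning thresholds for a new dataset.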

Determination of Mutation Frequencies.
In this study, we evaluated the performance of the polδMKII mutator in comparison with that of the commonly used chemical mutagen EMS. To assess EMS efficiency, S. cerevisiae BY2961 cells were treated with different concentrations of EMS. The lethality and the frequencies of canavanine-resistant mutants are shown in Table 1. At an EMS concentration of 1.5%, the mutation frequency was approximately 18-fold higher than that of the untreated control strain. Above 2.0% EMS, the survival rate decreased with no increase in mutation frequency. Based on this result, we used cells treated with 1.5% EMS for whole-genome sequencing.
To assess the effectiveness of the mutator, we transformed the haploid BY2961 strain with the yeast expression plasmid YCplac33/polδMKII, which expresses a polδ mutant allele carrying both the mutations that inactivate proofreading activity (D321A and E323A) and a mutation that decreases replication fidelity (L612M). The mutator strain harboring the YCplac33/polδMKII plasmid is referred to from here on as the "mutator." We determined the mutation frequency by resistance to canavanine. As summarized in Table 2, the mutation frequency of the mutator was approximately 132-fold higher than that of cells containing the empty vector. The forward mutation rate at the CAN1 (arginine permease) locus was calculated to be 7.9 × 10⁻⁶ per cell division. These results show that the plasmid-encoded mutant polδ protein effectively competes with the endogenous wild-type polδ protein produced from the chromosome, and that semidominant negative expression of the mutant polδ was effective in introducing mutations. These results also demonstrate that the mutation frequency of the mutator was approximately 7 times higher than that of EMS.

Whole-Genome Sequencing.
To analyze the genome-wide mutation spectra of the 2 mutagenesis methods, we used massively parallel sequencing with Illumina Solexa technology (GAII instrument). We sequenced the parental haploid strain BY2961, each of the 5 clones from the mutator strains, and each of the 5 clones from the EMS-treated strains, all grown under nonselective conditions. Sequencing reads were aligned to the S288c reference genome using BWA [22]. To avoid false positives due to mutations in repetitive regions, reads mapped to multiple locations were discarded, and only uniquely mapped reads were used for subsequent analysis. The average genomic coverage ranged from 32× to 87× across samples (Table 3). On average, 94.18% of the S288c reference genome was covered by at least 1 uniquely mapped read at each base. We then analyzed the data for 2 kinds of mutational events: single nucleotide variants (SNVs) and small insertions and deletions (indels). Illumina sequencing found 6,766 genetic differences between our parental strain BY2961 and the S288c reference. Mutations induced by the mutagens were identified by subtracting these parental differences. Sequence-processing details can be found in Section 2.
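The coverage statistics and the parental subtraction described above reduce to simple operations on a per-base depth array and on variant sets. The minimal Python sketch below shows both, assuming depths are available for every reference position, including zero-depth positions (modern SAMtools can emit such a listing with its `depth` subcommand). The function names and the tuple representation of variants are hypothetical; the depth threshold of 10 mirrors the read-depth criterion used for mutation calling.

```python
from typing import Sequence, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def coverage_summary(depths: Sequence[int],
                     min_callable_depth: int = 10) -> Tuple[float, float, int]:
    """Return (mean depth, breadth, callable sites).

    breadth is the fraction of positions covered by at least 1 read;
    callable sites are positions with depth >= min_callable_depth.
    Zero-depth positions must be included for breadth to be correct.
    """
    if not depths:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / len(depths)
    breadth = sum(1 for d in depths if d >= 1) / len(depths)
    callable_sites = sum(1 for d in depths if d >= min_callable_depth)
    return mean_depth, breadth, callable_sites


def induced_variants(sample: Set[Variant], parental: Set[Variant]) -> Set[Variant]:
    """Differences from the reference that are absent from the parental strain."""
    return sample - parental
```

For example, depths of `[0, 5, 10, 20]` give a mean depth of 8.75, a breadth of 0.75, and 2 callable sites. Using exact set difference assumes that parental and mutant variants are written identically (same coordinates and allele representation), which matters most for indels.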

The Mutation Spectra of Mutator and EMS.
We compared the average number of mutations between the mutator strains and the EMS-mutagenized strains (Figure 1). The mutator produced fewer SNVs than EMS (7.2 versus 55.8 per strain, respectively; P < 0.05). Both the mutator and EMS produced few deletions (1.6 versus 2.8 per strain, respectively) and few insertions (0.2 versus 0.6 per strain, respectively). An average of 1.14 × 10⁷ nucleotide sites fulfilled our read-depth criterion (≥10), giving average base-substitution mutation rate estimates of 4.87 (SE = 1.34) × 10⁻⁶ per site for EMS and 2.09 (SE = 0.55) × 10⁻⁸ per site per cell division for the mutator (about 30 generations). The rate we calculated for the mutator is approximately 60-fold higher than the previously reported spontaneous mutation rate of 3.3 (0.8) × 10⁻¹⁰ per site per generation, which was based on 454 sequencing of 4 mutation-accumulation (MA) lines [26]. The 2 mutagens produced mutations that were distributed similarly across the various gene features, although a larger proportion of the mutator's SNVs fell within exons (Figure 2).
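The per-site rates follow from dividing the mean number of SNVs per strain by the number of callable sites and, for the mutator, by the roughly 30 generations of growth. The back-of-envelope Python check below uses only the mean values quoted in the text; the small differences from the reported estimates presumably reflect averaging per-strain rates rather than dividing the mean counts, which is an assumption on our part.

```python
def per_site_rate(mutations_per_strain: float, callable_sites: float,
                  generations: float = 1.0) -> float:
    """Base substitutions per callable site, optionally per generation."""
    return mutations_per_strain / callable_sites / generations


CALLABLE_SITES = 1.14e7  # sites with read depth >= 10 (value from the text)

# EMS: a single treatment, so the rate is simply per site.
ems_rate = per_site_rate(55.8, CALLABLE_SITES)

# Mutator: mutations accumulated over about 30 generations of growth.
mutator_rate = per_site_rate(7.2, CALLABLE_SITES, generations=30)

# Compare with the spontaneous rate of 3.3e-10 per site per generation [26].
fold_over_spontaneous = mutator_rate / 3.3e-10

print(f"EMS:     {ems_rate:.2e} per site")
print(f"Mutator: {mutator_rate:.2e} per site per generation")
print(f"Mutator / spontaneous: {fold_over_spontaneous:.0f}-fold")
```

This gives about 4.9 × 10⁻⁶ per site for EMS, about 2.1 × 10⁻⁸ per site per generation for the mutator, and a ratio of about 64 relative to the spontaneous rate.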
The mutation spectra are shown in Figure 3(a). Genome-wide, the mutator primarily induced transversions (72%), whereas EMS primarily induced transitions (97%), in good agreement with the known mutagenic specificity of EMS [12]. Likewise, the mutator primarily induced transversions (69%) among nonsynonymous substitutions in exons (Figure 3(b)), similar to what was observed for pol3-01 with the URA3 reporter gene [16]. Consistent with its genome-wide spectrum, EMS induced transitions in 98% of nonsynonymous substitutions.
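Classifying a substitution as a transition or a transversion depends only on whether the reference and alternative bases belong to the same chemical class (purine A/G or pyrimidine C/T). The minimal Python sketch below does this and also collapses each substitution onto the pyrimidine strand, so that EMS-type G/C to A/T changes appear as a single `C>T` class. The function names are hypothetical illustrations.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_type(ref: str, alt: str) -> str:
    """'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def collapsed_class(ref: str, alt: str) -> str:
    """Strand-collapsed class written from the pyrimidine base, e.g. G>A -> C>T."""
    ref, alt = ref.upper(), alt.upper()
    if ref in PURINES:  # report the complementary strand instead
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def transversion_fraction(substitutions: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) pairs that are transversions."""
    types = [substitution_type(r, a) for r, a in substitutions]
    if not types:
        raise ValueError("no substitutions supplied")
    return types.count("transversion") / len(types)
```

For example, `transversion_fraction([("G", "A"), ("C", "T"), ("G", "T"), ("A", "C")])` is 0.5, and `collapsed_class("G", "A")` is `"C>T"`. Collapsing to 6 pyrimidine-based classes is a common convention because a change and its complement on the other strand describe the same base-pair event.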

Amino Acid Substitution Patterns.
The mutation spectrum of a given mutagenesis method influences the repertoire of amino acid changes at the protein level, so we evaluated the amino acid substitution patterns generated by our 2 protocols (Table 4). First, we classified mutations into those that preserved the encoded amino acid, changed the amino acid, or generated a stop codon. A clear difference was seen between the mutator and EMS. Of the total mutations, the mutator changed the amino acid in approximately 85%, whereas EMS changed the amino acid in approximately 61%. The mutator also generated more stop codons than EMS (7% versus 2%, respectively). Whereas the mutator generated more changes at the first or second codon position, EMS generated changes at all 3 positions in approximately equal proportions. Amino acid changes were classified as conservative or nonconservative substitutions, where a conservative substitution changed the encoded amino acid to a similar amino acid according to the BLOSUM62 matrix [25]. Among the amino acid changes, the mutator produced more nonconservative substitutions than EMS (83% versus 53%). For comparing random mutagenesis methods, Wong et al. [27] proposed a useful structure indicator that takes into account Gly and Pro substitutions as well as stop codons. In our study, the mutator produced an equivalent number of Gly/Pro and stop codon substitutions, whereas EMS generated only stop codon substitutions.
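The first level of this classification (amino acid preserved, changed, or turned into a stop codon), together with the codon position and whether the new residue is Gly or Pro, can be derived by translating the reference and mutant codons. The minimal Python sketch below does this with the standard genetic code, which applies to S. cerevisiae nuclear genes; it is an illustration, not the COVA annotation logic, and the function name and category labels are hypothetical.

```python
from typing import Dict, Tuple

_BASES = "TCAG"
_BASE_SET = frozenset(_BASES)
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' = stop.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE: Dict[str, str] = dict(zip(_CODONS, _AMINO_ACIDS))
assert len(CODON_TABLE) == 64


def classify_coding_snv(codon: str, position: int,
                        alt_base: str) -> Tuple[str, str, str, bool]:
    """Classify a substitution at codon position 1, 2, or 3.

    codon and alt_base must be on the coding strand (reverse-complement
    first for genes on the minus strand). Returns (category, ref_aa,
    alt_aa, to_gly_pro); category is 'synonymous', 'missense',
    'nonsense', or 'stop-loss'.
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if codon not in CODON_TABLE or position not in (1, 2, 3) or alt_base not in _BASE_SET:
        raise ValueError("invalid codon, position, or base")
    mutant = codon[: position - 1] + alt_base + codon[position:]
    if mutant == codon:
        raise ValueError("alternative base equals the reference base")
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        category = "synonymous"
    elif alt_aa == "*":
        category = "nonsense"
    elif ref_aa == "*":
        category = "stop-loss"
    else:
        category = "missense"
    # Flag missense changes to Gly or Pro, the residues counted by the
    # structure indicator together with stop codons.
    to_gly_pro = category == "missense" and alt_aa in "GP"
    return category, ref_aa, alt_aa, to_gly_pro
```

For example, a G to T change at the first position of `GGA` (Gly) gives `TGA` and is classified as nonsense, while a change at the third position of `CTG` (Leu) to `CTA` is synonymous. The conservative/nonconservative split would additionally need BLOSUM62 scores, for instance loaded with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`; treating a positive score as "conservative" is a common convention, but the exact cutoff used in the study is not stated.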

Discussion
In this study, we evaluated the performance of a novel mutagenesis technique using error-prone proofreading-deficient and low-fidelity DNA polymerase δ by determining the mutation rate of the strain harboring the enzyme. We also analyzed the spectra of mutations across the entire S. cerevisiae genome and then assessed the diversity of mutation types at the amino acid level.
Proofreading-deficient polδ mutants, such as pol3-01, and several low-fidelity polδ mutants, such as L612M, have been shown to confer a mutator phenotype and elevate the mutation rate [14][15][16][17][18]. We generated a BY2961 strain expressing the polδMKII mutator, a polδ mutant allele that combines mutations inactivating proofreading activity (D321A and E323A) with a mutation that decreases replication fidelity (L612M). This mutant allele acts as a strong mutator, as evidenced by the high frequency of spontaneous mutations (131-fold over the control, compared with 18-fold for EMS-treated cells). Venkatesan et al. reported forward CAN1 mutation rates of 1.5 × 10⁻⁶ for L612M and 5.6 × 10⁻⁶ for pol3-01 [18]. Those mutant strains were constructed by integrating the pol3-01 or pol3-L612M allele into the chromosomal POL3 gene by targeted integration, thereby disrupting the endogenous POL3 gene. In contrast, our mutator plasmid expressing the polδ mutant allele produced a mutation rate of 7.9 × 10⁻⁶, which is higher than the rates reported for the chromosomally integrated alleles. Using the polδMKII mutator plasmid allows continued expression of the endogenous wild-type POL3 and permits efficient restoration of the wild-type mutation rate by curing the yeast strains of the mutator plasmid. Once the desired trait(s) have been selected, curing the cells of the mutator plasmid can stabilize the newly obtained phenotype.
In general, random mutagenesis methods developed to date are biased toward transition mutations, although efforts have been made to overcome this [28]. Whereas EMS showed a transition bias, the mutator showed a transversion bias (Figure 3(a)). As a result, the mutator yielded a broader spectrum of nucleotide changes across the entire genome. The mutator was also biased toward transversions among nonsynonymous substitutions (Figure 3(b)). For EMS, the spectrum of mutation events we observed is similar to that reported by others [12].
At the protein level, the amino acid substitution pattern differed between the mutator and EMS (Table 4). Mutations generated by the mutator resulted in amino acid substitutions more often than did mutations generated by EMS (85% versus 61%, respectively). Most substitutions made by the mutator were nonconservative, whereas only about half of those made by EMS were nonconservative. In addition, the mutator generated more potentially structure-disturbing amino acid changes (to Gly/Pro). Thus, the transversion bias of the mutator's nonsynonymous substitutions generates more diverse amino acid substitution patterns than does the transition bias of EMS.
Although EMS produced a far higher per-site base-substitution rate than the mutator (4.87 × 10⁻⁶ per treatment versus 2.09 × 10⁻⁸ per cell division) and approximately 8 times more SNVs per strain (55.8 versus 7.2), the mutation frequency of the mutator at CAN1 was approximately 7 times higher than that of EMS. This gap between a higher apparent mutation frequency and fewer mutations may be explained by the higher proportion of amino acid changes, and the greater diversity of amino acid substitutions, produced by the mutator. This offers one plausible explanation for the effectiveness of disparity mutagenesis.
The disparity mutagenesis technique has been successfully applied not only to eukaryotic microorganisms such as S. cerevisiae [5,7–9], S. pombe [9], and Ashbya gossypii [10], but also to prokaryotic microorganisms such as Escherichia coli [4] and Bradyrhizobium japonicum [6]. We believe that this novel mutagenesis technique has the potential to be applied to a wide variety of microorganisms.
Our study has demonstrated that the proofreading-deficient and low-fidelity polδMKII mutator is a useful and efficient method for rapid strain improvement based on in vivo mutagenesis. It has been suggested that organisms in nature may accelerate evolution by decreasing the fidelity of the proofreading activity of polδ [29]; therefore, this mutator may also be useful for studying the acceleration of evolution.

Data Access
The raw reads used in this study are available on the DDBJ Sequence Read Archive (DRA) under accession DRA000522.