Discovery of Single Nucleotide Polymorphisms and Mutations by Pyrosequencing

Comparative genomics, analyzing variation among individual genomes, is an area of intense investigation. DNA sequencing is usually employed to look for polymorphisms and mutations. Pyrosequencing, a real-time DNA sequencing method, is emerging as a popular platform for comparative genomics. Here we review the use of this technology for mutation scanning, polymorphism discovery and chemical haplotyping. We describe the methodology and accuracy of this technique and discuss how to reduce the cost for large-scale analysis.

Determining mutations and polymorphisms in a genome is one of the most important tasks in the study of biological systems today. Three DNA sequencing platforms are now used to scan for mutations and polymorphisms: Sanger DNA sequencing [22], hybridization-based sequencing [4,6,10,23,24] and Pyrosequencing [17,20].
Pyrosequencing is based on the detection of pyrophosphate (PPi) released during DNA synthesis. In a cascade of enzymatic reactions, visible light is generated in proportion to the number of incorporated nucleotides (Figure 1). The cascade starts with a nucleic acid polymerization reaction in which inorganic PPi is released as the polymerase incorporates a nucleotide. The released PPi is then converted to ATP by ATP sulfurylase, and the ATP provides the energy for luciferase to oxidize luciferin and generate light. Because the added nucleotide is known, the sequence of the template can be determined (Figure 1).
Pyrosequencing offers potential advantages in accuracy, flexibility and parallel processing, and it can be easily automated. Furthermore, it dispenses with the need for labelled primers, labelled nucleotides and gel electrophoresis [21]. The performance of Pyrosequencing has been demonstrated in the sequencing of difficult secondary DNA structures [19], mutation detection [2], cDNA analysis [12,14,18], re-sequencing of disease-associated genes [3,8], bacterial typing [11], viral typing [9] and single-nucleotide polymorphism analysis [1,5,7]. Most recently, we reported on multiplexing of Pyrosequencing [15] and showed the usefulness of single-stranded DNA-binding protein in the Pyrosequencing reaction system for long-read sequencing and for sequence determination of difficult DNA templates [16].
The current Pyrosequencing strategy using a commercial machine allows more than 50 nucleotides to be sequenced de novo routinely. Pyrosequencing may be the method of choice for difficult secondary DNA structures that cannot be sequenced by conventional methods. In this review we discuss the use of Pyrosequencing for mutation scanning, SNP scanning and haplotyping, and describe the cost reduction efforts for large-scale studies.
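The sequencing-by-synthesis principle described above can be illustrated with a minimal sketch. It assumes an idealized reaction in which every dispensed nucleotide is incorporated completely and the light signal is exactly proportional to the number of incorporated nucleotides; the function names and the example sequence are hypothetical and are not taken from any Pyrosequencing software. For simplicity, the input is the strand being synthesized rather than its complementary template.

```python
def simulate_pyrogram(synthesized_strand, dispensation_order):
    """Return a list of (nucleotide, peak_height) pairs, one per dispensation.

    Idealized model (an assumption): each dispensed nucleotide is incorporated
    as many times as it matches the next bases of the growing strand, and the
    peak height equals that number.
    """
    position = 0
    peaks = []
    for nucleotide in dispensation_order:
        incorporated = 0
        while (position < len(synthesized_strand)
               and synthesized_strand[position] == nucleotide):
            incorporated += 1
            position += 1
        peaks.append((nucleotide, incorporated))
    return peaks


def read_sequence(peaks):
    """Reconstruct the sequence from the known dispensation order and peak heights."""
    return "".join(nucleotide * height for nucleotide, height in peaks)


# Hypothetical six-base example with a cyclic A, C, G, T dispensation order.
peaks = simulate_pyrogram("GGATTC", "ACGT" * 3)
print([height for _, height in peaks])
print(read_sequence(peaks))
```

Because the dispensed nucleotide is known at every step, the list of peak heights alone is sufficient to recover the sequence; a peak of height two corresponds to two identical adjacent bases.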
DNA mutations can be classified as known or unknown. If the mutation is known, the region containing it can be analyzed at or near the mutation site. If the mutation is unknown, the region must be re-sequenced to determine the nature of the mutation. When DNA from biopsy material is re-sequenced, a quantitative method for determining the ratio between wild-type and mutated template is desirable. When heterozygous samples are analyzed, conventional DNA sequencing does not reveal the exact ratio of mutated to wild-type DNA and quite often cannot even detect the mutation when the ratio is below 0.5. Pyrosequencing, however, has been shown to produce quantitative data and has been used to detect alleles with a frequency as low as 5% (www.pyrosequencing.com). Although this accuracy is obtained only with known polymorphisms, a ratio of 0.3 can be detected with relatively high accuracy while scanning for mutations (Figure 2). We recently reported on the use of this technology for mutation scanning of the p53 gene in DNA extracted from biopsies and could detect new mutations in blind tests [8]. Pyrosequencing has also been used in mutation scanning of mitochondrial DNA (Figure 3). The sensitivity of mutation detection may be improved by specially designed software that compares the obtained sequencing data with reference data. When scanning for mutations in disease-associated DNA samples, a programmed dispensing order can be used; this allows longer reads to be obtained, which facilitates mutation detection. In addition, Pyrosequencing analysis has the potential to determine the allelic distribution of mutations in samples that carry more than one mutation. This information could contribute to a better understanding of the effects of gene alterations in different diseases and lead to improved clinical interpretation.
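The quantitative reading of a known mutation site can be expressed as a simple ratio of peak heights. The sketch below assumes that the wild-type and mutant alleles each give a distinct peak at the polymorphic position and that both peaks scale linearly with the amount of template; the function name and the signal values are hypothetical and serve only to illustrate the arithmetic.

```python
def mutant_fraction(wild_type_peak, mutant_peak):
    """Estimate the fraction of mutated template from two allele-specific peaks.

    Assumes (for illustration) a linear light response, so that each peak
    height is proportional to the amount of the corresponding allele.
    """
    total = wild_type_peak + mutant_peak
    if total <= 0:
        raise ValueError("no signal at the polymorphic position")
    return mutant_peak / total


# Hypothetical light signals (arbitrary units) at a known mutation site.
print(mutant_fraction(wild_type_peak=70.0, mutant_peak=30.0))
```

In this hypothetical case the mutated template makes up 0.3 of the total, the ratio discussed above for mutation scanning.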

Single nucleotide polymorphism discovery
The Pyrosequencing strategy using commercial machines allows 60 nucleotides to be sequenced routinely, which makes it possible to scan for polymorphisms across a DNA template. Figure 4 demonstrates polymorphism scanning on a 500-nucleotide-long DNA fragment. An average read length of more than 60 nucleotides was obtained, and the single nucleotide polymorphisms were successfully detected. Both homozygous (Figure 4a) and heterozygous (Figure 4b) templates were sequenced. The sequences were compared manually; however, higher accuracy in SNP discovery can be obtained when the pyrograms are compared by specialized software.
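As an illustration of how pyrograms might be compared for SNP discovery, the following sketch models a heterozygous sample as an equal mixture of two alleles and reports the dispensations at which its peaks differ from those of a reference. It assumes ideal, noise-free signals; the function names, the tolerance value and the five-base example alleles are hypothetical.

```python
def ideal_peaks(synthesized_strand, dispensation_order):
    """Peak height per dispensation for one allele (idealized, noise-free)."""
    position = 0
    heights = []
    for nucleotide in dispensation_order:
        count = 0
        while (position < len(synthesized_strand)
               and synthesized_strand[position] == nucleotide):
            count += 1
            position += 1
        heights.append(float(count))
    return heights


def heterozygous_peaks(allele_a, allele_b, dispensation_order):
    """Model a heterozygous sample as a 1:1 mixture of two alleles."""
    peaks_a = ideal_peaks(allele_a, dispensation_order)
    peaks_b = ideal_peaks(allele_b, dispensation_order)
    return [(a + b) / 2 for a, b in zip(peaks_a, peaks_b)]


def differing_dispensations(sample, reference, tolerance=0.25):
    """Indices of dispensations where sample and reference peaks disagree."""
    return [index for index, (s, r) in enumerate(zip(sample, reference))
            if abs(s - r) > tolerance]


order = "ACGT" * 3
reference = ideal_peaks("TGACT", order)                # hypothetical reference allele
sample = heterozygous_peaks("TGACT", "TGGCT", order)   # A-to-G substitution in one allele
print(differing_dispensations(sample, reference))
```

With these hypothetical alleles, the mixed pyrogram differs from the reference at two dispensations: the G peak rises to 1.5 and the following A peak falls to 0.5, after which both alleles extend in step again.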

Haplotyping
Pyrosequencing is based on sequencing-by-synthesis. Therefore, different phases at polymorphic regions

Challenges in reading polymorphic regions by Pyrosequencing
It is possible to detect polymorphisms when using Pyrosequencing as a platform for SNP discovery. If the polymorphism is a substitution, it will be possible to obtain a synchronized extension after the substituted nucleotide. If the polymorphism is a deletion or insertion of the same kind as the adjacent nucleotide on the DNA template, the sequence after the polymorphism will be synchronized. However, if the polymorphism is a deletion or insertion of another type, the sequencing reaction can become out of phase, making the interpretation of the subsequent sequence difficult. If the polymorphism is known, it is always possible to use programmed nucleotide delivery to keep the extension of different alleles synchronized after the polymorphic region. It is also possible to use a bidirectional approach [19], in which the complementary strand is sequenced in order to decipher the sequence flanking the polymorphism. Another inherent problem in Pyrosequencing is the difficulty of determining the number of incorporated nucleotides in homopolymeric regions, owing to the non-linear light response following incorporation of more than 5-6 identical nucleotides. The polymerization efficiency of eight sequential G nucleotides and ten sequential G nucleotides is demonstrated in Figures 4 and 3, respectively. However, to elucidate the correct number of incorporated nucleotides it may be necessary to use specific software algorithms that integrate the signals. For re-sequencing, it is possible to add the nucleotide twice for a homopolymeric region to ensure complete polymerization, as demonstrated in Figures 4 and 3.

Software for pyrogram analysis
SQA Software was recently developed to analyze tag sequences obtained by Pyrosequencing (www.pyrosequencing.com). The software operates under the Windows 2000 system and provides a base-calling algorithm that automatically scores the nucleotide sequence and calculates a quality value, which is displayed as a color code for each nucleotide scored.
The assignment of quality values is based on a number of parameters, including the difference in match between the best and next-best choice of nucleotide peak, agreement between expected and obtained sequence around each peak, signal-to-noise ratios, variance in peak heights in the sequence, and peak width. The software also calculates out-of-phase signals to produce a synchronized processed sequence. In addition, the tag software allows multiple additions of the same nucleotide to ensure complete polymerization in homopolymeric regions. Currently, the software does not provide comparison of pyrograms, which would be useful for polymorphism discovery and mutation scanning.
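The idea of scoring each call by the difference between the best and next-best choice can be sketched as follows. This is not the algorithm of the SQA Software, whose details are not given here; it is a minimal illustration that assumes a linear light response (which, as noted above, does not hold for longer homopolymers) and uses only one of the listed parameters. The function names, the unit signal and the example peak heights are hypothetical.

```python
def call_homopolymer(peak_height, unit_signal=1.0):
    """Call the number of incorporated nucleotides and a simple quality margin.

    The margin is 1.0 when the scaled peak sits exactly on an integer and 0.0
    when it lies halfway between two integers (best and next-best call tie).
    """
    scaled = peak_height / unit_signal
    count = max(0, int(scaled + 0.5))              # nearest non-negative integer
    margin = max(0.0, 1.0 - 2.0 * abs(scaled - count))
    return count, margin


def basecall(dispensation_order, peak_heights, unit_signal=1.0):
    """Return the called sequence and one quality margin per called base."""
    sequence = []
    qualities = []
    for nucleotide, height in zip(dispensation_order, peak_heights):
        count, margin = call_homopolymer(height, unit_signal)
        sequence.append(nucleotide * count)
        qualities.extend([margin] * count)
    return "".join(sequence), qualities


# Hypothetical noisy peak heights for the dispensations A, C, G, T.
sequence, qualities = basecall("ACGT", [0.05, 1.1, 2.4, 0.9])
print(sequence, [round(q, 2) for q in qualities])
```

In this example the G peak of 2.4 is called as two nucleotides but receives a low margin, because a call of three would be almost as plausible; this is the kind of position that a color-coded quality value would flag for inspection.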

Cost reduction efforts in Pyrosequencing technology
The cost of sample analysis can be reduced either by improving the technology or by decreasing the use of chemicals. We have recently developed improvements to Pyrosequencing technology that reduce the cost per analysis. Most notable are the development of multiplex Pyrosequencing [15], the use of a three-primer system for amplification [7], the development of enzymatic template preparation strategies [13,14] and the use of Sepharose beads for immobilization of PCR products for Pyrosequencing (www.pyrosequencing.com). Another approach to cost reduction is to decrease the reaction volume and thereby use smaller amounts of chemicals. Development of a 384-well-based Pyrosequencing machine (PTP 384) has lowered the cost at least fourfold. Miniaturization is expected to reduce the cost of Pyrosequencing chemicals by one to three orders of magnitude. We are currently working on microfluidics and array formats for low-volume Pyrosequencing analysis.