Novel, Non-Radioactive, Simple and Multiplex PCR-cRFLP Methods for Genotyping Human SP-A and SP-D Marker Alleles

We have previously identified an allele of the human SP-A2 gene that occurs with greater frequency in an RDS population [12]. Because of the importance of SP-A in normal lung function and its newly emerging role in innate host defense and regulation of inflammatory processes, we wish to better characterize genotypes of both the SP-A1 and SP-A2 genes. It has been determined that SP-D shares similar roles in the immune response. Therefore, in this report we 1) describe a novel, non-radioactive, PCR-based cRFLP method for genotyping both SP-A and SP-D; 2) describe two previously unpublished biallelic polymorphisms within the SP-D gene; 3) present the partial sequence of one new SP-A1 allele (6A14) and describe other new SP-A1 and SP-A2 alleles; and 4) describe additional methodologies for SP-A genotype assessment. The ability to more accurately and efficiently genotype samples from individuals with various pulmonary diseases will facilitate population and family based association studies. Genetic polymorphisms may be identified that partially explain individual disease susceptibility and/or treatment effectiveness.


INTRODUCTION
Pulmonary surfactant is essential for normal lung function. It is a lipoprotein complex and consists of phospholipids and surfactant proteins (SP)-A, B, C, and D. Deficiency of surfactant or impaired surfactant activity could lead to a variety of pulmonary diseases [6,8]. Although compromised or altered surfactant states may not be problematic in an otherwise healthy individual, in the face of a variety of stresses they may manifest as disease. The surfactant proteins have been shown in the last 10-15 years to play important roles in surfactant physiology and biology [5,6,8]. In addition, SP-A and SP-D are involved in the innate local host defense and the regulation of inflammatory processes in the lung [23].
Alterations in the levels of the surfactant proteins have been associated with respiratory distress syndrome (RDS) in the prematurely born infant [2,21], with acute respiratory distress syndrome (ARDS) [9], and with other pulmonary diseases [6]. Genetic alterations in surfactant protein genes have been associated with disease as well. Mutations in the SP-B gene that lead to SP-B deficiency have been described in congenital alveolar proteinosis [7,20,22] and certain SP-A and SP-B alleles have been observed with higher frequency in infants with RDS [12]. Genetic alterations in the genes responsible for the lipid component of surfactant have not been described, but these are likely to affect other organs, and thus may be lethal and not observed.
Therefore, the available information indicates that the surfactant protein genes may be good candidate genes in the study of genetic contribution to the pathogenesis of pulmonary disease.
The SP-A and SP-B loci have been shown to be polymorphic, and methods to determine SP-A [4] and SP-B [19,20] genotypes have been published. The human SP-A locus consists of two functional genes (SP-A1 and SP-A2) in opposite transcriptional orientation and one pseudogene [10]. A number of alleles have been characterized for each human SP-A gene [5]. Because both SP-A genes and their respective alleles share a high degree of sequence similarity (> 95%), SP-A genotyping has been challenging.
We have previously described a method to distinguish the two human SP-A genes, SP-A1 and SP-A2, and their corresponding alleles [4]. In this method we took advantage of the biallelic nucleotide variants at specific positions within the coding region as well as the ordered sequence of these variants within the various SP-A sequences in order to make gene specific and allelic designations possible. Biallelic nucleotide variants that result in a change of the encoded amino acid and distinguish the SP-A1 gene and its corresponding alleles from the SP-A2 gene and its corresponding alleles occur at residues 66, 73, 81 and 85. Biallelic SP-A1 nucleotide variants that change the encoded amino acid and distinguish SP-A1 alleles from one another occur at amino acids 19, 50 and 219, and for SP-A2 at amino acids 9, 91 and 223. In addition nucleotide differences that do not change the encoded amino acid of the SP-A alleles have also been characterized, and used for the designation of the individual alleles. Therefore, the pattern of various combinations of polymorphisms (at specified amino acids) serves as a key to distinguish one allele from another.
The genotyping method we described previously [4] is based on gene specific PCR amplification and allele specific hybridization. Using this hybridization method and direct sequencing we have reported 5 alleles (6A, 6A 2, 6A 3, 6A 4 and 6A 5) for the SP-A1 gene and 6 alleles (1A, 1A 0, 1A 1, 1A 2, 1A 3, 1A 4) for the SP-A2 gene. Families and twins were genotyped and showed consistent patterns of allelic inheritance. This genotyping method made it possible to identify an association between certain genotypes and low levels of mRNA [14]. Moreover, using the genotype data from > 200 unrelated individuals and the EH linkage program [25] we were able to identify SP-A haplotypes [4]. Although all of these findings confirmed that this method is reliable, genotyping human SP-A variants by hybridization remained cumbersome and problematic.
In the present report, we describe new methods for SP-A and SP-D genotyping. These methods are based on a PCR-cRFLP analysis described previously in detail for SP-B [20]. PCR primers are designed to contain nucleotide(s) mismatched to the sequence flanking a single base polymorphism in order to create a restriction enzyme recognition site that includes the SNP. These are then used to amplify gene specific products that contain sequences from both alleles. The converted PCR products are digested with the appropriate enzyme, separated by PAGE and visualized with ethidium bromide staining, in order to assess whether the given sample is homozygous for either allele or heterozygous. The advantages of the new method include the absence of radioactivity, clearer results, time efficiency (no film exposures or filter stripping), and better reproducibility. The ability to identify and characterize surfactant protein genotypes in a timely, efficient, and reproducible manner will facilitate population and family based association studies. Populations with different pulmonary diseases can be genotyped in the hope of identifying genetic polymorphisms that may, in part, explain the genetic basis of individual variability in disease susceptibility and/or response to treatment, and therefore identify disease subgroups.
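The converted-primer idea can be illustrated with a toy sketch. Everything in it is hypothetical: the flanking sequences, the primer, and the choice of the EcoRI recognition sequence (GAATTC) are placeholders picked for readability, not the primers or enzymes of Table 2. Only the sense strand is modeled, and the antisense primer is ignored.

```python
# Toy illustration of a "converted" primer. All sequences are hypothetical.
SITE = "GAATTC"  # EcoRI recognition sequence, chosen only as a familiar example

UPSTREAM = "CCTGAGGACTT"    # hypothetical genomic sequence 5' of the SNP
DOWNSTREAM = "GGTCAAGCTGA"  # hypothetical genomic sequence 3' of the SNP
PRIMER = "CCTGAGGAATT"      # anneals to UPSTREAM with one mismatch (C -> A)


def count_mismatches(primer: str, target: str) -> int:
    """Number of positions at which the primer differs from its target."""
    return sum(p != t for p, t in zip(primer, target))


def amplify(template: str, primer: str) -> str:
    """Sense-strand product: the primer replaces the bases it anneals to."""
    return primer + template[len(primer):]


def is_cut(product: str) -> bool:
    """True if the product contains the recognition site."""
    return SITE in product


# Two alleles that differ only at the SNP (C or T).
templates = {snp: UPSTREAM + snp + DOWNSTREAM for snp in "CT"}
products = {snp: amplify(seq, PRIMER) for snp, seq in templates.items()}

for snp, product in products.items():
    print(snp, product, "cut" if is_cut(product) else "uncut")
```

In this sketch neither genomic allele carries the site; the single primer mismatch creates it, and only when the SNP base completes the recognition sequence.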

PCR Amplification
SP-A1 and SP-A2 gene specific 3.3 kb fragments are amplified from genomic DNA prepared from blood. All PCR is done in an Ericomp Twin Block System thermocycler. Gene specific sense primers 326 (SP-A1) or 327 (SP-A2) are used with common antisense oligonucleotide 68A in a total volume of 30 µL (see below for primer sequences). The reaction mixture contains 1.5 µL 10x Buffer 1 and 1.5 µL 10x Buffer 2 (Roche), 2.4 µL dNTPs (1.25 mM each dNTP), 0.3 µL of each oligo (#327: 100 ng/λ; #326 and #68A: 50 ng/λ), and 0.52 units of Expand Long enzyme (Roche). The cycling conditions are an initial denaturation at 95 °C for 2 min, followed by 33 cycles of 95 °C for 30 sec, 58 °C for 30 sec and 72 °C for 3 min, followed by a final extension at 72 °C for 5 min. Genomic clones containing an SP-A1 allele and an SP-A2 allele are routinely included as controls of gene specificity in each set of reactions. PCR products from each gene specific reaction are run on a 1% agarose gel and stained with ethidium bromide to assess the intensity of each product, and then diluted either 1:9 or 1:99 in H2O (based on the ethidium bromide staining intensity of the DNA). These dilutions are used as template for further analysis in the PCR-based cRFLP genotyping method as described below.
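As a quick arithmetic check on the reaction mixture, the sketch below computes the final concentration of each dNTP and the mass of each primer per reaction. It assumes that the stated volumes are in microliters and that ng/λ denotes ng per microliter; the function and constant names are illustrative only.

```python
# Values from the protocol text; units are assumptions (µL, and ng/λ = ng/µL).
TOTAL_VOLUME_UL = 30.0
DNTP_VOLUME_UL = 2.4
DNTP_STOCK_MM = 1.25  # concentration of each dNTP in the stock
OLIGO_VOLUME_UL = 0.3
OLIGO_STOCK_NG_PER_UL = {"326": 50.0, "327": 100.0, "68A": 50.0}


def final_concentration(stock: float, volume_ul: float,
                        total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_ul / total_ul


def primer_mass_ng(primer: str) -> float:
    """Mass of one primer added to a single reaction."""
    return OLIGO_STOCK_NG_PER_UL[primer] * OLIGO_VOLUME_UL


# Final concentration of each dNTP, converted from mM to µM.
dntp_final_um = final_concentration(DNTP_STOCK_MM, DNTP_VOLUME_UL) * 1000

print(f"each dNTP: {dntp_final_um:.0f} µM")
for name in ("326", "327", "68A"):
    print(f"primer {name}: {primer_mass_ng(name):.0f} ng per reaction")
```

Under these assumptions each dNTP ends up at 100 µM, with 30 ng of primer 327 and 15 ng each of primers 326 and 68A per 30 µL reaction.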
As shown schematically in Table 1, to identify the allele for an SP-A1 gene, one needs to determine the nucleotides at five positions, within the codons for aa 19, 50, 62, 133 and 219. The nucleotide change may or may not change the encoded amino acid. Similarly, for the designation of the SP-A2 alleles four such nucleotide determinations, at aa 9, 91, 140 and 223, are needed. Table 2 shows the converted primers used for each amplification, the mismatch needed to create the recognition site, and the restriction endonuclease used. Table 3 shows the other primers used in this study. To determine SP-A1 alleles, converted PCR (cPCR) is used for aa 19 [...] The 30 µL reaction conditions for the cPCR are as follows: 1 µL (of 1:9 or 1:99 dilution from above) of the gene specific 3.3 kb PCR product, [...]

In addition to using genomic clones to confirm the specificity of the initial 3.3 kb gene specific reaction, we also perform a converted PCR on the gene specific PCR products using primer pair 807/18. This checks the nucleotide at aa 85, which is considered to be an invariant nucleotide and is used to distinguish the SP-A1 gene from the SP-A2 gene and their corresponding alleles [13]. The PCR conditions match those of primer pair 726/96 and the cycling conditions match those of primer pair 799/28, except that 35 cycles are performed instead of 30.

Table 1. The biallelic polymorphisms for both SP-A1 and SP-A2
Columns: aa 9 +, aa 19 *, aa 50 *, aa 62, aa 85, aa 91 +, aa 133, aa 140, aa 219 *, aa 223
[Table body not recoverable; surviving cell text: "or C", "Asn or Thr", "Pro or Ala", "Lys or Gln"]
Various combinations of these biallelic markers on any given allele result in an allelic designation 1A x or 6A x. Currently, 15 alleles have been identified for SP-A1 and 15 alleles for SP-A2. Amino acid (aa) numbers followed by a * indicate bp differences that result in a change of the particular amino acid between two alleles of SP-A1, and a + indicates bp differences that result in a change of amino acid between two alleles of SP-A2. Amino acid 85 is one of the invariant amino acids that distinguishes SP-A1 from SP-A2, and the codon for this amino acid is used to confirm gene specificity. The corresponding amino acid appears below the nucleotide designation. The nucleotide changes at the other amino acid locations do not change the encoded amino acid.
To determine the presence of the 11 bp insertion in the SP-A2 3'UTR (as discussed in the Results and Discussion Section), primer pair 854/855 is used in the PCR reaction. The reaction conditions are those of primers 726/96, with cycling conditions of 95 °C for 2 min followed by 35 cycles of 95 °C for 30 sec, 58 °C for 1 min and 72 °C for 1 min, and then a final extension at 72 °C for 5 min.

PCR Amplification
The cRFLP method is used for two biallelic polymorphisms at the SP-D codons for aa 11 and aa 160 (our unpublished observations). To improve the final PCR product yield and specificity, a larger fragment is first amplified from genomic DNA, which then serves as template for a nested reaction using one converted primer. Converted primers and the corresponding information are shown in Table 2; all other primers are listed in Table 3. For the polymorphism at the codon for amino acid 11, the first reaction is with primer pair 930/999 and the nested PCR with primers 825/920. For the polymorphism at the codon for amino acid 160, the initial reaction is with primer pair 904/936 and the nested PCR with primers [...]

Cloning and sequencing analysis of a new SP-A1 allele, the 6A 14
Based on the genotyping results of > 2500 samples, the frequency of 6A 14 was > 0.01. Therefore, it became important to verify the 6A 14 sequence by cloning and sequencing of the 6A 14 [...]

Table 3. Non-converted primers for cRFLP analysis of SP-A and SP-D

Scoring of known SP-A alleles
We have developed a non-radioactive method for genotyping of human SP-A1 and SP-A2 alleles. This method is based on PCR-cRFLP analysis and yields reliable results for all known SP-A alleles in a time efficient manner. The SP-A alleles are scored by evaluating the digested products of the converted PCR and comparing them to the corresponding undigested PCR product. From this, it can be determined whether a sample is completely uncut, completely cut or heterozygous at the biallelic polymorphic site. In some heterozygous samples the digested product is of low intensity upon ethidium bromide staining compared to the uncut product, and in some homozygous cut samples a trace of uncut DNA may be visible. By testing many samples, the appropriate appearance of bands for a heterozygous sample can be determined. Thus, the pattern that is consistently reproduced for a given set of primers is used as the standard to determine all heterozygotes for that specific primer pair. Figure 1A (lanes 5, 8) depicts an example of a digested product with low staining intensity compared to the uncut allele, and Figure 1B lane 2 depicts two products of equal intensity following enzyme digestion (see details below). Figures 1 and 2 show the expected digestion patterns for all primer pairs used in this genotype analysis. Primer pairs and the corresponding amino acid are listed below the lanes of the gels. After digestion, a sample will appear as homozygous uncut, heterozygous, or homozygous cut, and the resulting fragment sizes reflect the specific nucleotide present at the polymorphic site. Table 2 shows the restriction enzyme, the 2 possible alleles, and the resulting fragment sizes for each allele. The details of the alleles indicated by each band are shown in the figure legends. Figure 1, Panel A shows the gel with the PCR digestion products for the 5 amino acid positions necessary to assign a genotype for SP-A1.
Lanes 1-3 are for aa 19, lanes 4-6 for aa 50, lanes 7-9 for both aa 62 (lower portion of lane) and aa 133 (upper portion of lane), and lanes 10-12 for aa 219. For example, in lanes 1-3 the product amplified using oligos 765/787 has been digested with BbvI. If the "T" allele is present, no digestion will occur and a 143 bp band is visible. If a "C" allele is present the 143 bp product will be digested to yield 2 bands of 103 bp and 40 bp (the 40 bp product is not visible on the gel). [...] Figure 2 shows the digestion patterns for the two biallelic polymorphisms in SP-D. Lanes 1-3 are for aa 11 and lanes 4-6 for aa 160.
For any given amino acid, genomic controls are determined, sequenced at the polymorphic site, and then used in subsequent reactions as controls for the given digestion.
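The scoring logic for the aa 19 example above (oligos 765/787: a 143 bp uncut band for the "T" allele, and a visible 103 bp band for the "C" allele) can be sketched as follows. The band-size tolerance and the function name are assumptions made for illustration. The sketch works only from the presence or absence of bands; it does not model the band-intensity judgments described in the text, which rely on sequenced controls.

```python
def score_digest(observed_bp, uncut_bp=143, cut_bp=103, tolerance_bp=3):
    """Score one sample from the band sizes (bp) visible on the gel.

    Returns "T/T" (homozygous uncut), "C/C" (homozygous cut) or
    "T/C" (heterozygous). The 40 bp fragment is not expected to be seen.
    """
    has_uncut = any(abs(b - uncut_bp) <= tolerance_bp for b in observed_bp)
    has_cut = any(abs(b - cut_bp) <= tolerance_bp for b in observed_bp)
    if has_uncut and has_cut:
        return "T/C"
    if has_uncut:
        return "T/T"
    if has_cut:
        return "C/C"
    raise ValueError("no expected band found; repeat the digestion")


print(score_digest([143]))       # uncut band only
print(score_digest([103]))       # cut band only
print(score_digest([143, 103]))  # both bands
```

A faint residual uncut band in a homozygous cut sample would be scored as heterozygous by this simple rule, which is why the comparison against known controls remains necessary in practice.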

Genotype analysis reveals new SP-A patterns
When the genotyping results of a given genomic DNA sample reveal an SP-A biallelic variant pattern that has not been previously observed, the SP-A coding regions are sequenced to confirm the particular pattern. For example, when alleles 1A 4 and 6A 5 were first discovered [13] and completely sequenced, they were found to contain nucleotide differences at amino acid 133 (SP-A1), and at amino acids 140 and 202 (SP-A2). Although these 3 polymorphisms had not been previously targeted with the hybridization procedure [4], these polymorphic sites were subsequently incorporated into the current genotyping methodology. After studying an additional 200 samples, amino acid 202, which was useful only for the detection of the rare 1A 4 allele [13], was omitted from the methodology due to its limited value. Because we are now using biallelic markers at aa 133 and aa 140, we are able to distinguish more alleles than was possible with the old hybridization method (Tables 4 and 5).

Fig. 1. Allelic patterns for SP-A1 and SP-A2. The fragments containing SNPs were amplified by converted PCR with the primer pairs listed below the gel. The PCR products were digested with appropriate restriction enzymes (Table 2), separated by 8% PAGE, and stained with ethidium bromide. Panel A shows all possible SP-A1 allelic patterns for aa 19, aa 50, aa 62, aa 133, and aa 219 as amplified by the primer pairs listed below the gel. Corresponding fragment sizes can be found in Table 2 [...]

Challenges in assigning genotypes
As combinations of nucleotides at specified amino acids are determined, problems can occur in assigning genotypes. A genotype of 1A 0 1A 1 has the same pattern as a genotype of 1A 2 1A 3 (Table 4), and thus it is not possible, without additional information or prior relevant knowledge, to determine which SP-A2 genotype is correct. This difficulty stems from the fact that both SP-A2 alleles are observed simultaneously and we cannot determine which nucleotides are contributed by each SP-A2 allele. The SP-A1 and SP-A2 genes are linked [10] and association studies of unrelated individuals have previously provided SP-A haplotype information that we use at times to distinguish between possible genotypes [4]. The 1A 0 allele has been shown to associate with 6A 2, 6A 3, or 6A 4, the 1A 1 allele with 6A 2 or 6A 3, and the 1A 2 allele has been shown to associate only with the 6A 4 allele. Therefore, if the SP-A1 genotype is known and a 6A 4 allele is not present, we assume that the SP-A2 genotype is 1A 0 1A 1 and not 1A 2 1A 3. Currently, all assigned genotypes of 1A 0 1A 1 have corresponded to 6A 2 6A 3. Data from CEPH families (our unpublished observations) have confirmed haplotypes 1A 1 6A 3 and 1A 0 6A 2.
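The haplotype-based rule described above can be written as a small filter. This is a minimal sketch, not the authors' procedure: allele names are written without superscripts (for example "1A0" for 1A 0), the association table holds only the three associations stated in the text, and an allele with no stated partners (1A 3 here) is treated as unconstrained, which is an assumption.

```python
# SP-A1 alleles each SP-A2 allele has been reported to associate with (from the text).
KNOWN_PARTNERS = {
    "1A0": {"6A2", "6A3", "6A4"},
    "1A1": {"6A2", "6A3"},
    "1A2": {"6A4"},
}

# The two SP-A2 genotypes that give the same unphased pattern.
AMBIGUOUS_GENOTYPES = [("1A0", "1A1"), ("1A2", "1A3")]


def is_compatible(sp_a2_genotype, sp_a1_genotype):
    """Each SP-A2 allele with known partners must find one in the SP-A1 genotype."""
    present = set(sp_a1_genotype)
    return all(
        allele not in KNOWN_PARTNERS or KNOWN_PARTNERS[allele] & present
        for allele in sp_a2_genotype
    )


def resolve_sp_a2(sp_a1_genotype):
    """Return the candidate SP-A2 genotypes that remain plausible."""
    return [g for g in AMBIGUOUS_GENOTYPES if is_compatible(g, sp_a1_genotype)]


print(resolve_sp_a2(("6A2", "6A3")))  # no 6A4 present
print(resolve_sp_a2(("6A2", "6A4")))  # 6A4 present
```

When no 6A 4 allele is present only 1A 0 1A 1 survives the filter; when 6A 4 is present both candidates remain and additional information (family data, for example) is still needed. The filter does not attempt full phasing of which SP-A2 allele pairs with which SP-A1 allele.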
There is no difficulty in scoring all currently known fully sequenced (Table 5) SP-A1 alleles, individually or in combination. However, a challenge may arise in correctly assigning a 6A 2 or 6A 3 allele when the sequence pattern of either one of these appears in combination with a possibly new, unknown allele. To correctly score the 6A 2 and 6A 3 alleles, there is an additional PCR reaction that takes advantage of an 11 bp size difference found in the 3'UTR approximately 400 bp past the translation termination codon [18]. All of the more abundant SP-A2 alleles tested (1A, 1A 0, 1A 1, 1A 2, 1A 3, 1A 5) contain this 11 bp insertion, whereas among the SP-A1 alleles only 6A 2 and the rare 6A 15 contain it. The 11 bp determination has become essential for distinguishing 6A 2 and 6A 3 under certain circumstances. In this reaction, we use gene-specific template to amplify the region flanking the 11 bp and then the size difference is visualized by PAGE. A band size of 187 bp indicates that the 11 bp insertion is present and a 6A 2 can be assigned. Alternatively, a 176 bp band indicates that the 11 bp is missing and a 6A 3 can be assigned (Figure 3). Once the correct designation of the 6A 2 or 6A 3 allele is made, the remaining pattern is designated as the other (new) allele and can be assessed at the specified nucleotide positions. Using this method we genotyped more than 2500 samples to determine the frequency of the SP-A alleles in the general population. The distribution of the more abundant (> 1%) SP-A1 and SP-A2 alleles is shown in Figure 4. Rare alleles (< 1%) are not shown.

[Table fragment, SP-A2 alleles; column headers and earlier rows not recoverable]
(allele name lost)  A G T C
1A 10  C C T A
1A 12  C C C A
1A 12  A C C C
1A 13  A C T A
* Fully sequenced alleles.
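The band-size readout for the 3'UTR reaction reduces to a simple lookup, sketched below. The tolerance and names are illustrative assumptions, and allele names are written without superscripts. The sketch applies only to the situation described in the text, where a pattern is already known to be either 6A 2 or 6A 3; it says nothing about the insertion status of the accompanying new allele.

```python
BAND_WITH_INSERTION_BP = 187     # 11 bp insertion present
BAND_WITHOUT_INSERTION_BP = 176  # 11 bp insertion absent


def call_6a2_or_6a3(band_bp, tolerance_bp=2):
    """Assign 6A2 or 6A3 from the size of the 3'UTR PCR product."""
    if abs(band_bp - BAND_WITH_INSERTION_BP) <= tolerance_bp:
        return "6A2"
    if abs(band_bp - BAND_WITHOUT_INSERTION_BP) <= tolerance_bp:
        return "6A3"
    raise ValueError(f"unexpected band size: {band_bp} bp")


# The two expected products differ by exactly the size of the insertion.
insertion_size_bp = BAND_WITH_INSERTION_BP - BAND_WITHOUT_INSERTION_BP

print(insertion_size_bp, call_6a2_or_6a3(187), call_6a2_or_6a3(176))
```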

Comments
Human SP-A exhibits extensive complexity and heterogeneity at the genetic [5,14], mRNA [13], and protein [16,26] levels, and plays multiple roles in two major groups of functions: the surfactant related functions [3,11,17,24] and those of innate host defense and the regulation of inflammatory processes [23]. Altered levels of SP-A have been observed in several diseases, and SP-A has been implicated in the susceptibility to and/or pathogenesis of several pulmonary diseases [6]. We believe that the genotyping method described in this report will be essential for SP-A genotype-phenotype correlations in population and family based studies for several disease groups.