Prevalence of Human Papillomavirus Variants and Genetic Diversity in the L1 Gene and Long Control Region of HPV16, HPV31, and HPV58 Found in North-East Brazil

This study reports the prevalence of human papillomavirus (HPV) variants and the nucleotide changes within the L1 gene and long control region (LCR) of HPV16, HPV31, and HPV58 found in cervical lesions of women from North-East Brazil.


Introduction
Cervical cancer is the second most common cancer in women worldwide, with more than 529,000 new cases and 275,000 deaths reported in 2011 [1]. About 85% of these cases occur in developing countries [1]. In Brazil, cervical cancer is the third most common cancer among women [2].
It is well established that persistent infection with human papillomavirus (HPV) is a key aetiological factor in the development of cervical lesions and cervical cancer [3]. To date, 184 HPV types have been identified (http://www.hpvcenter.se/html/refclones.html), 62 of which belong to the genus Alphapapillomavirus. Epidemiological data show that HPV16 and HPV18 are responsible for 70% of invasive cervical cancer cases worldwide [4]. Other Alphapapillomavirus types, such as HPV31, HPV33, HPV35, HPV45, HPV52, and HPV58, are involved in 18% of squamous cell carcinoma cases worldwide [4].
Previous studies revealed that HPV16 variants coevolved with the three main human phylogenetic branches: African, Caucasian, and Asian [5,6]. Accordingly, HPV16 variants were grouped into five categories spread across different geographical regions: Europe (E), Asia (As), Asian-America (AA), Africa 1 (Af-1), and Africa 2 (Af-2) [5,6]. More recent studies, however, have redefined variant lineages as complete genomes of the same HPV type that differ by approximately 1% in nucleotide sequence, and sublineages as complete genomes that differ by 0.5 to 0.9% [7–9]. Under this classification, HPV16 has four variant lineages (A, B, C, and D) and nine sublineages, HPV31 has three lineages (A, B, and C) and seven sublineages, and HPV58 has four lineages (A, B, C, and D) and seven sublineages [7].
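The lineage and sublineage definitions above are thresholds on pairwise nucleotide divergence between complete genomes. As a minimal sketch, assuming two sequences that are already aligned and using the uncorrected p-distance (the published classification may measure divergence differently), the scheme could be expressed as follows. The function names are hypothetical; only the 1% and 0.5% cut-offs come from the text, and applying them to partial fragments such as the L1 and LCR amplicons of this study would not be valid.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def divergence_level(distance: float) -> str:
    """Map a whole-genome divergence (a fraction, not a percent) onto the scheme."""
    if distance >= 0.01:    # about 1% or more: different variant lineages
        return "different lineage"
    if distance >= 0.005:   # 0.5 to 0.9%: different sublineages
        return "different sublineage"
    return "same sublineage"


reference = "A" * 200
isolate = "GG" + "A" * 198  # 2 differences over 200 sites = 1%
print(divergence_level(p_distance(reference, isolate)))  # different lineage
```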
Thus, the aim of this study was to detect nucleotide changes within the L1 gene and LCR of HPV16, HPV31, and HPV58 in cervical samples obtained from North-East Brazil. B-cell and T-cell epitopes in the L1 gene of HPV16, HPV31, and HPV58 were predicted in silico, and transcription factor binding sites were predicted in the LCR of each type. Finally, a phylogenetic analysis was conducted to determine which variants of HPV16, HPV31, and HPV58 are found in North-East Brazil.

HPV Types 16, 31, and 58.
Cervical cells were collected with a cytobrush, placed in polyethylene tubes containing phosphate-buffered saline, transferred to the Molecular Studies and Experimental Therapy Laboratory (LEMTE), and stored at −80 °C until analysis. Nucleic acids were extracted with the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's instructions. The MDM2 gene was amplified by polymerase chain reaction (PCR) to assess the quality of the extracted DNA and to avoid false-negative results. HPV DNA was detected by PCR with the degenerate primers MY09/11, and HPV16, HPV31, and HPV58 were identified by direct sequencing. The HPV-positive amplicons were purified with the Invisorb Fragment Cleanup Kit (Invitek) and sequenced in duplicate, in both the forward and reverse directions, with the BigDye™ Terminator Cycle Sequencing Ready Reaction Kit on an ABI PRISM sequencer (Applied Biosystems). The HPV16, HPV31, and HPV58 found in cervical samples were further characterized by amplifying partial L1 and LCR sequences with the specific primer pairs described in Table 1. Reactions were performed in a final volume of 25 μL containing 50 ng of DNA, 20 pmol of each primer, and 1X PCR Master Mix (Promega).
The PCR cycling conditions were as follows: initial denaturation at 95 °C for 5 minutes; 35 cycles of denaturation at 95 °C for 30 seconds, annealing at 56 °C for 1 minute, and elongation at 72 °C for 2 minutes; and a final extension at 72 °C for 10 minutes. PCR products were analysed on a 1% agarose gel. The amplicons were then purified with the Invisorb Fragment Cleanup Kit (Invitek), and both forward and reverse sequences were obtained with the BigDye™ Terminator v3.1 Cycle Sequencing Ready Reaction Kit on an ABI PRISM sequencer (Applied Biosystems). PCR and sequencing were performed in duplicate.
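The cycling program above can be written down as a small data structure, which makes the total hold time explicit. This is an illustrative sketch only: the `Step` class and the helper are hypothetical names, and the calculation ignores ramp times between temperatures, so an actual thermocycler run takes longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Program as described in the text; ramping between temperatures is ignored.
INITIAL = [Step("initial denaturation", 95, 5 * 60)]
CYCLE = [
    Step("denaturation", 95, 30),
    Step("annealing", 56, 60),
    Step("elongation", 72, 120),
]
FINAL = [Step("final extension", 72, 10 * 60)]
N_CYCLES = 35


def total_hold_seconds() -> int:
    """Sum of hold times only (one-off steps plus N_CYCLES repeated steps)."""
    once = sum(step.seconds for step in INITIAL + FINAL)
    per_cycle = sum(step.seconds for step in CYCLE)
    return once + N_CYCLES * per_cycle


print(total_hold_seconds() / 60)  # 137.5 (minutes)
```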

Data Analysis.
The sequences obtained were assembled with the Staden package [38] and compared with the reference sequences of HPV16 (K02718), HPV31 (J04353), and HPV58 (D90400) to determine nucleotide divergence. Sequence comparisons were carried out with the Basic Local Alignment Search Tool (BLAST), and multiple alignments were performed with CLUSTALW as implemented in MEGA 5.2 (beta version) [39].
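The variations reported below use a compact notation relative to the reference genome (for example C7436G, insC7434, delA7869). As a minimal sketch, assuming a pairwise alignment of an isolate against the reference is already available as two gapped strings, differences can be listed in that notation. The function name is hypothetical, and numbering an insertion by the preceding reference base is an assumption; the convention used by the authors is not stated.

```python
def call_variants(ref_aln: str, qry_aln: str, ref_start: int) -> list[str]:
    """List query differences from the reference in the text's notation.

    ref_aln, qry_aln: rows of a pairwise alignment ('-' marks a gap).
    ref_start: genome coordinate of the first reference base in ref_aln.
    Substitutions -> 'C7436G'; deletions -> 'delA7869'; each inserted base
    -> 'insC7434', numbered by the preceding reference base (an assumption).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows differ in length")
    variants = []
    pos = ref_start - 1  # coordinate of the last reference base consumed
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both rows carries no information
        if r == "-":
            variants.append(f"ins{q}{pos}")  # base present only in the isolate
            continue
        pos += 1
        if q == "-":
            variants.append(f"del{r}{pos}")  # reference base missing in isolate
        elif q != r and q != "N":
            variants.append(f"{r}{pos}{q}")  # substitution (ambiguous N skipped)
    return variants


print(call_variants("ACG-TA", "ATGCT-", ref_start=100))
```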
Phylogenetic trees were built from the LCR sequences of HPV16, HPV31, and HPV58 with the neighbor-joining algorithm and the Kimura 2-parameter model, using 1000 bootstrap replicates, in MEGA version 5.2 [39]. The partial L1 gene and LCR sequences of HPV16, HPV31, and HPV58 were deposited in the NCBI GenBank database under the following accession numbers: HPV16 L1 gene: KJ467225-KJ467238; HPV16 LCR: KJ452220-KJ452242; HPV31 L1 gene: KJ452216-KJ452219; HPV31 LCR: KJ435060-KJ435067; HPV58 L1 gene: KJ467239-KJ467246; HPV58 LCR: KJ467247-KJ467252. The reference sequences used to construct the phylogenetic trees were retrieved from GenBank and are listed in Table 2.
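MEGA computes Kimura 2-parameter (K2P) distances internally before building the neighbor-joining tree. To make the model concrete, the sketch below implements the standard K2P formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It illustrates the distance only (tree building and bootstrapping are not shown), and its gap handling (pairwise deletion) is an assumption that may differ from the settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (pairwise deletion)
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition in 10 sites
```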

B-Cell and T-Cell Epitope Prediction.
The putative impact of the HPV variants was estimated in silico by predicting B-cell and T-cell epitopes. In this study, it was assumed that amino acid changes in L1 within B-cell epitope regions could affect the binding affinity of neutralizing antibodies, and that changes within T-cell epitopes could prevent the initiation of an epitope-specific immune response. B-cell epitopes of the prototype sequences were predicted with the BcePred server (http://www.imtech.res.in/raghava/bcepred/), which bases its prediction on physicochemical parameters such as hydrophilicity, flexibility/mobility, accessibility, polarity, exposed surface, turns, and antigenic propensity [40].
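Propensity-scale epitope predictors of this kind average a residue property over a sliding window and flag windows above a threshold. The sketch below illustrates that idea only; it is not BcePred. It swaps in the Kyte-Doolittle hydropathy scale (negated, as a hydrophilicity proxy), and the window of 7 and threshold of 1.0 are arbitrary assumptions. The helper names are hypothetical, and residues outside the 20 standard amino acids raise a `KeyError`.

```python
# Kyte-Doolittle hydropathy; negated below as a stand-in hydrophilicity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent (start, end) spans, assumed sorted."""
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hydrophilic_regions(protein: str, window: int = 7, threshold: float = 1.0) -> list[tuple[int, int]]:
    """1-based (start, end) regions whose window-mean hydrophilicity >= threshold."""
    seq = protein.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        score = -sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if score >= threshold:
            spans.append((i + 1, i + window))
    return merge_spans(spans)


def in_predicted_epitope(position: int, regions: list[tuple[int, int]]) -> bool:
    """True if a 1-based residue position (e.g. a substitution) falls in a region."""
    return any(start <= position <= end for start, end in regions)


toy = "L" * 7 + "D" * 7 + "L" * 7  # hypothetical peptide, not an L1 sequence
print(hydrophilic_regions(toy))  # [(6, 16)]
```

A substitution such as T375N could then be checked with `in_predicted_epitope(375, regions)` once regions have been computed for the real L1 sequence.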

Characteristics of the Population.
A total of 206 cervical smears were tested for HPV DNA, and 121 (59%) were HPV-positive. Of these, 94/121 (77.7%) were infected with a single HPV type and 27/121 (22.3%) with more than one HPV type (Figure 1). As a single infection, 40 (Figure 1). HPV16-, HPV31-, and HPV58-positive samples were selected from the total for molecular characterization. Table 3.
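The proportions reported in this section can be reproduced from the raw counts. This small check uses only counts stated in the text; the helper name is hypothetical, and it shows that 121/206 rounds to 58.7%, which the text reports as 59%.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of part in whole, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


tested, positive = 206, 121
single, multiple = 94, 27
assert single + multiple == positive  # every positive sample is classified once

print(percent(positive, tested))    # HPV-positive samples
print(percent(single, positive))    # single-type infections
print(percent(multiple, positive))  # multiple-type infections
print(percent(10, 23))              # HPV16 LCR G7521A frequency
```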
For the HPV16 LCR sequences, thirty nucleotide changes were observed, 24/30 (80%) of which lie within transcription factor binding sites. The most common variations, insC7434, C7436G, and delA7869, were found in 100% of the samples and lie within the YY1, NF-1, and E2 binding sites, respectively. The G7521A variation was found in 10/23 (43.5%) samples, followed by A7489C (39.1%), G7493A (39.1%), C7693A (39.1%), C7768T (39.1%), and C7790T (39.1%). Of these, C7693A and C7790T lie within the E2 and YY1 binding sites, respectively. Some of these nucleotide changes are "diagnostic SNPs" used to assign HPV16 lineages and sublineages [15]. The detected variations are summarized in Table 4.
The phylogenetic analyses showed that 63.63% […] there was no evidence of premature stop codons in the L1 gene of the HPV31 variants. The detected variations are summarized in Table 5. For the HPV31 LCR, fragments of 883 bp were analysed, and thirty nucleotide changes were observed, 14/30 (47%) of which lie within transcription factor binding sites. The most common variation was a deletion of TGTTCCT-GCT at positions 7341-7450, present in all samples (8/8, 100%) and located within an NF-1 binding site. C7480T and T7871G were found in 20% of the samples and lie within E2 binding sites. The detected variations are summarized in Table 6.
The phylogenetic analysis showed that 62.5% of the variants clustered into the A branch (n = 5) and 37.5% into the C branch (n = 3) (Figure 3). No variants clustered into the B branch.

HPV58.
The HPV58 L1 gene was analyzed by aligning a 1264 bp nucleotide sequence. Altogether, thirty-five single nucleotide polymorphisms were found, seven of which (7/35) were nonsynonymous. The most common nucleotide changes were A6540G (I335M), C6828A (N422D), A6014C (L150F), G5994A (V144I), A6799G (I412V), G6823A (D420N), and C6689A (T375N); these are located either in external loops (DE and HI) or in alpha helices (H2 and H3), and they lie within B-cell and/or T-cell epitopes. Compared with the prototype sequence (D90400.1), no insertions or deletions were identified, and there was no evidence of premature stop codons or nucleotide […] Table 7.
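Deciding whether a single nucleotide polymorphism is synonymous or nonsynonymous amounts to translating the affected codon before and after the change. A minimal sketch, assuming an in-frame coding sequence that starts at the start codon and using the standard genetic code; the genome-to-L1 coordinate mapping is not given in the text, so the example uses a hypothetical toy sequence rather than real L1 positions.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds: str, cds_pos: int, alt: str) -> str:
    """Classify a single-base change at 1-based position cds_pos of an in-frame CDS."""
    cds, alt = cds.upper(), alt.upper()
    i = cds_pos - 1
    frame = i % 3
    codon_start = i - frame
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    codon_no = codon_start // 3 + 1
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return f"nonsense {ref_aa}{codon_no}*"  # premature stop codon
    return f"nonsynonymous {ref_aa}{codon_no}{alt_aa}"


toy_cds = "ATGACCGCA"  # hypothetical: Met-Thr-Ala
print(classify_snv(toy_cds, 5, "A"))  # nonsynonymous T2N
print(classify_snv(toy_cds, 6, "G"))  # synonymous
```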
For the HPV58 LCR sequences, thirty-five nucleotide changes were observed, 12/35 of which lie within transcription factor binding sites. The most common variations, C7745A and A7794G, were found in 50% of the samples and lie within the NF-1 and E2 binding sites, respectively. The detected variations are summarized in Table 8.
In addition, the phylogenetic analyses showed that 50% of the isolates belong to the A variant, followed by the B (16.6%), C (16.6%), and D (16.6%) variants (Figure 4).

Figure (HPV16 LCR phylogenetic tree), caption fragment: … Table 2. Viral lineages analyzed in this study (KJ452220-KJ452242) are clustered into A and D branches. Only bootstrap values above 50% are represented in the branches.

Discussion
Several studies have demonstrated that variants of HPV16, HPV31, and HPV58 may differ in oncogenicity, persistence, and progression of viral infection [6, 10–22, 24–26]. In this study, we evaluated the genetic diversity within the L1 gene and LCR of HPV16, HPV31, and HPV58 in cervical samples from North-East Brazil. For HPV16, 23 nucleotide changes were found in the L1 gene and 30 in the LCR. For HPV31, 9 nucleotide changes were found in the L1 gene and 30 in the LCR. For HPV58, 35 nucleotide changes were found in the L1 gene and 35 in the LCR. Some of these changes lie within putative T-cell or B-cell epitopes or within transcription factor binding sites. Furthermore, two nucleotide changes in the HPV31 LCR and one seven-base-pair deletion in the HPV58 LCR are described here for the first time. As far as we are aware, this is the first study of the genetic diversity of the L1 gene and LCR of HPV16, HPV31, and HPV58 in cervical samples from North-East Brazil.
Nucleotide changes within the HPV16 L1 gene can play an important role in capsid protein structure, immune recognition, and viral neutralization [45]. For example, polymorphisms in the L1 gene can affect the self-assembly of L1 protein into virus-like particles (VLPs) [46]. Kirnbauer et al. demonstrated that the nucleotide change C6240G, which results in the amino acid substitution H202D, allows L1 to self-assemble into VLPs in a heterologous system more efficiently than the prototype sequence [47]. In addition, variations in residues 83–97 of the L1 protein have been found to affect L1 protein yield [48]. The nonsynonymous variations found in the L1 gene of HPV16, HPV31, and HPV58 in this study have been reported previously [49–58]. Some of these polymorphisms lie within the hypervariable immunodominant regions (BC, DE, EF, FG, and HI loops) of the L1 protein, which can be recognized as conformational epitopes [59,60]. For instance, the A6436G (T292A) polymorphism of HPV16 and the A6025G (T267A) and C6379A (T274N) polymorphisms of HPV58 are located in the FG loop of L1, whereas the A6697C (T379P) polymorphism of HPV16 and C6689A (T375N) of HPV58 are located in the HI loop. Together, the FG and HI loops constitute the immunodominant epitope region [61]. Furthermore, polymorphisms in helix 4 (involving the threonine and serine residues at positions 448 and 465 of HPV16 L1) are implicated in VLP formation [60].
Nucleotide variation within the LCR may influence the binding affinity of cellular and viral transcription factors. For instance, nucleotide changes may result in the loss or gain of binding sites for transcription factors that regulate the transcription of high-risk HPV (HR-HPV) genes [62]. Hence, nucleotide changes in the LCR of specific HPV16, HPV31, and HPV58 variants may alter E6 and/or E7 oncogene expression, which could explain the greater carcinogenic potential of some variants [62]. Some of the variations reported in this work lie within putative binding sites for the E2, C/EBPbeta, YY1, AP-1, NF-1, and Oct-1 transcription factors. The viral factor (E2) regulates early viral gene expression, whereas the cellular factors are involved in epithelial differentiation. Hence, the nucleotide changes found in the LCR of HPV16, HPV31, and HPV58 could directly or indirectly affect the expression of the E6 and E7 oncogenes.

Figure (HPV31 LCR phylogenetic tree), caption fragment: … Table 2. Three clusters were identified as lineages A, B, and C. Viral lineages analyzed in this study (KJ435060-KJ435067) are clustered into A and C branches. Only bootstrap values above 50% are represented in the branches.

Figure (HPV58 LCR phylogenetic tree), caption fragment: … Table 2. Viral lineages analyzed in this study (KJ467247-KJ467252) are clustered into A, B, C, and D branches. Only bootstrap values above 50% are represented in the branches.
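The question of whether an LCR variant removes a transcription factor binding site can be illustrated with the papillomavirus E2 binding-site consensus ACCGNNNNCGGT, which is palindromic, so scanning one strand suffices. The text does not say which tool was used to predict binding sites. This sketch is a simplified exact-consensus scan, not a position-weight-matrix prediction, and the function names and the `offset` parameter (mapping a fragment to genome coordinates) are hypothetical.

```python
import re

# E2 binding-site consensus ACCGNNNNCGGT; the lookahead allows overlapping hits.
E2_SITE = re.compile(r"(?=(ACCG[ACGT]{4}CGGT))")


def e2_sites(seq: str, offset: int = 1) -> list[tuple[int, int]]:
    """(start, end) coordinates, inclusive, of consensus E2 sites in seq.

    offset is the coordinate assigned to seq[0] (1 = fragment-relative).
    """
    return [(m.start() + offset, m.start() + offset + 11)
            for m in E2_SITE.finditer(seq.upper())]


def apply_snv(seq: str, pos: int, ref: str, alt: str, offset: int = 1) -> str:
    """Return seq with the base at coordinate pos changed from ref to alt."""
    i = pos - offset
    if seq[i].upper() != ref.upper():
        raise ValueError(f"expected {ref} at {pos}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


def disrupted_sites(seq: str, pos: int, ref: str, alt: str, offset: int = 1) -> list[tuple[int, int]]:
    """Consensus E2 sites present in seq but lost after the substitution."""
    before = set(e2_sites(seq, offset))
    after = set(e2_sites(apply_snv(seq, pos, ref, alt, offset), offset))
    return sorted(before - after)


toy_lcr = "TTACCGAAAACGGTTT"  # hypothetical fragment containing one E2 site
print(e2_sites(toy_lcr))                       # [(3, 14)]
print(disrupted_sites(toy_lcr, 4, "C", "T"))   # site lost: [(3, 14)]
print(disrupted_sites(toy_lcr, 7, "A", "G"))   # spacer change: []
```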
In addition, we performed a phylogenetic analysis of HPV16 using LCR fragments. The results showed that 63.63% of the HPV16 isolates belong to the A variant and 33.36% to the D variant. These results are similar to those of a previous study in Central-West Brazil [63], which showed a high prevalence of A and D variants and a low frequency of B and C variants. In contrast, a recent study in South-Eastern Brazil found A and C variants to be the most prevalent, followed by D and B variants [64]. A previous study of 953 cervical samples from 27 countries found the A variant to be the most prevalent, followed by the C, B, and D variants [8]. These differences in the prevalence of HPV16 variants across regions of Brazil and worldwide may reflect the geographic origin and ethnicity of the infected patients.
The LCR contains more phylogenetic information than other regions of the HPV16 genome and can distinguish both lineages and sublineages [8]. Because lineages are fixed and HPV genomes are thought not to recombine, diagnostic polymorphisms have been proposed to classify HPV16 lineages and sublineages [8,9]. Cornet et al. proposed that variant lineages could be detected using 32 SNP combinations in the HPV16 LCR [8], and some of these diagnostic SNPs were found in the present study. For instance, T7747G, found in seven isolates in this study, is a diagnostic SNP for the AA1 sublineage. Furthermore, G7891G, also found in seven isolates, is a diagnostic SNP for the AA2 sublineage. Both the AA1 and AA2 sublineages belong to lineage D [8].
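Assigning a sublineage from diagnostic SNPs can be viewed as a lookup followed by a vote. The sketch below is illustrative only: the lookup table must come from the published SNP set of Cornet et al. [8], which is not reproduced here. The example table contains only the T7747G to AA1 association stated in the text, and the majority rule with ties reported as ambiguous is an assumption, not the published procedure.

```python
from collections import Counter
from typing import Optional


def assign_sublineage(observed: set[str], table: dict[str, str]) -> Optional[str]:
    """Return the sublineage supported by most diagnostic SNPs, or None.

    observed: variants found in an isolate, in the text's notation.
    table: diagnostic SNP -> sublineage (to be filled from the literature).
    """
    votes = Counter(table[snp] for snp in observed if snp in table)
    if not votes:
        return None  # no diagnostic SNP present
    (best, count), *others = votes.most_common()
    if others and others[0][1] == count:
        return None  # tie between sublineages: ambiguous assignment
    return best


diagnostic = {"T7747G": "AA1"}  # single entry taken from the text
print(assign_sublineage({"insC7434", "C7436G", "T7747G"}, diagnostic))  # AA1
```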
For HPV31, the phylogenetic trees showed the presence of A and C variants in North-East Brazil. These results are similar to those of Chagas et al., who reported a high prevalence of A and C variants and a very low prevalence of the B variant in North-East Brazil [31,32]. A recent study in Northern China also showed a high prevalence of A and C variants [65]. We did not find any isolates belonging to variant B, possibly because of the small number of isolates analysed or the low frequency of this variant in North-East Brazil.
For HPV58, isolates belonging to the A, B, C, and D variants were found in North-East Brazil. The A variant was the most prevalent (50%), followed by the B (16.6%), C (16.6%), and D (16.6%) variants. In contrast, worldwide and in the Americas, the A variant is the most prevalent, followed by the C, D, and E variants [66]. Additional studies are needed to clarify whether these differences reflect the small number of isolates analysed or a genuinely different distribution of HPV58 variants in North-East Brazil.
In summary, this study reports the prevalence of HPV16, HPV31, and HPV58 variants and the sequence variation in the L1 gene and LCR of HPV16, HPV31, and HPV58 isolates from North-East Brazil. Some of the polymorphisms found in the L1 gene lie within B-cell or T-cell epitopes, and some of the variations found in the LCR lie within transcription factor binding sites. Further studies should be carried out to shed light on the pathological differences between these variants and on their prevalence in different geographical regions.