Polymorphism of E6 and E7 Genes in Human Papillomavirus Types 31 and 33 in Northeast China

Persistent infection with human papillomavirus (HPV) types 31 and 33 is an important causative factor for cervical cancer. The E6/E7 genes are key oncogenes involved in the immortalization and transformation of human epithelial cells. Genetic polymorphism may lead to differences in the virus' carcinogenic potential, the immune reaction of the host, and the potencies of vaccines. Few studies on HPV31/33 E6/E7 genetic polymorphism have been carried out. To study the genetic polymorphism of HPV31 and HPV33 E6/E7 genes in northeast China, these genes (HPV31 E6/E7, n = 151; HPV33 E6/E7, n = 136) were sequenced and compared to reference sequences (J04353.1, M12732.1) using BioEdit. Phylogenetic trees were constructed by the neighbor-joining method using MegaX. The diversity of the secondary structure was estimated using the PSIPred server. The positively selected sites were analyzed using PAML4.9. The major histocompatibility complex (MHC) class I and MHCII epitopes were predicted using the ProPred-I server and ProPredserver. B-cell epitopes were predicted using the ABCpred server. In the 151 HPV31E6 sequences, 25 (25/450) single-nucleotide mutations were found, 14 of which were synonymous mutations and 11 were nonsynonymous. In the 151 HPV31E7 sequences, 8 (8/297) nucleotide mutations were found, 3 of which were synonymous mutations and 5 were nonsynonymous. In the 136 HPV33E6 sequences, 17 (17/450) nucleotide mutations were observed, 7 of which were synonymous mutations and 10 were nonsynonymous. C14T/G (T5I/S) was a triallelic mutation. Finally, in the 136 HPV33E7 sequences, 9 (9/294) nucleotide mutations were observed, 3 of which were synonymous mutations and 6 were nonsynonymous. C134T/A (A45V/E) and C278G/A (T93S/N) were triallelic mutations. Lineage A was the most common lineage in both HPV31 and HPV33. In all of the sequences, we only identified one positively selected site, HPV33 E6 (K93N). Most nonsynonymous mutations were localized at sites belonging to MHC and/or B-cell predicted epitopes. Data obtained in this study should contribute to the development and application of detection probes, targeted drugs, and vaccines.


Introduction
Cervical cancer is the most common genital malignancy among females. Nearly 530,000 cases of it are newly diagnosed every year globally [1], more than 130,000 of which are in China. It is widely accepted that persistent infection with high-risk human papillomavirus (HR-HPV) is the most important risk factor for cervical cancer [2]. Te risk of developing cervical cancer in HPV-infected patients is 50fold higher than that in uninfected women [3,4]. High-risk human papillomavirus (HR-HPV) includes 15 diferent types: HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV56, HPV58, HPV59, HPV68, HPV73, and HPV82. Each type can also be divided into multiple sublineages according to genetic polymorphism. Studies have shown that the distribution of diferent sublineages of the same type of HR-HPV difers signifcantly among age groups and regions, which may lead to the corresponding discrepancies in cervical lesions in the infected population [5][6][7]. Furthermore, genetic variations of HR-HPV show ethnic and geographical diferences [8,9]. Some of these genetic variations can change the amino acid sequences, which can in turn afect the carcinogenic potential of the virus, the immune reaction of the host, and the potency of vaccines. Te proteins E6 and E7, which are expressed at an early stage of viral infection, are the oncoproteins that have been studied most in HR-HPV. Tey are the key oncoproteins involved in the immortalization and transformation of human epithelial cells and act through their interplay with multiple host proteins [10].
Te p53 tumor suppressor protein is a main target of E6. E6 also activates telomerase expression and regulates the activity of PDZ domain-containing and tumor necrosis factor receptors [11]. E7 proteins have been deemed classic targets of the retinoblastoma (Rb) family of proteins. Tese proteins control the activity of E2F transcription factors' degradation, leading to an increase in the expression of E2Fresponsive genes. Some researchers have shown that knocking down the expression of E6 and E7 in cervical cancer cells can suppress their growth and induce apoptosis [12]. Terefore, HR-HPV E6 and E7 are potential targets for diagnosis and therapeutic vaccine design [13].
In summary, the genetic polymorphism of the E6 and E7 genes in HR-HPV warrants considerable attention. A large number of studies have also shown that HPV16 and HPV18 E6 and E7 gene polymorphisms are of great signifcance for the biological function of viruses and related disease progression. HPV31 and HPV33 are particularly prevalent in Western countries rather than Asian ones. Tey also have strong carcinogenic potential. Several nonsynonymous mutations have been found in HPV31/HPV33E6 and E7 globally as well as in southwest China. Tese nonsynonymous mutations may afect the structures and functions of proteins by changing their amino acid sequences, thereby afecting the viability of viruses, their carcinogenic potential, and their interactions with host cells. Tis may in turn afect the detection of viruses, the treatment of virus-related diseases, and the efectiveness of vaccines. Research has also shown that sublineages of HPV31 and HPV33 show diferences in E6 and E7 depending on the geographical distribution. However, there have been few studies on the genetic polymorphism of the E6 and E7 genes in HPV31 and HPV33 found in northeast China. Terefore, in this study, we used PCR amplifcation to identify single-nucleotide polymorphisms of the E6 and E7 genes in HPV31 and HPV33 in northeast China. Ten, we analyzed their genetic polymorphisms, intratypic variations, phylogeny, and positive selection to characterize the distribution of polymorphism in the E6 and E7 genes in HPV31 and HPV33 in northeast China. Although the available data are still limited, our results provide robust information that will contribute to the development of diagnostic probes, targeted therapies, and vaccines based on the E6 and E7 genes in HPV31 and HPV33. 1.0 μl each of primers F1 and R1, and 7.5 μl of nuclease-free water. PCR reactions were preheated for 2 min at 95°C; followed by 30 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 90 s; and then a fnal extension step at 72°C for 10 min. Next, we used the products of the previous step to perform a second amplifcation in a 50 µl reaction volume containing 2.0 μl of products of the previous step, 25 μl of GoTaq ® Green Master Mix, 2×, 2.0 μl each of primers F2 and R2, and 19 μl of nuclease-free water. Tese reactions were performed by the same procedure as described above. Finally, the PCR products were visualized by 1% agarose gel electrophoresis and sent to Invitrogen for sequencing.

Variant Analysis and Phylogenetic Tree Construction.
All nucleotide sequences containing E6 and E7 were aligned and compared with the corresponding reference sequences HPV31J04353.1 and HPV33 M12732.1, respectively, by BioEdit. Te nucleotide sequences were translated into proteins using MEGAX Alignment Explorer for the determination of the amino acid changes caused by nucleotide polymorphism. Te secondary structures of the reference proteins were predicted using the PSIPred server (https:// bioinf.cs.ucl.ac.uk/psipred/) with the default settings. Neighbor-joining phylogenetic trees of the E6 and E7 variants in HPV31 and HPV33 were subsequently constructed using MEGAX with the Maximum Composite Likelihood model. Te tree topology was evaluated by employing bootstrap resampling 1000 times. Te reference sequences used in this study to construct the phylogenetic branches were obtained from the GenBank sequence database, as described in previous studies. Numbers above the branches indicate the bootstrap values. All data were confrmed by repeating the PCR amplifcation and sequence analyses at least twice.

Selective Pressure Analysis.
To identify sites potentially undergoing positive selection in the E6 and E7 genes of HPV31 and HPV33, the CodeML program in PAML 4.9 software (Phylogenetic Analyses by Maximum Likelihood, https://abacus.gene.ucl.ac.uk/software/paml.html) was used. Tis program performed likelihood ratio tests (LRTs) to infer nonsynonymous and synonymous nucleotide divergence for coding regions, employing the method described by Nei and Gojobor [14,15]. Te proteins encoded by the E6 and E7 genes of HPV31 and HPV33 were aligned by MegaX.

Statistical Analysis.
Te distributions of the identifed variants associated with cervical disease were assessed by Fisher's exact test. P < 0.05 was considered to represent a statistically signifcant diference.

Results
Among all of the HPV31 and HPV33 samples, 151 sequences of the E6 and E7 genes of HPV31 and 136 sequences of those of HPV33 were amplifed and sequenced successfully. Te failed cases may have occurred because of the small number of copies of infected HPV in host cells and the limited amplicons obtained for sequencing.
In the E7 gene, we found 8 (8/297) nucleotide positions that difered from the reference. 3 (37.50%) of them were synonymous substitutions while 5 (62.50%) were nonsynonymous. Te isolates were divided into 8 subtypes named HPV31E701-HPV31E708, based on their mutations. No isolates showed complete homology with the reference. HPV31E707-HPV31E709 were found for the frst time. All nucleotide variations and corresponding amino acid changes of the HPV31 E6 and E7 genes are listed in Tables 2  and 3. Te results showed that E7 was clearly more conserved than E6. None of the mutations resulted in a frameshift or a premature stop codon. Te secondary structures of the HPV31 E6 and E7 proteins are shown in Figures 1 and 2, while the secondary structures associated with the nonsynonymous mutations of HPV31 E6 and E7 are listed in Tables 2 and 3. Among the 151 HPV31 E6 sequences, lineage A (74,49.01%) had the highest frequency, followed by lineage C (62,41.06%) and lineage B (15,9.93%) ( Table 2 and Figure 3(a)). In addition, among the 151 HPV31E7 sequences, lineage A (74,49.01%) showed the highest frequency, followed by lineage C (71,47.02%) and lineage B (6,3.97%) ( Table 3 and Figure 3(b)).
In the E7 gene, we found 9 (9/294) nucleotide positions that difered from the reference. 3 (33.33%) of them were synonymous substitutions, while 6 (66.67%) were nonsynonymous. In this gene, we also found 2 triallelic mutations. Both of them were nonsynonymous mutations, leading to the amino acid changes A45V/E and T93S/N. Te isolates were divided into 9 subtypes, named HPV33E701-HPV33E709, based on their mutations. HPV33E701 showed complete homology with the reference. Te number of HPV33E701 cases was 103 (75.74%). HPV33E706, HPV33E708 and HPV33E709 were found for the frst time. All nucleotide variations and corresponding amino acid changes of the E6 and E7 genes of HPV33 are listed in Tables 4 and 5. Te E7 gene of HPV33 was more conserved than E6, as was the case for HPV31. None of the mutations produced a frameshift or a premature stop codon. Te secondary structures of the E6 and E7 proteins of   HPV33 are shown in Figures 4 and 5, while the secondary structures associated with the nonsynonymous mutations in E6 and E7 of HPV33 are listed in Tables 4 and 5.

Gene Sequences and Histological Characteristics.
Among the 151 sequences of the E6 and E7 genes of HPV31 and the 136 such sequences of HPV33, 115 and 108 samples underwent histological diagnoses, respectively. Te associations of the identifed variants with cervical disease are shown in Tables 6-9. We only found a statistically signifcant association between the identifed variants and cervical disease in HPV33E6. Te diferent HPV33E6 sequences difer in terms of their potential for associations with cervical pathogenesis. However, more data on this issue are needed.

Selective Pressure Analysis.
We used PAML4.9 software to test for variation in the dN/dS ratios among the diferent lineages detected in this study. K93N (P > 95%) in HPV33 E6 was the only positively selected site that we identifed in this study. No positively selected sites were found in E6 and E7 of HPV31 or in E7 of HPV33.

Predicted MHC and B-Cell Epitopes.
Te MHC epitopes and B-cell epitopes were predicted and are shown in Figure 7; only MHC epitopes binding no fewer than 10 HLA  Lineage  21  29  67  111  136  149  184  228  Reference J04353.1  G  A  C  C  G  C  A  T  A  HPV31E701  ------G  -3  A  HPV31E702  A  --T  A  -G  -62  C  HPV31E703  --T  ---G  -69  A  HPV31E704  --T  T  A  -G  -6  B  HPV31E705 -   E7. For MHC II, we found 7 good epitopes in HPV31E6, 5 in HPV31E7, 6 in HPV33E6, and 6 in HPV33 E7. For B-cell epitopes, 5 high-scoring epitopes were found in HPV31E6, 2 in HPV31E7, 1 in HPV33E6, and 3 in HPV33E7. All of the results are shown in Figure 7. Te nonsynonymous mutations may cause changes in epitopes' binding abilities.

Discussion
Persistent infection with HR-HPV is a key factor associated with cervical cancer. HPV16   E7 genes of HPV31 and HPV33 have been carried out. Tis study provided an overview of these polymorphisms in northeast China.

M R G H K P T L K E Y V L D L Y P E P T D L Y C Y E Q L S D S S D E D E G L D R P D G Q A Q P A T A D Y Y I V T C C H T C N T T V R L C V N S T A S D L R T I Q Q L L M G T V N I V C P T C A Q Q
Considering the previous fndings that conservation of the E7 gene is critical to the association of HPV16 and HPV18 with carcinogenesis [19], E7 can be considered a more appropriate target for the diagnosis of HPV31 and HPV33 than E6. Te nonsynonymous mutations leading to changes in the secondary structures of E6 and E7 proteins in HPV31 and HPV33 may result in changes to polarity, hydrophobicity, and amino acid side chains, which would potentially alter coprotein folding [20]. Terefore, the nonsynonymous mutations may afect the carcinogenic potential of the virus, the immune reaction of the host, and the potency of vaccines by altering the structure of E6 and E7 oncoproteins. In HPV31E6, C413T (A138V) was the most common nonsynonymous mutation. In HPV31E7, A184G (K62E) was the most common nonsynonymous mutation.  HPV33E701  80  31  8  10  31  0  HPV33E702  10  7  1  0  2  0  HPV33E703  6  3  0  0  3  0  HPV33E704  4  2  0  1  1  0  HPV33E705  1  0  0  0  1  0  HPV33E706  2  1  1  0  0  0  HPV33E707  4   In HPV33E6, 105C (K35N) was the most common nonsynonymous mutation. Finally, in HPV33E7, A190T (Q97L) was the most common nonsynonymous mutation. Tese mutations should be taken into account when E6 and E7 are chosen as targets for primer design or diagnosis. In this study, 10HPV31E6 sequences (HPV31E609−HPV31E614, HPV31E616−HPV31E618, and HPV31E620), 3HPV31E7 sequences (HPV31E707-HPV31E709), 7HPV33E6 sequences (HPV33E604, HPV33E605, HPV33E607, HPV33E608, HPV33E610, HPV33E612, and HPV33E614), and 3HPV33E7 sequences (HPV33E706, HPV33E708, and HPV33E709) were found for the frst time. For further functional research and study of the potential for cervical pathogenesis, there is still a need for a large number of samples to be analyzed and many more experiments to be performed.
Xi et al. reported that HPV31 lineage A (41.7%) is the most common lineage in women in the USA, followed by lineage C (37.2%) and lineage B (21.1%) [21]. Our study showed the same distribution in northeast China. However, in Italy, lineage C (65.8%) was found to be the most common, followed by lineage B (29.3%), and lineage A (4.9%) [22]. Terefore, there is a need to bear in mind the possibility of geographic variation and ethnic diferences in the frequencies of these lineages. Chen et al. also reported that HPV33 lineage A was the main lineage in Asia [23], which matches our study. In addition, here we only found one positively selected site, K93N in HPV33E6. Tis site was also previously identifed by Chen et al. [13] in southeast China. K93N has also been observed in HPV58E6, so it may have evolutionary signifcance in enabling viruses to adapt to the environment and improve their survival.
Amino acid positions 145-149 form the PDZ binding domain in the E6 protein; amino acid positions 21-29 form a short linear motif responsible for Rb binding in the E7 protein; whereas positions 58, 61, 91, and 94 act as Zn binding sites in the E7 protein [23,24]. In this study, HPV31E6 C430T (R144C) was observed next to residues 145-149, and A439G (T147A) was observed in residues 145-149. HPV31E7 C67T (H23Y) was observed in residues 21-29, and A184G (K62E) was observed beside residue 61. Moreover, HPV33E7 T65C (L22P) was found in residues 21-29, and C278G/A (T93S/N) was found beside residue 94. Tese mutations may alter the structures of oncoproteins, which could in turn afect the carcinogenic potential of viruses, the immune reaction of the host, and the potency of vaccines.
Modern immunoinformatics provides new strategies for the design and identifcation of antigen-specifc epitopic sites for use as vaccine targets. Nearly all of the nonsynonymous mutations observed in this study are located in MHC I/II and/or B-cell epitopes. Tese nonsynonymous mutations may decrease or increase epitope binding ability, or even make epitopes appear or disappear. Tey can also make the scores of predicted B-cell epitopes lower or higher. Terefore, these nonsynonymous mutations may infuence the interplay between viruses and host cells. Te epitopes predicted in this study may contribute to the development of vaccines against specifc HPV variants in the Chinese.
However, further experiments are still required to confrm the predictions obtained through immunoinformatics.
Tis is the frst study on the genetic polymorphism of the HPV31 and HPV33 E6 and E7 genes in northeast China. Te presented data provide deeper insights into the diverse geographical distribution of HPV31 and HPV33 E6 and E7 genes and should contribute to the development and application of detection probes, targeted drugs, and vaccines against specifc HPV variants in the Chinese.

Data Availability
Te data used to support the fndings of this study are included within the article.

Conflicts of Interest
Te authors declare that they have no conficts of interest.