Genetic Diversity of HPV 16 and HPV 18 Based on Partial Long Control Region in Iranian Women

Background Human papillomavirus (HPV) 16 and HPV 18 account for 75% of all cervical cancers. The L1 gene, encoding the major surface protein (MSP), is used to classify HPV types (lineages and sublineages), genotypes, and intratypic variants. Therefore, this study aimed to investigate the lineages, sublineages, genetic variabilities, and mutation effects on transcription factor binding sites by using partial sequences of the HPV 16 and HPV 18 long control regions (LCRs) in these samples. Materials and Methods After DNA isolation from 56 positive samples, the LCR of HPV 16 and HPV 18 were amplified using specific primers, and phylogenetic trees were drawn through MEGA X. Compared to the reference sequences, single nucleotide polymorphisms (SNPs) were identified. The transcription binding sites were also evaluated using the online PROMO database. Results The LCRs of 52 samples were successfully sequenced. Overall, 81.58% of all HPV 16 variants belonged to the D1 sublineage, followed by A4 (13.16%), A1 (2.63%), and C1 (2.63%) sublineages. All HPV 18 isolates belonged to A sublineage, 92.85% to A3 sublineage, and 7.15% to A4 sublineage. Out of 27 SNPs in the HPV 16 LCR, A7382T, T7384G, C7387T, C7393G, A7431G, T7448C, and C7783A were HPV 16-specific. Also, among 14 SNPs in the HPV 18 LCR, C7577A and A7943T were not previously reported. An insertion (C) between 7432 and 7433 positions was identified in all studied HPV 16 variants. Besides, most of the HPV 16 mutations were embedded in the YY1, TFIID, Oct-2, and NF-1 binding sites, while c-Fos and MBF1, as the most common binding sites, were affected by HPV 18 LCR mutations. Conclusion The present results showed that D1 and A3 were the dominant sublineages of HPV 16 and HPV 18, respectively. Therefore, women infected with these variants need to be examined in further longitudinal studies to obtain more information about the oncogenic potential of these dominant variants in Iran. Besides, YY1, TFIID, Oct-2, NF-1, c-Fos, and MBF1 were the most frequent binding sites, which were influenced by the mutations.

Different genome segments have been used to classify HPV 16 and HPV 18 into lineages, including long control region (LCR), E6, and L1 [7][8][9][10][11]. e LCR contains important regions in viral transcription and replication, including the enhancer, E2 binding, and origin of replication (ori) sites [12]. Some nucleotide changes are responsible for the increased transcription activity in some HPV 16 variants [13].
Due to the importance of LCR in the viral life cycle and reports of mutations in some nucleotide positions of the LCR, causing changes in the viral oncogenic potency (with a potentially important role in viral pathogenicity) [14,15], this study aimed to identify variants in Iranian women with normal Pap smear results, according to the partial sequences of the LCR, containing the enhancer and E2 binding sites for both HPV genotypes. Also, by using the PROMO database, the effects of single nucleotide polymorphisms (SNPs) on transcription binding sites in the LCRs were evaluated.

Clinical
Samples. 56 samples were used in this study. Indeed, a total of 2,756 inPrep samples were collected from women (age range: 16-48 years) in 12 provinces of Iran for the national HPV screening and genotyping program. e participants were referred to the papillomavirus reference laboratory of Masih Daneshvari Hospital Virology Research Center, Tehran, Iran [16]. Out of 6.1% (167/2,756) of samples that were HPV DNA positive, 24.6% (41/167) and 9.0% (15/167) were positive for HPV 16 and HPV 18, respectively, and enrolled in the study. 16  e profile PCR reaction was set up in 95°C for 300 s, 95°C for 60 s, 55°C for 60 s, 72°C for 90 s, and final extension in 72°C for 300 s for 35 cycles with outer primers, which led to the production of 600-nucleotide amplimers. e same thermocycler profile was used with inner primers to amplify the final products (530 nucleotides). e HPV 18 primers included 5′TCGGTTGCCTTTGGCTTAT3′ and 5′TCCGTGCACAGATCAGGTAG3'. Similar to HPV 16, a PCR program was performed for HPV 18. e β-globin gene was also used to evaluate the sample quality. Amplification of PCR products was confirmed by 1.5% agarose gel electrophoresis.

Sequence Alignment and Phylogenetic Analysis.
e partial sequences of the HPV 16 and HPV 18 LCRs were analyzed in BioEdit software [17] and blasted using MEGA X software with reference sequences [18]. Phylogenetic trees were constructed using the neighbor-joining method, Kimura 2-parameter method, and bootstrapping (1000 replicates) in MEGA X software. Overall, 37 complete genome sequences were used as the reference for both viruses.

Identification of SNPs in HPV 16 and HPV 18 LCRs and
Evaluation of Transcription Binding Sites. After sequencing, to identify SNPs, the studied HPV 16 and HPV 18 LCR sequences were aligned with the reference sequences (GenBank NC_001526.4 and NC_001357.1, respectively) using MEGA X software. Besides, the online PROMO database was used to evaluate the transcription binding sites [19].

Successful Amplification.
Among 56 DNA-positive samples for both HPV genotypes, two samples of each virus (HPV 16 and HPV 18) were excluded due to poor amplification quality. In the remaining samples (n � 52), a 530-nucleotide genotype (7,336 to 7,865 for HPV-16) and a 555-nucleotide genotype (7,465 to 63 for HPV-18) from LCRs were successfully amplified and sequenced.

Single Nucleotide Polymorphisms (SNPs). Out of 41
SNPs found in this study, 27 were in the HPV 16 LCR and 14 in the HPV 18 LCR (Tables 1 and 2). Seven SNPs were specific to the sequences of HPV 16, and two SNPs were newly detected in the studied HPV 18 samples (A7382T, T7384G, C7387T, C7393G, A7431G, T7448C, and C7783A in the HPV16 LCR and C7577A and A7943T in the HPV18 LCR). Moreover, an insertion (C) between positions 7432 and 7433 was identified in all studied HPV 16 variants. No insertion or deletion was found in the HPV 18 LCR.

Discussion
HPV 16 and HPV 18 are the most prevalent HPV genotypes in women with normal Pap smear and cancerous samples [20,21]. Although the burden of HPV 16 and HPV 18 has reduced following vaccination, their potential contribution to cancer progression remains unchanged. One explanation for this finding may be the mutations in different parts of the viral genome, especially along the LCR as the main regulatory region [16,17].
According to different studies in Iran, it seems that the abundance of HPV 16 is higher than other types of HPV among Iranian women with either normal Pap smear or cancerous samples; however, HPV 18, as the second most dominant genotype in cervical cancer, may not be observed frequently in normal samples [1,20]. Since both viruses have different variants, many studies have shown that some HPV 16 variants, especially non-A variants, play a critical role in the development of an invasive form of cervical cancer. However, findings regarding the importance of HPV 18 variants in cervical cancer are incompatible [3,5,22].
In this study, the common variants of HPV 16 and HPV 18 were investigated. Generally, different regions and gene targets were selected to classify the intratypic HPV 16 and HPV 18 variants. Given the importance of LCRs in viral replication and transcription, especially regions with viral and cellular transcription binding sites, they were selected for this study. In the current study, D1 and A4 sublineages were found as the main HPV 16 variants, which is consistent with a previous study by Vaezi et al. [11]. Commonly, D1 variants are found in Southern Asia, sub-Saharan Africa, Europe, and North America [5].
Although the sample size was small for HPV 18, A3 was the predominant variant, followed by A4. is finding is inconsistent with the results of a study by Salavatiha et al., which reported HPV 18 A4 variant to be predominant in Iran, [23]. e HPV 18 A1 variants are prevalent in the Asia-Pacific region, and A3 variants account for a large proportion of variants in Europe, Southern/Central Asia, and America [7]. e HPV-16 A1 variants are globally distributed, while other variants have a specific geographic distribution; for example, the A3/A4 variants are predominant in Eastern Asia, and B1-B4/C1-4 variants are frequent in African countries. However, the participants' demographic information was not available, although traveling and immigration are known as significant factors for the geographic distribution of viral variants.
Some studies have reported that mutations in the HPV 16 LCR of non-A variants were associated with an increase in p97 activity and, consequently, an increase in the expression of E6/E7 oncogenes [21]. For instance, in the HPV 16 LCR, an A7727C change was a distinctive characteristic of the increased activity of p97 in D variants [21]. A combination of changes in A7483C, G7487A, G7519A, C7667T, C7687A, A7727C, C7762T, and C7784T led to an increase in the transcription activity of these variants [15]; all of these changes were observed in the studied HPV-16 LCR. Also, Veress et al. showed that many nucleotide changes responsible for the increased transcription activity in HPV16 D and A4 variant genes were related to the LCR 3′ half [13].
Compared to the reference sequences, mutations within LCR in all subjects had more binding sites for some transcription factors, such as YY1, NF-1, Oct-2.1, TFIID, E2F, and TEF-1. Pande et al. reported that some mutations in the HPV 16 LCR, especially in YY1 binding sites, such as G7519A (the most common mutation in YY1 binding site), A7634C, C7687A, T7712G, C7790T, G7824A, A7837C, T7741, and C7762T, led to increased p97 activity by three to six folds [25]. Among these variants, G7519A, C7687A, and G7824A were identified in our samples. Regarding HPV 18, 64.3% (9/14) of the identified SNPs were in the c-Fos and HMGI binding sites.
Compared to c-Jun family members, C-Fos is an AP-1 transcription factor subunit. Overall, AP-1 is one of the factors, which plays a crucial role in the primary stages of neoplastic changes. All these factors regulate the miR-21 genes in cervical cancer cells, associated with HPV 18 and HPV 16 [26]. Moreover, Wang et al., in their study on cervical samples, showed that the overexpression of HMGA2 led to the progression of cervical intraepithelial neoplasia (CIN) into cervical cancer [27]. It seems that variations in the HPV-18 genome, as well as other high-risk HPV genotypes, can affect the cellular and viral transcription sites, although further research is warranted. e present study had several limitations. First, this study had a cross-sectional design, with no demographic information available regarding the cancer status of the samples; therefore, more detailed longitudinal studies are suggested at a molecular level. Second, in this study, samples collected in the first national HPV screening program in Iran were used, while the diversity of HPV genotypes in all provinces of Iran was not comprehensively studied. Finally, although LCR is an important regulatory sequence of HPV genome, according to a study by Qu et al., in which HPV sequences were examined in Chinese women, the accuracy of sublineage classification based on the LCR was 94.02% and 100% for HPV 16 and HPV 18, respectively; they suggested that E2 was the best region for HPV 16 sublineage classification [28]. Nevertheless, in the current study, a partial sequence of LCR was investigated.
erefore, it is suggested to classify whole genomes or marked regions, such as E2, for sublineage classification.

Conclusion
e present study showed that D1 and A3 were the dominant sublineages of HPV 16 and HPV 18, respectively. Also, YY1, TFIID, Oct-2, NF-1, c-Fos, and MBF1 were the most frequent binding sites, influenced by mutations. Regarding the importance of mutations in the LCR of non-A variants in developing higher grades of neoplasia (CIN grade 2/3) or invasive cervical cancer, women infected with these variants should be examined in future longitudinal studies to obtain further information about the oncogenic potential of these dominant variants in Iran.

Data Availability
e data that supported the findings of the current study are available in GenBank and the cited references.