The Classification of the Persistent Infection Risk for Human Papillomavirus among HIV-Negative Men Who Have Sex with Men: Trajectory Model Analysis

Department of Epidemiology and Biostatistics, School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang 830000, China
Branch of the First Affiliated Hospital of Xinjiang Medical University, Changji, Xinjiang 831100, China
Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
Clinical Laboratory of Xinjiang Medical University First Affiliated Hospital, Urumqi, Xinjiang 830000, China
Surgery Department of Toutunhe District General Hospital, Urumqi, Xinjiang 830000, China


Introduction
Human papillomavirus (HPV) infection is one of the most common sexually transmitted infections (STIs) worldwide [1] and is of paramount importance because of its association with anal, urethral, penile, and oropharyngeal cancers in males [2]. Men who have sex with men (MSM) generally have a higher prevalence and incidence of HPV infection. Recent studies among HIV-negative MSM have found that 48.2% were positive for at least one anal HPV genotype in Guangzhou, China [3], 51.8% in Urumqi, China [4], and 62.8% in three Chinese cities (Chengdu, Xian, and Taiyuan) [5]. In addition, anal HPV infection was found in 59% of HIV-negative MSM in Bangkok, central Thailand [6], and 40% (49/124) of all MSM were infected with at least one anal HPV genotype in Moscow, Russia [7].
There are 180 HPV genotypes, of which more than 40 infect the human genital mucosa [8], such as HPV6, HPV11, HPV16, and HPV18. An individual's HPV infection status can vary over time: (1) previously infected individuals may clear the virus between follow-up visits; (2) conversely, previously uninfected individuals may become infected between follow-up visits; or (3) individuals who have cleared the virus may become reinfected [9][10][11]. HPV infection can therefore be characterized as transient or persistent [12]. Persistent anal HPV infection, particularly with carcinogenic HPV types, is an important risk factor for the development of anal cancer [13]. An assessment of HPV infection risk based solely on the initial test result therefore provides only limited insight into how infection changes over time, whereas a longitudinal investigation provides better temporal resolution of HPV infection trends.
Capturing diverse patterns of HPV infection over time poses a complex analytical challenge, in particular for subjects infected with multiple HPV genotypes at the same time. Examining HPV infection trends one participant at a time would require a great deal of work. Group-based trajectory modeling (GBTM) can group study participants according to shared characteristics, which saves time and simplifies the analysis. "GBTM is a specialized application of finite mixture modeling and is designed to identify clusters of individuals following similar progressions of some behavior or outcome over age or time" as suggested by Nagin [14][15][16]. Although each person has unique characteristics, some subjects may share one or several similar characteristics that make it possible to classify them into different categories or groups.
This method can be used to describe the course of many phenomena, whether behavioral [17], biological [18], or physical [19]. In the past few years, GBTM has been widely used in the analysis of longitudinal data in medical research [20], but so far there has been only one application of GBTM to the analysis of MSM HIV data [21]. This study is the first attempt to fully apply GBTM to the analysis of MSM HPV infection data. Urumqi is the capital city of Xinjiang, and its economic and cultural background differs from that of other Chinese cities. There have been no follow-up studies of HPV infection among MSM in this region, and this study fills that gap.

Study Population.
An ongoing prospective study of HPV infection among MSM was performed with the assistance of two local nongovernmental organizations (NGOs), Tianshan Volunteers Workstation and Xinjiang Boys Rainbow Dream Volunteers Service Center. The study exclusion criteria were as follows: (1) men younger than 18 or older than 65 years of age; (2) men who reported no history of anal sex with men in the past year; (3) unwillingness to provide anal swab specimens for HPV testing and blood for HIV testing at each study visit; (4) unwillingness to complete a 63-item computer-assisted study questionnaire on demographic, behavioral, and disease history; (5) prior HIV infection, treatment for anal cancer, anal cytology or high-resolution anoscopy within 12 months prior to enrollment. Informed consent was obtained from all participants, and the study was granted ethical approval by the Xinjiang Medical University First Affiliated Hospital Ethics Committee (ethical review number 20160512-11).
The subjects were followed up at six-month intervals, and snowball sampling was used to recruit MSM in Urumqi, Xinjiang, China, from March 1st, 2016 to December 31st, 2017. We recruited a set number of eligible individuals called "seeds," who recruited (and were compensated for recruiting) a set number of other eligible individuals from among their network members [22]. The Urumqi population refers to long-term residents of Urumqi, whereas the non-Urumqi population refers to people who have long lived outside Urumqi.
Participants were enrolled in two phases: 100 subjects entered the study from March 1st to March 31st, 2016 (with follow-up visits scheduled in September 2016, March 2017, and September 2017), and the remaining 400 subjects were enrolled from September 1st to December 31st, 2016 (with follow-up visits scheduled from March 2017 to June 2017 and from September 2017 to December 2017). A total of 5 participants tested HIV-positive during follow-up and were subsequently excluded from the study, and another 134 individuals were excluded because they failed to attend two or more consecutive follow-up visits on time. Finally, from the remaining participants, 361 (72.2%) individuals with at least two visits were included in this study, of whom 327 individuals had HPV information on three visits, and 60 individuals had HPV information on four visits.
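The exclusion criteria above can be read as a simple screening rule. The following is a minimal sketch only: the `Candidate` fields and function names are hypothetical, and criterion (5) is split into a prior-HIV flag and a single flag for anal cancer treatment, cytology, or anoscopy in the previous 12 months, which is one possible reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Screening answers for one potential participant (hypothetical field names)."""
    age: int
    anal_sex_with_men_past_year: bool
    will_provide_swab_and_blood: bool
    will_complete_questionnaire: bool
    prior_hiv_infection: bool
    # Treatment for anal cancer, anal cytology, or high-resolution anoscopy
    # within the 12 months before enrollment (assumed reading of criterion 5).
    anal_workup_past_12_months: bool


def exclusion_reasons(c: Candidate) -> list:
    """Return the list of exclusion criteria that apply; empty means eligible."""
    reasons = []
    if c.age < 18 or c.age > 65:
        reasons.append("age outside 18-65 years")
    if not c.anal_sex_with_men_past_year:
        reasons.append("no anal sex with men in the past year")
    if not c.will_provide_swab_and_blood:
        reasons.append("unwilling to provide anal swab and blood specimens")
    if not c.will_complete_questionnaire:
        reasons.append("unwilling to complete the questionnaire")
    if c.prior_hiv_infection or c.anal_workup_past_12_months:
        reasons.append("prior HIV infection or recent anal cancer treatment/cytology/anoscopy")
    return reasons


def is_eligible(c: Candidate) -> bool:
    return not exclusion_reasons(c)
```

Returning the reasons rather than a bare yes/no makes it easy to tabulate why candidates were screened out.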
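The participant flow described above can be checked with a few lines of arithmetic. This is only a consistency check on the figures quoted in the text; the constant and function names are illustrative.

```python
# Figures quoted in the text; names are illustrative.
ENROLLED_PHASE_1 = 100      # enrolled March 2016
ENROLLED_PHASE_2 = 400      # enrolled September to December 2016
HIV_SEROCONVERTED = 5       # tested HIV-positive during follow-up
MISSED_VISITS = 134         # missed two or more consecutive follow-up visits


def cohort_flow():
    """Return (enrolled, analysed, percent of enrolled who were analysed)."""
    enrolled = ENROLLED_PHASE_1 + ENROLLED_PHASE_2
    analysed = enrolled - HIV_SEROCONVERTED - MISSED_VISITS
    percent = round(100 * analysed / enrolled, 1)
    return enrolled, analysed, percent


print(cohort_flow())  # (500, 361, 72.2)
```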

Specimen Collection and Laboratory Testing.
A saline-wetted swab was inserted 3-5 cm into the participant's anal canal (between the anus and the dentate line of the anal canal) to collect exfoliated cells. The anal epithelium was swabbed gently with a 360° rotation for at least two minutes. The specimens were placed into a standard transport medium (Hybribio Biotech Limited Corporation, Chaozhou, China) and stored at -20°C. Laboratory assessments were completed within a week of specimen collection.
This study used the Hybribio 37 HPV GenoArray Diagnostic Kit (HPV 6, 11, 16, 18, 26, 31, 33-35, 39, 40, 42, 43-45, 51-59, 61, 66-73, and 81-84) and HybriMax (Hybribio Biotech Limited Corporation, Chaozhou, China) to identify HPV genotypes according to the experimental principle introduced in a previous study [4]. Laboratory procedures included DNA extraction, polymerase chain reaction (PCR), and flow-through hybridization. The GenoArray Kit included positive and negative controls, which were used in every PCR analysis as well as during the hybridization process. For quality control, 5% of the samples were selected at random for retesting.

2.3. HPV Infection Classification.
Participants were tested for HPV at baseline and at six-month intervals for 18 months (up to 3 follow-up visits). HPV infection was defined as the detection of one or more HPV types in the collected specimen (i.e., HPV DNA positive on genotyping). Participants with an invalid HPV test were excluded from the analysis. HPV infection was classified into six categories [9][10][11][12] based on the combined detection results at baseline and the three follow-up visits, as follows:
(1) Persistent Negative: HPV DNA test results negative at baseline and the 3 follow-up visits (-, -, -, -).
(3) New Infection: HPV DNA test results, negative at baseline and the first 2 follow-up visits, but positive at the last follow-up visit (-, -, -, +).
(4) Continuous Positive Detection: HPV DNA test results, negative or positive at baseline and the first follow-up visit, but positive at the last two consecutive follow-up visits (-, -, +, +; or -, +, +, +; or +, -, +, +; or +, +, +, +). Continuous positive detection does not by itself establish persistent infection, because this classification cannot distinguish between persistent infection (by a given HPV type) and repeated reinfection (by different types).
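The pattern-based categories above can be sketched as a small classifier over the four visit results. This is a hedged illustration, not the authors' procedure: it assumes all four visits are available, treats "Infection Clearance" as any earlier positive followed by two consecutive negative final visits (an assumption based on the example patterns and the clearance definition quoted later in the text), and lumps every remaining pattern into one combined label because those categories are not defined in this section.

```python
def classify_pattern(results):
    """Classify a (baseline, FU1, FU2, FU3) sequence of booleans.

    True means HPV DNA positive at that visit. Labels for clearance and the
    catch-all category rest on the assumptions stated in the lead-in.
    """
    if len(results) != 4:
        raise ValueError("expected results for baseline and three follow-up visits")
    baseline, fu1, fu2, fu3 = results
    if not any(results):
        return "Persistent Negative"              # (-, -, -, -)
    if fu2 and fu3:
        return "Continuous Positive Detection"    # (any, any, +, +)
    if fu3 and not (baseline or fu1 or fu2):
        return "New Infection"                    # (-, -, -, +)
    if not fu2 and not fu3:
        return "Infection Clearance"              # assumed: earlier +, then (-, -)
    # Remaining patterns, e.g. (+, -, -, +) or (any, any, +, -).
    return "Positive Detection Again / Other Situations"


print(classify_pattern([False, True, False, False]))  # Infection Clearance
```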

GBTM Establishment.
Jones et al. developed the GBTM methodology based on a semiparametric, group-based modeling strategy and implemented it in PROC TRAJ, a SAS macro procedure [23,24]. GBTM analysis was conducted using SAS PROC TRAJ (SAS Institute, Inc.; Cary, NC, USA) to identify subgroups. With SAS PROC TRAJ, the levels and shapes of the trajectories are determined by the model's regression parameters, which are estimated by maximum likelihood; each individual's probabilities of group membership are then obtained from the fitted model as posterior probabilities. Technically, the model is a mixture of probability distributions that are suitably specified to describe the data to be analyzed. The model assumes that repeated observations on the same individual are independent conditional on trajectory group, meaning that the within-person correlation structure is explained completely by the estimated trajectory curve for each person's group. The output of a GBTM includes estimated probabilities of group membership for each individual and each group and an estimated trajectory curve over time for each group [25].

    proc traj data=AR out=of outstat=os outplot=op;
      id id;          /* participant identifier */
      var y1-y4;      /* number of HPV types detected at each visit */
      indep t1-t4;    /* visit times: baseline and three follow-ups */
      model zip;      /* zero-inflated Poisson */
      min 0; max 11;
      ngroups 3;      /* number of trajectory groups */
      order 3 3 3;    /* cubic polynomial for each group */
    run;

The variables "y1-y4" were set as the outcome variable (y), representing the number of different HPV types detected at baseline and at each of the three follow-up visits. The diagnostic kit can identify thirty-seven HPV types: if a participant tested positive for one of the thirty-seven types, y = 1; if a participant tested positive for two of the thirty-seven types, y = 2; and so forth. The variables "t1-t4" were set as the independent variable (x), representing the baseline and three follow-up visits. SAS PROC TRAJ can model three different types of distribution (continuous, binary, and count) [14][15][16]. Because the outcome in the current research was count data, a Poisson-based model was appropriate; however, because of the large proportion of zero values in the dataset, a Zero-Inflated Poisson (ZIP) model was used. The "ngroups" statement specifies the model to be fitted: "ngroups 1" stands for "one-trajectory model," "ngroups 2" for "two-trajectory model," "ngroups 3" for "three-trajectory model," and "ngroups 4" for "four-trajectory model."
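To make the ZIP specification more concrete, the following minimal sketch shows the zero-inflated Poisson probability mass function and a group trajectory written as a polynomial in time on the log scale. It is an illustration under stated assumptions, not a re-implementation of PROC TRAJ: the function names, the zero-inflation parameter `pi_zero`, and the coefficient values in the usage line are hypothetical, and the log-linear polynomial form is the usual textbook formulation rather than something stated in this paper.

```python
import math


def zip_pmf(y, lam, pi_zero):
    """P(Y = y) for a zero-inflated Poisson.

    With probability pi_zero the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def trajectory_mean(t, betas):
    """Poisson mean at time t, assuming log(lam) = b0 + b1*t + b2*t^2 + ...

    Four coefficients correspond to a cubic trajectory (order 3).
    """
    return math.exp(sum(b * t ** k for k, b in enumerate(betas)))


# Hypothetical cubic coefficients for one group, evaluated at visit 2.
lam = trajectory_mean(2, [0.1, 0.2, -0.05, 0.0])
print(zip_pmf(0, lam, pi_zero=0.3))
```

The extra mass at zero is what lets the model absorb the many HPV-negative visits without forcing the Poisson mean toward zero for everyone in a group.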

Model Evaluation.
To determine the number of trajectory groups present within our sample, we fit a series of GBTMs with 1 to 4 groups (an error was reported when the number of groups was greater than 5). In selecting the appropriate number of trajectory groups, we considered the following criteria: (1) the Bayesian information criterion (BIC), for which, as previously reported, the largest value indicates the best-fitting model [14]; (2) the average posterior probability (AvePP), which reflects the posterior probability of each individual being classified into the corresponding subgroup, with values closer to 1 indicating greater precision and a minimum acceptable value of 0.7 [23,24]; (3) group size; (4) the usefulness of the number of groups in terms of the similarities/differences in their trajectory shapes.
2.6. Statistical Analysis. Data entry was performed independently by two staff members using the EpiData version 3.1 software (The EpiData Association, Odense, Denmark). Group assignments produced by GBTM were defined as "expected," and group assignments made by the researchers in this study were defined as "observed." We used the following definitions for the "observed" groups: (1) DG: participants classified as persistent negative or infection clearance; (2) FG: participants classified as continuous positive detection, new infection, positive detection again, or other situations, whose HPV DNA test results were 0 ≤ y ≤ 4 at baseline and the three follow-up visits and whose curve showed a flat trend; (3) IG: participants classified as continuous positive detection whose HPV DNA test results were 5 ≤ y ≤ 11 at baseline and the three follow-up visits and whose curve showed an increasing trend, together with participants classified as other situations whose HPV DNA test results were 5 ≤ y ≤ 11 at baseline and the first 2 follow-up visits but y = 0 at the last follow-up visit.
This study used SPSS 21.0 (SPSS Inc., Chicago, IL, USA) for processing and analyzing data. Demographic and sexual behavior variables are presented as absolute numbers and proportions or as median and interquartile range for each category. Comparisons among groups were conducted with the 2-sided Pearson chi-square test, Fisher's exact test, and nonparametric tests. Normality tests were performed for continuous variables before conducting further analyses. A multinomial logistic regression model was used to explore risk factors associated with the FG (coded y = 2) and IG (coded y = 3) groups compared with the DG (coded y = 1). All variables with P < 0.25 were included and adjusted for in the multivariate models. The significance level was set at α = 0.05.
Comparison of the expected and observed outcomes across the three risk groups showed that 14 participants grouped in trajectory 2 (FG) during modelling had infection clearance (-, +, -, - or +, +, -, -) and, based on the observed HPV infection classification, should instead be classified into trajectory 1 (DG). Similarly, 4 participants in trajectory 2 (FG) with new infection (y ≥ 5) should be classified into trajectory 3 (IG) (Table 2). (Table 3).
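Two of the selection criteria above, BIC and AvePP, are easy to compute once a model has been fitted. The sketch below assumes the BIC convention commonly used with group-based trajectory models, in which BIC is the log-likelihood minus a penalty (so values are negative and the largest, least negative value is preferred); the function names and the toy posterior matrix are hypothetical and are not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' form: logL - 0.5 * k * log(N)."""
    return log_likelihood - 0.5 * n_params * math.log(n_obs)


def average_posterior_probabilities(posteriors):
    """AvePP per group.

    posteriors: one row per individual, one posterior probability per group.
    Each individual is assigned to the group with the highest posterior
    probability; AvePP for a group is the mean of that probability over the
    individuals assigned to it (NaN if nobody is assigned).
    """
    n_groups = len(posteriors[0])
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for row in posteriors:
        g = max(range(n_groups), key=row.__getitem__)
        sums[g] += row[g]
        counts[g] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Toy example (made-up numbers): three individuals, two groups.
toy = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
avepp = average_posterior_probabilities(toy)
print(all(p >= 0.7 for p in avepp))
```

In practice the two criteria are read together with group size and the interpretability of the trajectory shapes, as listed above, rather than in isolation.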

Discussion
The persistent infection, clearance, and reinfection of HPV involve a complex set of immune mechanisms, particularly when multiple HPV types infect the same individual at the same time. In this study, the cumulative number of different HPV types detected was taken as the main study variable, because the following inference can be drawn from the natural history of HPV infection: if a subject was HPV positive at each follow-up visit (regardless of which HPV type was positive), then during long-term follow-up the subject may simultaneously develop persistent infection with a particular type, new infection with another type, or reinfection with a particular type, and so on. These subjects are more likely to develop HPV-related clinical symptoms in the future than those who remain persistently negative or clear the infection.
In recent years, GBTM has been widely used in medical research on noncommunicable diseases [14]. Successful applications of GBTM include family income trajectories and adiposity in adolescence [18], trajectories of fiberoptic endoscopic evaluation of swallowing in dysphagic patients [26], and trajectories of cognitive decline among dementia patients [27]. GBTM has also been applied to infectious disease data: Elsensohn and colleagues [20] used data on CD4 T lymphocyte counts in patients with HIV receiving antiretroviral treatment to build a GBTM, with the aim of potentially aiding medical decisions. Nevertheless, GBTM has not previously been employed for the analysis of MSM HPV infection data; this study therefore constitutes the first attempt to fully apply GBTM to the analysis of MSM HPV infection data. In the current study, diagnostic statistics supported the selection of a model with three trajectories, which could more clearly visualize the development trends of HPV infection. Trajectory 1 included 44.6% (161/361) of participants, who were persistently negative or showed infection clearance, and showed a slightly declining trend in the number of infections; this trajectory was regarded as the DG because the greatest proportion of participants had asymptomatic transient infections [28,29]. Giuliano et al. [12] reported that 66% and 90% of HPV-positive men are clear of HPV after 12 months and 24 months, respectively (clearance being achieved by the host's own immune system). It is believed that individuals who clear an HPV infection may subsequently acquire natural immunity [30]. We therefore regarded trajectory 1 as the low-risk group for HPV persistent infection.
Furthermore, trajectory 2 comprised 49.6% (179/361) of participants; this trajectory was regarded as the FG and showed a flat pattern in the number of infections. Individuals with continuous positive detection, positive detection again, or other situations were assigned to the same group (0 ≤ y ≤ 4), suggesting that they may share a similar risk of HPV persistent infection. Once a previous HPV infection has completely cleared, current PCR-based assay methods cannot distinguish whether a later positive result is due to a new acquisition or a reactivation of the previous virus [31,32]; therefore, individuals with continuous positive detection or positive detection again should be closely observed because they may develop a persistent infection, and persistent infection with high-risk HPV (HPV16 and HPV18) would increase the risk of cancer. In addition, "Clearance was defined as one positive test result for a specific HPV type followed by two consecutive negative visits." [33]. Participants whose HPV DNA test results were positive at the second follow-up visit but negative at the last follow-up visit (-/+, -/+, +, -) may have cleared the virus or may be reinfected by the next follow-up visit, and reinfection may further increase the risk of a persistent infection. Taken together, we hypothesized that participants in the above conditions were at moderate risk for HPV persistent infection.
Lastly, trajectory 3 included 5.8% (21/361) of participants, who displayed an ascending trend in the number of infections, and this trajectory was therefore defined as the IG; 19/21 participants had continuous positive detection (5 ≤ y ≤ 11), and 2/21 participants were classified as other situations. For the latter, the HPV DNA test results were 5 ≤ y ≤ 11 at baseline and the first 2 follow-up visits but y = 0 at the last follow-up visit; they may develop HPV reinfection by the next follow-up. Multiple HPV types were detected in these participants at the same time, predicting a high risk of HPV persistent infection.
In this study, we compared the FG and IG groups against the DG in a multinomial logistic regression model and determined that receptive anal sex, occasional use of condoms during anal intercourse, substance use, experience of transactional sex with males, and history of other STIs were significant risk factors for HPV infection. Donà et al. [29] demonstrated that MSM who reported receptive anal sex showed a higher incidence of infection by any HPV type compared to those who did not engage in receptive anal sex (HR, 2.65; 95% CI, 1.16-6.06). In addition, Winer et al. [34] found that consistent condom use by their partner appeared to reduce the risk of HPV infection. Furthermore, Yu et al. [35] reported that "High rates of drug use, coupled with high rates of ulcerative STIs such as HPV, suggest the potential for rapid amplification of STIs/HIV risk." Finally, a report by Colón et al. [36] concluded that MSM with self-reported syphilis had a higher risk of high-risk HPV infection than those without (OR, 4.00; 95% CI, 1.20-13.37).
Persistent HPV infection may be related to individual immunity; immunization experiments are expensive, and monitoring each individual's immunity is perhaps even harder to accomplish. Grouping participants by GBTM and focusing on the subgroup with an upward trend, such as trajectory 3 in this cohort study, could improve the cost-effectiveness of immunization experiments. In summary, for disease prevention and control, policymakers should formulate programs that benefit the population as a whole, for example, vaccination strategies for MSM at high risk of HPV persistent infection. HPV vaccines have only recently been marketed in China and are mainly targeted at females. Unlike men who have sex with women, who would benefit through herd immunity from female-only vaccination strategies, MSM are unlikely to see a decline in HPV-associated disease in female-only vaccination settings [37]. To address this issue,  [38].
Information on the behavioral characteristics of the different trend groups is conducive to screening for individuals at high risk of infection through the questionnaire, without laboratory testing. For example, subjects who had receptive anal intercourse, occasional use of condoms in anal sex, experience of transactional sex, or substance use may be at higher risk of infection; this hypothesis needs to be verified by a long-term study. In ethical terms, health interventions should be launched for all categories of HPV infection, whether persistent positive, new infection, or other situations. GBTM can group the study participants according to certain characteristics before the preventive intervention, which could save time, simplify operations, play a positive role in the screening of risk groups, and help achieve precision prevention.
Since this was an exploratory study, a sample size calculation was not undertaken. Properly powered studies in the future would be more advantageous. The study is based on 1.5 years of follow-up data, and the robustness of the GBTM results over longer follow-up needs to be verified. Furthermore, since a nonprobabilistic sampling method was used in this study, the research results should be extrapolated with caution. Lastly, the PCR-based assay methods in this study cannot distinguish whether an HPV infection is prevalent or incident; prevalence-incidence bias may therefore be present.

Ethical Approval
This study was granted ethical approval by the Xinjiang Medical University First Affiliated Hospital Ethics Committee (ethical review number 20160512-11).

Conflicts of Interest
The authors have declared that no competing interests exist.