Circulation of HIV-1 CRF02_AG among MSM Population in Central Italy: A Molecular Epidemiology-Based Study

Introduction. The evolutionary and demographic history of the circular recombinant form CRF02_AG in a selected retrospective group of HIV-1 infected men who have sex with men (MSM) resident in Central Italy was investigated. Methods. A total of 55 HIV-1 subtype CRF02_AG pol sequences were analyzed using Bayesian methods and a relaxed molecular clock to reconstruct their dated phylogeny and estimate population dynamics. Results. Dated phylogeny indicated that the HIV-1 CRF02_AG strains currently circulating in Central Italy originated in the early 90's. Bayesian phylogenetic analysis revealed the existence of a main HIV-1 CRF02_AG clade, introduced in the area of Rome before 2000 and subsequently differentiated in two different subclades with a different date of introduction (2000 versus 2005). All the sequences within clusters were interspersed, indicating that the MSM analyzed form a close and restricted network where the individuals, also moving within different clinical centers, attend the same places to meet and exchange sex. Conclusions. It was suggested that the HIV-1 CRF02_AG epidemic entered central Italy in the early 1990s, with a similar trend observed in western Europe.


Introduction
In the past decade, there has been an increase in the circulation of non-B strains and circulating recombinant forms (CRFs) of the human immunodeficiency virus type 1 (HIV-1) in previously subtype B homogeneous area such as Europe and North America [1][2][3][4][5][6][7][8]. In Europe, data from extensive multinational database showed that non-B strains and CRFs accounted for 15.0% to 30.0% of HIV-1 total infections in the different countries, with an increase in proportion from 2000 to 2007 [9][10][11].
Amongst the HIV-1 non-B types, the CRF02 AG is one of the most prevalent recombinant forms in the world, responsible for at least 8% of total infections [9]. Recently, its prevalence was increased in west and west-central Africa, where this viral variant is native and predominantly transmitted within heterosexual population, also in male at-risk populations, such as continental men who have sex with men (MSM) [10]. In western and central Europe, a similar trend was also observed, with an increased proportion of CRF02 AG-infected patients from 5% in 2000-2003 to 8% in 2004-2007 [9]. In Europe, this CRF represents the first HIV-1 non-B subtype among newly diagnosed individuals in many countries, such as Spain, Germany, Belgium, and The Netherlands [11].
In Italy, the proportion of HIV-1 non-B-infected patients has been progressively increased from 0.3% in 1993 to 20.0% in recent years [6,12] and it exceeds 63.0% among foreigners living in this country [13][14][15]. Similarly in Lazio region a progressive increasing of the proportion of HIV-1 non-Binfected (from 6.1% in 2004 to 21.3% in 2009) has been observed among native Italian patients [16]. The high HIV-1 non-B subtypes prevalence in Italy can be associated to the increase of immigrants from Africa, Eastern Europe, and South America, accounted in the last decade [11,16]. Among the HIV-1 non-B subtypes circulating in Italy, CRF02 AG is the second per proportion, after F and before C strains [6,[17][18][19][20][21][22][23].
To explore the increasing prevalence of the HIV-1 CRF02 AG and, in general, as those of other non-B subtypes, is important for epidemiological purposes but can also to be relevant in clinical setting. Some biological differences seem to exist between subtypes and this fact could have several implications in terms of development of drugs resistance, response to antiretroviral therapy, disease progression, and transmission rate.
Therefore, understanding of the reasons for the emergence and spread of non-B subtypes in certain core groups, such as MSM, is crucial for epidemiological analyses aimed to define the pathway of diffusion of HIV-1 nonnative strains in Europe over time and across the different sexual-risk populations.
In this regard, European MSM seem to have a high risk of infection with non-B subtypes because they are more likely than heterosexual individuals to have sex abroad, thus in countries with high proportions of non-B subtypes and with nonnational sexual partners [24,25].
To improve knowledge about the spread of infection with the CRF02 AG, the evolutionary rates, and dates of origin of the epidemic were estimated and were identified and were characterized as a number of epidemiological clusters from a dataset of pol gene sequences isolated from MSM living in central Italy.

Study Population.
Men who have sex with men (MSM) with confirmed HIV-1-infection by CRF02 AG were considered as eligible for the study. All the participants were identified from a large cohort of MSM newly diagnosed with HIV-1 infection from a surveillance network of clinical sites in the Lazio Region, from 2000 to 2012. Using a regional centralized archive, a set of pol sequences belonging to all participants was also selected. Thus, the study population represents all HIV-1 cases infected with a CRF02 AG subtype reported in the Lazio Region after the 2000 and for whom a pol sequence was available.
The dataset consisted of HIV-1 CRF02 AG pol sequences (1,191 nucleotides long). All the isolates were collected between 2000 and 2012. Demographic, behavioral, and clinical data of the selected patients were extracted from individual medical records and organized anonymously in a common electronic archive. All data was imputed, managed, and stored according to the standards and recommendations of the ethics committee of participant institutes.

HIV-1 Pol Sequencing.
HIV genotype analysis was performed on plasma samples by means of a commercially available kit (ViroSeq HIV-1 genotyping system; Abbott Laboratories, Abbott Park, IL). Briefly, RNA was extracted, retrotranscribed by murine leukemia virus reverse transcriptase (RT), and amplified with Amplitaq-Gold polymerase enzyme by using 2 different sequence-specific primers for 40 cycles. Pol-amplified products (containing the entire protease and the first 335 amino acids of the RT open reading frame) were full length sequenced in sense and antisense orientations by an automated sequencer (ABI 3100) by using 7 different overlapping sequence-specific primers [26].

Statistical Analysis.
To test the differences between Italian and nonnational individuals infected with HIV-1 CRF02-AG, the Mann-Whitney test for the continuous variables and Fisher's exact test for the categorical variables were used.
values <0.05 were considered statistically significant. The calculations of all statistical tests were performed by using SPSS statistics package version 19.

Likelihood
Mapping. The phylogenetic signal of our dataset was investigated by means of the likelihood mapping analysis of 10,000 random quartets by using TreePuzzle program as already described [27]. In this analysis, groups of four randomly chosen sequences (quartets) were evaluated using Maximum Likelihood (ML). For each quartet, the three possible unrooted trees were reconstructed under the selected substitution model. The posterior probabilities of each tree were then plotted on a triangular surface, so that fully resolved trees fall into the corners and the unresolved quartets in the centre of the triangle (indicating a star-like signal). When using this strategy, if more than 30% of the dots fall into the centre of the triangle, the data are considered unreliable for the purposes of phylogenetic inference.

Phylogenetic Analysis, Time-Scaled
Phylogeny and Demographic History. All HIV pol sequences were aligned using Clustal X and manual editing was performed with Bioedit as already described [28], removing gaps and cutting to identical sequence lengths. ModelTest version. 3.7 was used to select the simplest evolutionary model that adequately fitted the sequence data [28].
Subtype was determined uploading sequences individually into the REGA HIV-1 automated Subtyping Tool version 2.0 (http://www.bioafrica.net/rega-genotype/html/subtypinghiv.html) and confirmed by in-house phylogenetic analysis. Maximum likelihood (ML) phylogenies were estimated with the best fitting nucleotide substitution model (GTR + I + G) (tested with ModelTest); calculations were performed with Phyml version 3.0. Statistical support for specific clades and clusters was assessed by bootstrap analysis considering bootstrap values >70% (data not shown).
The dated tree, evolutionary rates, and population growth were coestimated by using a Bayesian Markov Chain Monte Carlo approach (MCMC; Beast v. 1.7.4) implementing a general-time-reversible + invariant + gamma model. Different parametric demographic models (a constant population size, exponential and logistic growth) and a nonparametric Bayesian skyline plot (BSP) were compared under strict and relaxed clock conditions, and the best models were selected by means of a Bayes factor (BF, using marginal likelihoods) implemented in Beast as already described [28]. In accordance with Kass and Raftery [29], the strength of the evidence against H0 was evaluated as follows: 2lnBF <2; no evidence; 2-6: weak evidence: 6-10, strong evidence: >10, very strong evidence. A negative value indicates evidence in favor of H0. Only values of >6 were considered significant.
Chains were conducted for 30 × 10 6 generations and sampled every 3,000 steps. Convergence of the MCMC was assessed on the basis of the effective sampling size (ESS) for each parameter. The effective sample size (ESS) of a parameter sampled from an MCMC (such as BEAST) is the number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to.
Only ESS values of >250 were considered robust and were accepted. Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals. The trees were summarized in a target tree by the Tree Annotator program included in the Beast package by choosing the tree with the maximum product of posterior probabilities (maximum clade credibility) after a 10% burn-in. The time of the most recent common ancestor (TMRCA) estimates was expressed as mean and 95% highest posterior density (HPD) years before the most recent sampling dates, corresponding to 2012 in this study.
The demographic history was also analyzed on the HIV-1 CRF02 AG pol sequences by performing the Bayesian skyline plot. The software MEGA 5 [30] allowed to calculate the mean genetic distances within different cluster in both subclades A and B by using Tamura Nei method.

Phylogenetic Analysis, Time-Scaled Phylogeny Reconstruction and Demographic
History. Phylogenetic analysis, conducted by the REGA HIV-1 automated subtyping tool confirmed that all the 55 HIV-1 sequences from participants are classifiable as CRF02 AG (data not shown). BF analysis showed that the relaxed clock fitted the data significantly better than the strict clock (2lnBF between the strict and relaxed clock was >200 in favor of the second). Under the relaxed clock, the BF analysis showed that the BSP was better than the other models (2lnBF > 48).
The demographic history of HIV-1 CRF02 AG subtype showed that the epidemic started in the end 80s and after a light grow until 1995 has remained more or less constant up to about 2007. After the 2007 the epidemic had a decrease, suffering of the bottleneck until 2010 when showed an increase over time (Figure 2). The mean genetic distance was measured within clusters inside the subclades. The within groups mean genetic distance ranged from 1.6% for subclade B to 0.4% for subclade-A.

Discussion
MSM continue to be the population group at higher risk of acquiring HIV-1 infections in developed countries. In western Europe, the incidence of HIV-1 among MSM has increased over the last decade, probably due to an increase in unsafe risk sexual practices and the re-emergence of several sexually transmitted infections (STI) in the young, low educated and HIV unaware individuals [31][32][33][34][35]. The HIV epidemic in Italy started among intravenous drug users (i.e. heterosexual and MSM IDUs) who accounted for 57% of all the AIDS cases reported in the adult population between 1982 and 2007 and was mainly attributed to the HIV-1 B subtype [36]. In Europe, national surveillance data showed an increasing proportion of HIV-1 cases among MSM, ranging from 15% in 1996-1997 to 22% in 2006-2007 [31]. In Italy, HIV-1 CRF02 AG accounts for about 3% of all the infected and represents 80% or more of all the CRFs in the country [37].
The aim of this study was to investigate the evolutionary and demographic history of this CRF in a selected retrospective group of HIV-1 infected MSM resident in central Italy, by using a Bayesian coalescent-based framework. Dated phylogeny allowed to estimate a TMRCA in 1992 for the root of the tree, thus suggesting that the HIV-1 CRF02 AG strains currently circulating in Central Italy originated in the early 90's and that the epidemic was relatively close to the HIV-1 B subtype introduction in the 80s [38]. This is the first molecular epidemiology-based study in Italy which aimed to investigate the characteristics and timing of circulation of CRF02 AG in HIV-1 infected people, in particular in MSM.
Bayesian phylogenetic analysis on the 55 pol sequences revealed the existence of a main clade statistically supported, introduced in Rome before 2000 and differentiated over time into two sub-clades (A and B) well defined for year of start of circulation (2000 and 2005). Inside the sub-clade A, we found only one cluster (A1) statistically supported, whereas inside the sub-clade B we identified five different statistically supported clusters (B1 to B5).
All sequences within identified cluster were interspersed, indicating that the subgroup of MSM analyzed in this study belongs to a close and restricted network of individuals, which despite attending different clinical centers attend the same place for engagement and sexual exchanges. This hypothesis is also in line with the fact that the only sequence found in the MSM living in Latina, Southern Lazio, did not cluster with any of the sequences retrieved among MSM living in Rome, probably because MSM in Latina hang around different meeting places than those living in Rome. Conversely, sequences were relatively dispersed to several clinical centers, presumably due to a variety of factors, which commonly influence clinical facilities from general population, such as recommendations from family, friends and neighbours, trust in a given physician, and closeness of specific clinical centers to own house. The presence of so many different clusters in the phylogeny also indicated that different viral introductions occurred at different times, as expected for this young epidemic. A number of authors have recently attempted the timescale reconstruction and characterization of epidemiological clusters among individuals infected with HIV-1 (particularly MSM) using molecular clock-based approaches [39,40]. Hué et al. identified in United Kingdom (UK) different clusters of HIV-1 B subtypes concluding that different epidemics are involved the introduction of this subtype in MSM as we observed from our data. Although the UK cohort was bigger than our group of MSM, the peculiarity of our dataset is that it was obtained from a restricted and homogeneous group of MSM, mostly living in Rome and all attending clinical centers within the same metropolitan area. In the light of all these considerations, we can hypothesize that all individuals analyzed in the present study were infected by the same HIV-1 CRF02 AG. All the clusters within clades A and B, ranging from 2003 to 2009, had a low level of intraclade genetic heterogeneity, as shown by the different genetic distance, suggesting that these strains probably did not accumulate a large number of mutations over time. This hypothesis is supported by the positive correlation between the TMRCA estimates and the genetic diversity (ranging from 1.6 to 0.4%) observed over the time. Because Italians, and nonnational MSM did not form separate clusters but were instead rather interspersed within the same cluster, it was not possible to establish who infected whom. Probably some nonnational individuals arrived in Italy are already infected, while others did not.
We did not find any significantly supported clades before 2000, probably because of the high rate of mortality among HIV-infected subjects before the introduction of highly active antiretroviral therapy (HAART); moreover, we should remember that our samples were collected between 2000 and 2012 [38]. Usually, one possible bias in the identification of epidemiological clusters is represented by the effects of convergent evolution due to drug therapy-associated selection pressure. Our demographic analysis by logistic model showed that the epidemic grew in Italy, which is to a certain extent supported by other authors findings about subtype B epidemic spread in high-income countries [41]. Indeed, the BSP showed that the epidemic was characterized by a slight exponential growth from the first half of 1990s (corresponding to the root of the tree) until the early 1995s. During this period, the effective number of infections increased by the low rate of 0.5 cases year [42]. After a constant grows the viral population experienced a bottle neck and a the number of cases decreased until 2010, probably because of the death of infected individuals in the community, but again increased after this date reaching a sort of plateau. This result is largely consistent with what happened among MSM over western Europe, where the HIV-1 upraise in the early 2000s was preceded by years of continuous but slow decline in the number of HIV-1 being diagnosed [33][34][35]43]. Moreover, the recent replacement of the endemic subtype by new subtypes from other geographic areas [44,45] may partly explain the decrease in the number of people infected with HIV-1 B subtype, but the epidemiological analysis did not show any trend in the recent clades which may support an increased proportion of cases in the non-Nationals. Therefore, we cannot suggest that a relation with a possible recent introduction of this recombinant form exists. However, the relatively high number of nonnationals (about 22%) analyzed in our group of MSM can ring alarm bell suggesting a significant spillover of CRF02 AG into the Italian population fostered by the steady increase of immigrants from different African countries during the past decade. Given the position of Italy in the Mediterranean Sea as a strategic travel route between Africa and western Europe, HIV-1 molecular epidemiology may change over time. Understanding the CRF02 AG epidemic from Africa to Italy may also play a fundamental role in assessing the potential spread of this viral strain within western Europe, given the vast exchange of persons and goods between southern and western Europe. Unfortunately, no consistent data from other similar national studies were available also to make useful comparisons with the same population of MSM or other ones.
Understanding HIV molecular epidemiology and potential future spread of different non-B subtypes has also clinical relevance. It is already known that differences among HIV-1 genetic forms may impact both the clinical management and surveillance of drug resistance, as a result of the effect of treatment on non-B HIV-1 strains [45,46].
Moreover, HIV-1 subtypes must be considered in the vaccine development process [45,47]. Although cross-clade immune reactivity has been detected among individuals and vaccine recipients, it is reasonable to expect that a vaccine with an antigenic composition including CRFs may produce a more effective response [45].