Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were male sex, Sub-Saharan African or Latin-American origin (compared to Spanish origin), and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses.


Introduction
The majority of clinical cohorts of HIV-infected people are made up of seroprevalent subjects whose dates of seroconversion are unknown [1–3]. Seroprevalent subjects have been used to quantify the magnitude and risk factors of late diagnosis of HIV infection, an important public health problem which, by definition, cannot be studied in seroconverter cohorts [4, 5]. Although there are multiple definitions of late diagnosis based on different biological markers [4, 6–8], most of them are based on the patient's CD4 lymphocyte count close to the date of HIV diagnosis. For some persons, HIV may have been diagnosed before their inclusion in a clinical cohort; therefore, no CD4 counts close to HIV diagnosis are usually available. Consequently, these people are ignored, and estimates are obtained only from those with available CD4 counts (largely the new HIV diagnoses) rather than from the whole cohort. Most clinical cohorts include newly diagnosed people as well as people who have been diagnosed in the past, but the latter group is rendered invisible. The use of multiple imputation techniques to estimate the time between HIV seroconversion and HIV diagnosis could overcome the aforementioned problem. These techniques, which so far have not been applied to study late HIV diagnosis, are based on the correlation between certain biological markers like CD4 lymphocytes and the duration of infection [9–11].
The magnitude of late HIV diagnoses in the subgroup of new HIV diagnoses in cohorts from industrialized countries ranges from 18% to 39% [4, 5, 12–14]. For these cohorts, the proportion of subjects who are new HIV diagnoses, and therefore can be analyzed, ranges from 4% to 73% [4, 5, 12–15]. In Spain, considering late diagnosis as subjects with a CD4 lymphocyte count of <200 cells/mm³ or an AIDS-defining disease in the first year after HIV diagnosis, we reported 37% of late diagnosis in 2004–2006 in the 68% of subjects who could be evaluated because they were newly diagnosed at inclusion in the cohort [16]. Risk of late diagnosis increased with age, was higher in men than in women, and, contrary to previous publications [12, 17, 18], was higher in heterosexuals and injection drug users (IDUs) compared to men who have sex with men (MSM). We hypothesized that this unexpected finding may reflect that the new diagnoses represent a different population than the old ones, which could not be evaluated for late diagnosis analyses [16]. To test this hypothesis, we estimated the magnitude and risk factors of late HIV diagnosis, in all cohort members and separately in those newly diagnosed, in a multicenter cohort of seroprevalent subjects in Spain for whom we have imputed their HIV seroconversion dates.

Methods
CoRIS is an open, multicenter, and prospective cohort of adult patients with confirmed HIV infection who are naive to antiretroviral treatment (ART) at the first visit to any of the CoRIS centers and who agree to participate in the study by signing an informed consent form. A complete description has been published elsewhere [19]. Briefly, CoRIS collects a minimum dataset which is subject to internal and external quality controls. Between January 2004 and October 2008, 4,057 subjects were recruited from 27 participating centers where the percentage of CD4 lymphocytes (hereinafter referred to as "CD4%") was measured. A total of 231 subjects were excluded because they had recently been recruited, and no CD4% results were available, and 159 were excluded because their first CD4% values were recorded after ART initiation. Accordingly, 3,667 patients were available for analysis.
Subjects were classified as late diagnosis (LD) when the diagnosis of HIV infection was made more than 4.19 years after seroconversion. This cut-off point was chosen because, in a previous publication [20], it was estimated that this was the time elapsed from seroconversion to reaching a CD4 count below 350 cells/mm³. In turn, this CD4 lymphocyte threshold is used in the new definition of late presentation recommended by the European Late Presenter Consensus Working Group [6].
A multiple imputation technique was used to estimate the date of seroconversion of all CoRIS subjects, based on the model for progression of infection described by Muñoz et al. [10], which has been used in Spain [11]. These authors use parametric survival models based on the Weibull distribution to estimate the time elapsed between the date of HIV seroconversion and the date of first CD4% in the absence of ART, on the basis of that first CD4%. Their paper describes the model's parameters for each of the five thresholds into which CD4% is categorized.
This model and its coefficients allow us to know the probability that the date of seroconversion falls before a given date, conditioned by the fact that it must be between the date when the subject started being at risk for HIV infection and the date of HIV diagnosis. From this model equation, we can estimate (impute) the timespan between the date of seroconversion and the date of HIV diagnosis when the following information is made available for each subject: (a) date when the subject started being at risk for HIV infection, (b) date of HIV diagnosis, and (c) the value of CD4% and the date it was measured.
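As an illustrative sketch of this conditioning (the notation below is assumed here and is not taken from [10]), let t be the time from seroconversion to the first CD4%, F the Weibull distribution function for the subject's CD4% category with shape k and scale λ, and d_risk, d_dx, and d_CD4 the dates of initial risk, HIV diagnosis, and first CD4%, with d_dx not later than d_CD4. Requiring the seroconversion date to lie between d_risk and d_dx is equivalent to truncating t to an interval [a, b]:

```latex
\[
  a = d_{\mathrm{CD4}} - d_{\mathrm{dx}}, \qquad
  b = d_{\mathrm{CD4}} - d_{\mathrm{risk}}, \qquad
  F(x) = 1 - \exp\!\left[-\left(x/\lambda\right)^{k}\right],
\]
\[
  \Pr\left(t \le x \mid a \le t \le b\right)
  = \frac{F(x) - F(a)}{F(b) - F(a)}, \qquad a \le x \le b,
\]
\[
  t_{1} = t - a .
\]
```

Under this reading, the time to HIV diagnosis t_1 is the imputed t minus the interval between HIV diagnosis and the first CD4% measurement.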
We used the following imputation process: (1) for each individual, a random number was drawn from a Weibull distribution with the parameters corresponding to his/her CD4% threshold, which was considered a random estimate of the timespan between the date of seroconversion and the date of first CD4% (t). This made it possible to calculate the date of seroconversion as the date of first CD4% minus time t, and the timespan between the date of seroconversion and the date of HIV diagnosis ("time to HIV diagnosis", t₁), that is, t minus the interval between HIV diagnosis and first CD4%. Subjects whose time t₁ was longer than 4.19 years were considered late diagnoses. (2) The preceding process was replicated 20 times. Twenty different databases were generated with the information obtained in each replication. (3) The subsequent analyses were made by combining the results obtained when analyzing these 20 databases separately.
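The three steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the Weibull parameters and category labels are hypothetical placeholders (the published coefficients are in [10] and are not reproduced here), times are expressed in years, and the conditioning on the risk and diagnosis dates is assumed to be implemented by inverse-CDF sampling from a truncated Weibull.

```python
import math
import random

LD_THRESHOLD_YEARS = 4.19  # cut-off for late diagnosis stated in the text
N_IMPUTATIONS = 20         # number of replications stated in the text

# HYPOTHETICAL (shape, scale in years) per CD4% category. Placeholders only,
# NOT the coefficients published in [10]; the labels "c1".."c5" are invented.
WEIBULL_PARAMS = {
    "c1": (1.5, 2.0), "c2": (1.5, 4.0), "c3": (1.5, 6.0),
    "c4": (1.5, 8.0), "c5": (1.5, 10.0),
}


def weibull_cdf(x, shape, scale):
    """Weibull distribution function F(x); zero for x <= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def draw_truncated_weibull(shape, scale, lower, upper, rng):
    """Draw t from a Weibull restricted to [lower, upper] by inverse CDF."""
    f_lo = weibull_cdf(lower, shape, scale)
    f_hi = weibull_cdf(upper, shape, scale)
    u = f_lo + rng.random() * (f_hi - f_lo)
    u = min(u, 1.0 - 1e-12)  # guard against log(0) from rounding
    t = scale * abs(math.log1p(-u)) ** (1.0 / shape)
    return min(max(t, lower), upper)  # clamp away rounding overshoot


def impute_cohort(subjects, n_imputations=N_IMPUTATIONS, seed=0):
    """Return one list of imputed rows per replication.

    Each subject is a dict with: id, category, dx_to_cd4 (years from HIV
    diagnosis to first CD4%), risk_to_cd4 (years from start of risk to
    first CD4%).
    """
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_imputations):
        rows = []
        for s in subjects:
            shape, scale = WEIBULL_PARAMS[s["category"]]
            t = draw_truncated_weibull(
                shape, scale, s["dx_to_cd4"], s["risk_to_cd4"], rng
            )
            t1 = t - s["dx_to_cd4"]  # time from seroconversion to diagnosis
            rows.append(
                {"id": s["id"], "t1": t1, "late": t1 > LD_THRESHOLD_YEARS}
            )
        datasets.append(rows)
    return datasets
```

Each of the 20 returned lists plays the role of one imputed database; the late-diagnosis flag is recomputed in every replication, so a subject may be classified differently across databases.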
We also present the results obtained using the definition that classified subjects as delayed diagnosis (DD) when they had a CD4 lymphocyte count of <350 cells/mm³ in the first year after HIV diagnosis or an AIDS-defining disease in the first three months after HIV diagnosis. Thus, this definition only permitted the evaluation of subjects for whom that information was available, that is, the new HIV diagnoses.
We assumed that the date when a subject started being at risk for HIV infection was the beginning of the epidemic in Spain, 1 January 1980, except in (a) patients infected by the sexual route or by injecting drug use who were born after 1 January 1965; for these subjects, we used the date of their 15th birthday, and (b) patients in the remaining transmission categories who were born after 1 January 1980, for whom we used their date of birth.
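The rule above can be written as a small function. The sketch below is illustrative only: the transmission-category labels are assumed names, and the handling of a 29 February birth date (moved to 1 March) is an assumption that the text does not address.

```python
from datetime import date

EPIDEMIC_START = date(1980, 1, 1)  # start of the epidemic in Spain (text)
BIRTH_CUTOFF = date(1965, 1, 1)    # cut-off for the 15th-birthday rule (text)
# Assumed labels for sexual and injecting-drug-use transmission.
SEXUAL_OR_IDU = {"MSM", "heterosexual", "IDU"}


def fifteenth_birthday(birth):
    """Date of the 15th birthday; 29 February is moved to 1 March."""
    try:
        return birth.replace(year=birth.year + 15)
    except ValueError:  # 29 February in a non-leap target year
        return date(birth.year + 15, 3, 1)


def risk_start(birth, route):
    """Date when a subject is assumed to start being at risk for HIV."""
    if route in SEXUAL_OR_IDU:
        # Sexual or IDU transmission: 15th birthday if born after 1965.
        if birth > BIRTH_CUTOFF:
            return fifteenth_birthday(birth)
        return EPIDEMIC_START
    # Remaining categories: date of birth if born after the epidemic began.
    if birth > EPIDEMIC_START:
        return birth
    return EPIDEMIC_START
```

For example, `risk_start(date(1970, 5, 20), "MSM")` gives 20 May 1985, whereas a subject born in 1960 is assigned 1 January 1980 whatever the route.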
We present a descriptive analysis of the characteristics of subjects included in the analysis, as well as their time to HIV diagnosis. We used an analysis of variance for the comparison of means, to compare the time to HIV diagnosis according to patient characteristics.
To evaluate the factors independently associated with late diagnosis, we used a multivariate logistic regression model. In this model, robust methods were used to estimate the confidence intervals, assuming correlation among subjects recruited in each center and independence between subjects in different centers [21]. The analyses were performed using R version 2.13 [22] and Stata 11.
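The text does not state how the results from the 20 imputed databases were combined. One common choice is Rubin's rules, sketched below under that assumption: each database yields a log odds ratio and its (cluster-robust) standard error, and the pooled variance adds the between-imputation variance to the average within-imputation variance. The function names are hypothetical, and the interval uses a normal approximation rather than Rubin's t-based degrees of freedom.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # mean within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


def pooled_odds_ratio(log_ors, std_errors, z=1.96):
    """Pooled odds ratio with an approximate 95% confidence interval."""
    q_bar, total = pool_rubin(log_ors, [se ** 2 for se in std_errors])
    se = math.sqrt(total)
    return (
        math.exp(q_bar),
        math.exp(q_bar - z * se),
        math.exp(q_bar + z * se),
    )
```

In use, `log_ors` and `std_errors` would each hold 20 values for a given covariate, one per imputed database.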

Results
Of the 3,667 patients included in this analysis, most were men (77.8%), were infected by sexual transmission (43.1% MSM and 37.5% heterosexual), and were Spanish nationals (68.5%); 15.8% had been infected through injecting drug use. The mean age at HIV diagnosis was 34.8 years (SD = 10.2) and the median follow-up time was 1.38 years. At cohort entry, 442 patients (12%) had been diagnosed with AIDS, another 191 (5.2%) developed AIDS, and 86 persons (2.3%) died during follow-up. Table 1 shows the distribution of years elapsed between the mean imputed date of seroconversion and the date of HIV diagnosis. Overall, the median time to HIV diagnosis was 2.8 years (IQR: 1.2–5.2).

Description of Time from Imputed
Time to HIV diagnosis was longer in men, in persons with heterosexual or "other" routes of transmission (vertical, transfusions, tattoos, etc.), and in those from countries other than Spain; it also increased with age at HIV diagnosis and was longer in patients who developed AIDS and in those who died. Table 2 shows the distribution of late diagnosis according to the sociodemographic characteristics of the subjects and the odds ratio based on the multivariate analysis. Factors independently associated with late diagnosis in the multivariate analysis were male gender, place of origin Sub-Saharan Africa or Latin America, and older age at HIV diagnosis. Subjects with heterosexual transmission had a higher frequency of late diagnoses than MSM although that higher frequency did not attain statistical significance.
Among the 2,928 subjects who were new HIV diagnoses, the median time to HIV diagnosis was 3.3 years (Table 3), longer than the median of 2.8 years estimated for the whole cohort.
These differences can partly be explained by the fact that the 739 subjects excluded from the analyses were significantly different (P < 0.05) from the 2,928 who were included in the following ways: they were younger at diagnosis (mean age 30 versus 36 years) and at seroconversion (mean age 28 versus 32 years); they were more frequently IDUs (38.2% versus 10.1%); and they were more often of Spanish origin (75.0% versus 66.9%). Table 2 shows the distribution of late diagnosis and the results of the multivariate analysis in this subcohort. Unlike what was seen in the whole cohort of 3,667 subjects, IDUs had a higher frequency of late diagnoses compared with MSM. Subjects with heterosexual transmission also had a significantly higher frequency of late diagnoses than MSM.
With regard to sex, age of diagnosis, and country of origin, the results were similar to those for the whole cohort. Table 3 shows the estimated time to HIV diagnosis in this group and the percentage of delayed diagnoses according to the definition DD. For each of the sociodemographic characteristics studied in the subgroup of 2,928 new HIV diagnoses, we observed high consistency, except in women, between time from imputed seroconversion date to HIV diagnosis and frequency of delayed diagnoses (DD).

Discussion
This study illustrates the application of a multiple imputation method to estimate the date of HIV seroconversion in a cohort of seroprevalent patients who are not all newly diagnosed with HIV at entry. We defined as late diagnosis the subjects with times to HIV diagnosis longer than 4.19 years. The advantage of this definition is that it allows estimation of late diagnosis in the whole cohort and not just in patients with CD4 markers close to the time of HIV diagnosis.
Half of the cohort members were not diagnosed with HIV until 2.8 years after becoming infected, and one fourth were not diagnosed until 5.2 years after infection. Based on the multivariate analysis, the time between the imputed date of HIV seroconversion and HIV diagnosis was longer in men, increased with age, and was longer in persons from Sub-Saharan Africa and Latin America compared to Spaniards. In contrast, half of the new HIV diagnoses at entry into the cohort were not diagnosed until 3.3 years after their imputed HIV seroconversion date, and diagnostic delay was more common in IDUs. By imputing the date of seroconversion, we have shown that the magnitude of late diagnosis in the whole cohort was smaller than in the subgroup of new diagnoses (34% versus 39%). In addition, we found differences not only in the magnitude of late diagnosis but also in the associated risk factors. These differences reflect the important changes in HIV epidemiology, and probably in HIV testing practices as well, that have taken place in Spain in the last decade: a major reduction in the number of IDUs who were exposed to frequent HIV testing opportunities, together with an increase in sexually acquired infections which continues to require more active HIV testing approaches. As CoRIS is not population based, these conclusions cannot be extrapolated to the whole HIV-positive population in Spain.
Our group had already evaluated late diagnosis in the cohort, but limited to those patients with an HIV diagnosis close to the time of their inclusion in the cohort [16]. We had observed a very high prevalence of late diagnoses in IDUs, a result that differed from other studies carried out in Spain which described very high HIV testing uptake in IDUs [12, 17, 18]. Here, by imputing the date of seroconversion, which allows study of the whole cohort, we no longer see a higher frequency of late diagnoses in IDUs although this pattern continues to be seen in the subgroup of new diagnoses. What this reflects is that IDUs diagnosed with HIV before cohort entry (in drug attention centers) were excluded from the analyses. Together with a marked decline in the number of IDUs among new HIV diagnoses in Spain, the analyses of late HIV diagnosis within the surveillance system have also identified a higher frequency of late diagnosis among IDUs [23]. Consistent with previous publications from Spain and other countries [12, 13, 16, 17, 23], late diagnosis is higher in men, in migrants from non-Western countries, and increases with age.
The results of this study are also important for comparison purposes as the proportion of new HIV diagnoses in a cohort may vary between cohorts and within the same cohort over time. For example, cohorts may increase the number of recruiting sites, or HIV incidence may change in a given group. In this work, we highlight the fact that new HIV diagnoses do not represent the whole cohort and that their relative contribution needs to be taken into account when comparing different cohorts or when interpreting trends over time.
Our results are based on imputing the date of seroconversion by using the first available CD4 percentage from each patient while off treatment. Other authors have observed that this estimate can be improved by using the evolution of various CD4 measurements [24,25]. We also performed this imputation process for each measurement of CD4 percentage and estimated the date of seroconversion as the median date of seroconversion estimated by the imputation for each value of CD4. No differences were found with this analysis; the median date of seroconversion was 1 July 2002. This may be because the median number of CD4 measurements in persons off treatment was only two, since most people start treatment soon after entry.
We also conducted several sensitivity analyses using different assumptions about the date of initial risk, and the results were similar.
To evaluate the influence of the distribution model initially selected to impute the date of seroconversion [10], we analyzed the data based on Weibull models with different parameters which, in some cases, permitted a subject to have been infected for 30 years at the time of CD4 measurement. The results obtained did not differ substantially from those presented. Time to HIV diagnosis and delayed diagnosis (DD) were not highly consistent in women. Some studies have shown that, after seroconversion, women take longer than men to reach the same CD4 level [20, 26]. Lodi et al. estimate these differences at between 6 and 12 months [20]. We conducted an analysis considering for women a Weibull distribution with the same shape parameter, but with a median 9 months longer than for men. In this simulation, differences between men and women in time to diagnosis and in the percentage of late diagnosis disappear.
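The adjustment for women can be illustrated with the closed-form Weibull median, scale × (ln 2)^(1/shape). The sketch below shows one way to obtain a scale parameter that keeps the shape fixed and lengthens the median by 9 months (0.75 years); the function names and any parameter values passed to them are hypothetical, and this is not necessarily how the simulation was implemented.

```python
import math


def weibull_median(shape, scale):
    """Median of a Weibull distribution: scale * (ln 2) ** (1 / shape)."""
    return scale * math.log(2.0) ** (1.0 / shape)


def scale_for_shifted_median(shape, scale, shift_years=0.75):
    """Scale giving the same shape and a median longer by shift_years."""
    target_median = weibull_median(shape, scale) + shift_years
    return target_median / math.log(2.0) ** (1.0 / shape)
```

Because the median is proportional to the scale when the shape is fixed, shifting the median only requires rescaling; the shape of the distribution is unchanged.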
In conclusion, estimates of the magnitude and risk factors of late HIV diagnoses for an entire cohort may differ from those obtained for new HIV diagnoses, a finding that highlights the need to both improve and expand HIV testing practices in our setting.