Psychometric Limitations of the Center for Epidemiologic Studies-Depression Scale for Assessing Depressive Symptoms among Adults with HIV/AIDS: A Rasch Analysis

The Center for Epidemiological Studies-Depression (CES-D) scale is a widely used measure of depressive symptoms, but its psychometric properties have not been adequately evaluated among adults with HIV/AIDS. This study used an item response theory approach (Rasch analysis) to evaluate the CES-D's validity and reliability in relation to key demographic and clinical variables in adults with HIV/AIDS. A convenience sample of 347 adults with HIV/AIDS (231 males, 93 females, and 23 transgender adults; age range 22–77 years) completed the CES-D. A Rasch model application was used to analyze the CES-D's rating scale functioning, internal scale validity, person-response validity, person-separation validity, internal consistency, differential item functioning (DIF), and differential test functioning. CES-D scores were generally high and associated with several demographic and clinical variables. The CES-D distinguished 3 distinct levels of depression and had acceptable internal consistency but lacked unidimensionality. In addition, five items demonstrated poor fit to the model, 15% of the respondents demonstrated poor fit, and eight items demonstrated DIF related to gender, race, or AIDS diagnosis. Removal of misfitting items resulted in minimal improvement in the CES-D's substantive and structural validity. CES-D scores should be interpreted with caution in adults with HIV/AIDS, particularly when comparing scores across gender and racial groups.


Introduction
Depressive symptoms are common among adults living with HIV or AIDS, with an estimated 20% to 37% having a major depressive disorder [1,2]. Depression among adults with HIV/AIDS has been associated with high risk behaviors [3], low medication adherence [4], poor health outcomes [5], and reduced quality of life [6,7]. Accurately and reliably assessing depressive symptoms is critical to research aimed at understanding and treating depression and thereby improving the quality of life of people living with HIV/AIDS. Although a clinical interview by a trained professional is considered the gold standard for determining a diagnosis of depression based on DSM or ICD criteria, screening instruments and symptom rating scales are often used for research purposes [8]. Symptom rating scales have the benefit of providing a continuous measure of depressive symptom frequency or severity, and screening instruments include a cutoff score indicating the need for further evaluation or a probable diagnosis of depression. Both are typically quick and easy to self-administer.
Of the many different instruments used to assess depressive symptoms, the Center for Epidemiological Studies-Depression (CES-D) scale [9] is one of the more common measures used in HIV/AIDS research [8]. This 20-item self-report instrument has demonstrated good reliability and validity in a variety of populations [10,11]. Its development was based on Beck's four-factor model of depression, which includes positive affect, negative affect, somatic symptoms and retarded activity, and interpersonal difficulties. While many studies have documented this four-factor structure [12], some research suggests that these factors may vary across different groups [13], which raises concern about its generalizability.
Associations between CES-D scores and demographic factors, such as race/ethnicity [13–15], gender [16], education [17], and income [17–20], have been well documented. However, it is not clear whether these observed differences reflect true group differences, psychometric variability across groups, or a combination of both. Recent studies have begun to address the possibility of racial/ethnic and gender differences in the psychometric properties of the CES-D [14,21–27], although, to our knowledge, none have addressed the influences of education or income, and none have adequately addressed these issues in the HIV/AIDS population.
For adults with HIV/AIDS, concern has also been raised about the CES-D's inclusion of somatic symptoms of depression, as they may overlap with disease-related symptoms and inflate depression scores among those with HIV/AIDS [28]. Others have shown that the somatic symptoms have little impact on the CES-D's ability to distinguish depressed and nondepressed adults with HIV/AIDS or other chronic conditions [29]. This issue is not limited specifically to HIV disease but has been debated in relation to other diagnostic groups as well [30–33].
Questions have also been raised about the validity of the CES-D's four positive affect items (felt as good as other people, hopeful, happy, and enjoyed life), leading some researchers to suggest excluding these items from the total CES-D score [34,35]. In an early study of inpatients with HIV/AIDS, two of the positive affect items (felt as good as other people and hopeful) were found not to differ between healthy controls and adults meeting DSM criteria for depression [29]. In the same study, it was also found that the two interpersonal items (unfriendly and felt that people did not like me) were unable to distinguish nondepressed and depressed patients with HIV/AIDS. How seriously these issues affect the reliability and validity of the CES-D for adults with HIV/AIDS remains unknown.
The psychometric properties of the CES-D have been systematically evaluated using both classical test theory (CTT) and item response theory (IRT) in a variety of populations [34,36]. Although many studies have specifically evaluated its structural validity using exploratory and confirmatory factor analysis, results have been inconsistent. Furthermore, these approaches are largely limited to the specific sample and cannot determine the degree to which items are equivalent across individuals. IRT approaches, such as Rasch modeling, have certain advantages over CTT and have been used to more fully describe the CES-D's underlying structure, to identify items with poor fit to the rest of the scale, and to identify items that perform inconsistently across groups [30,37,38].
Items that fail to perform consistently across groups are said to demonstrate differential item function (DIF). This occurs when a specific item is more or less easily endorsed by certain groups of respondents while controlling for differences in the underlying construct being measured [39]. In the case of the CES-D, DIF occurs when respondents who have the same underlying levels of depression but belong to different subgroups (e.g., gender, race, or health status) have different response patterns to a particular item. This occurs when the location of an item on the depression continuum varies depending on the respondent's group membership. Understanding whether items demonstrate DIF in relation to demographic and clinical variables is critical to interpreting the differences in CES-D scores across various groups [40].
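The idea that an item's location can shift with group membership can be illustrated with a minimal sketch. It uses the dichotomous Rasch model as a simplification (the CES-D items are polytomous), and the person measure, the two item locations, and the function name are hypothetical values chosen only for illustration, not estimates from this study.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical values in logits, chosen only for illustration.
theta = 0.5               # same underlying depression level for both respondents
difficulty_group_a = 0.0  # item location when calibrated in group A
difficulty_group_b = 1.0  # same item, located higher on the continuum in group B

p_a = rasch_probability(theta, difficulty_group_a)
p_b = rasch_probability(theta, difficulty_group_b)
dif_contrast = difficulty_group_b - difficulty_group_a  # size of the shift in logits

print(f"P(endorse | group A) = {p_a:.3f}")
print(f"P(endorse | group B) = {p_b:.3f}")
print(f"DIF contrast = {dif_contrast:.2f} logits")
```

Two respondents with identical underlying depression obtain different endorsement probabilities only because the item sits at a different location for each group, which is the pattern that a DIF analysis is designed to detect.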
Several studies have used CTT approaches to evaluate the utility of the CES-D for identifying risk of depression among adults with HIV/AIDS [29,41], but none have used Rasch modeling to evaluate aspects of validity and reliability, including unidimensionality and stability of item function across groups. Therefore, the purpose of this study was to evaluate aspects of the CES-D's validity and reliability using an application of the Rasch model in a sample of adults with HIV/AIDS. Results from this study will also determine whether there is differential item function, or DIF, in relation to several key demographic and clinical variables. Given that a 10-item version of the CES-D has also been suggested for adults with HIV/AIDS [41], we also use the Rasch model to evaluate the psychometric properties of this version in our sample. Although this analysis focuses on adults with HIV/AIDS, the findings may have potential relevance to other adults with chronic illness.

Participants and Setting. The Symptom and Genetic Study was a prospective longitudinal study aimed at identifying biomarkers of symptom experience among adults with HIV/AIDS [48]. The Committee on Human Research at the University of California, San Francisco (UCSF) approved the study protocol (#10-01357). Participants were recruited from April 2005 to December 2007 using flyers posted at local HIV/AIDS clinics and community sites. Participants provided written informed consent and signed a Health Insurance Portability and Accountability Act release for the use of their protected medical information in research before participation. Study visits, each lasting approximately one hour, were conducted at the University of California, San Francisco, General Clinical Research Center.
Eligible participants were English-speaking adults at least 18 years of age who had been diagnosed with HIV infection at least 30 days before enrollment. To specifically address stable HIV/AIDS-related symptom experience, potential participants were excluded if they currently used illicit drugs (as determined by self-report or by positive urine drug testing prior to the baseline assessment); worked nights (i.e., at least four hours between 12 AM and 6 AM); reported having bipolar disorder, schizophrenia, or dementia; or were pregnant within the prior three months. Participants were not excluded for insomnia but were excluded for other diagnosed sleep disorders, such as apnea and narcolepsy. Research staff conducted eligibility screening by interviewing potential participants in person or by phone. A demographic questionnaire was used to collect information about the participant's age, gender, race/ethnicity, education, and income. A prior diagnosis of AIDS and current medications (including antiretroviral and antidepressant medications) were obtained by self-report. Urine screening was used to detect current illicit drug use both prior to and three days after enrollment. The most recent CD4+ T-cell count and viral load values were obtained from the participant's medical record.

Depressive Symptoms. The Center for Epidemiological Studies-Depression (CES-D) scale [9] was used to assess the frequency of depressive symptoms in the previous week. The CES-D consists of 20 items selected to represent major symptoms in the clinical syndrome of depression. Total scores can range from 0 to 60, with scores of 16 and higher indicating the need for adults to seek clinical evaluation for major depression. The CES-D's four subscales and their range of scores are positive affect (0 to 12), negative affect (0 to 21), somatic symptoms and retarded activity (0 to 21), and interpersonal difficulties (0 to 6). The CES-D has well-established concurrent and construct validity [11,49–52]. In this study, Cronbach's alpha coefficient for the CES-D was 0.88.
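The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming each item is rated 0 to 3 and that the four positive affect items are items 4, 8, 12, and 16 on the standard form (items 8 and 16 are named in this article; 4 and 12 are an assumption). The function and constant names are hypothetical.

```python
# Items assumed to be the positively worded (reverse-scored) ones.
# Items 8 and 16 are named in the article; 4 and 12 are assumed.
POSITIVE_ITEMS = {4, 8, 12, 16}
CUTOFF = 16  # totals of 16 and higher suggest the need for clinical evaluation


def score_cesd(responses: dict[int, int]) -> int:
    """Sum 20 item responses (each 0-3), reversing the positive affect items."""
    if set(responses) != set(range(1, 21)):
        raise ValueError("expected responses for items 1-20")
    total = 0
    for item, value in responses.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: response must be 0-3")
        total += (3 - value) if item in POSITIVE_ITEMS else value
    return total


def needs_evaluation(total: int) -> bool:
    """Apply the conventional screening cutoff to a total score."""
    return total >= CUTOFF


# Hypothetical respondent who answers 1 to every item.
example = {item: 1 for item in range(1, 21)}
example_total = score_cesd(example)
print(example_total, needs_evaluation(example_total))
```

Because the positive affect items are reversed before summing, a respondent who gives the same raw answer to every item does not receive the same contribution from every item, which is one reason response bias on these items is worth considering.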

Statistical Analysis.
Descriptive statistics were used to summarize the study sample, and nonparametric tests (Kruskal-Wallis and Mann-Whitney U) were used to compare the nonnormal distributions of CES-D scores across demographic and clinical groups. A Rasch, partial-credit model application was used to analyze the CES-D scores using Winsteps Rasch analysis software, version 3.69.1.16 [53]. First, the rating scale properties of the original 20-item CES-D were evaluated. A stepwise process was then used whereby items failing to meet standard fit criteria were removed one at a time. If multiple items failed to meet the criterion for a given step, the item with the worst misfit was removed and the step was repeated with the remaining items until all items met the criterion set. Cronbach's alpha coefficient was used as a measure of internal consistency, and principal components analysis (PCA) was used to assess unidimensionality. The analytic approach is summarized in Table 1 and has been previously described [42,44,46,47,54–56].
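For readers unfamiliar with the partial credit model, the sketch below computes the category probabilities for a single item from a person measure and a set of step thresholds. It is an illustration of the general model only, not the Winsteps estimation procedure; the threshold values and the function name are hypothetical.

```python
import math


def pcm_category_probabilities(theta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under the partial credit model.

    thresholds[k - 1] is the step parameter for moving from category k - 1 to k.
    """
    # Cumulative sums of (theta - threshold); category 0 has an empty sum of 0.
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical step thresholds (logits) for a four-category item scored 0-3.
probs = pcm_category_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.0])
expected_score = sum(category * p for category, p in enumerate(probs))

print([round(p, 3) for p in probs])
print(round(expected_score, 3))
```

Item fit statistics compare observed responses with the expected scores implied by these probabilities, so an item is flagged as misfitting when its observed responses depart from the model more than the criterion allows.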

Description of Sample.
Of the 560 adults who expressed interest in the study and were screened for eligibility, 116 were not eligible (primarily due to disqualifying psychiatric diagnosis, n = 74) and 94 chose not to enroll (by either declining to participate or not showing for the first study visit). Of the 350 adults with HIV/AIDS enrolled in the study, three were excluded from analysis due to missing CES-D data. Thirty participants tested positive for illicit drug use at the second study visit (3 days after enrolling in the study), and their data were retained in the analysis to evaluate DIF related to drug use. Demographic and clinical characteristics of the 347 participants included in the final sample are reported in Table 2. Overall, depressive symptoms were common in this sample, and significant differences in total CES-D scores were observed based on the respondent's gender, education, viral load, antidepressant use, and current illicit drug use. There were also significant demographic and clinical differences in the 4 component scores of the CES-D.

Rating Scale Functioning. The rating scale used in the CES-D met the criteria set. The average measures for each category and thresholds advanced monotonically.

Internal Scale Validity.
Of the original 20 items in the CES-D, five failed to demonstrate acceptable goodness-of-fit (see Tables 1 and 3). A principal components analysis (PCA) of the residuals, also performed using Winsteps, indicated that the Rasch dimension (depressive symptoms) explained only 32.5% of the variance in the original 20-item CES-D and 37.9% in the 15-item version, which were both below the criterion of ≥50%. The secondary dimension explained 9.4% and 7.4% of the variance in the 20-item and 15-item versions, respectively, both exceeding the set criterion of <5%.
Because these findings failed to support the unidimensionality of the CES-D as a whole, we complemented the analysis of the entire 20-item scale with a PCA of each of the four subscales in order to explore whether they demonstrated higher levels of unidimensionality. Although the variance explained in the subscales was generally higher than for the full scale (either the 20-item or 15-item versions), none of the subscales reached the set criterion of ≥50% explained variance (see Table 4).

Person-Response Validity.
In this sample, both the original 20-item CES-D and the 15-item version had an unacceptably high proportion of misfitting respondents (see Table 1), with both exceeding the criterion of <5%. The respondents who demonstrated a high degree of misfit were less likely to be male, White, or diagnosed with AIDS but did not differ from the rest of the sample with respect to any other demographic or clinical characteristic listed in Table 2.

Person-Separation Reliability and Internal Consistency.
The 20-item CES-D demonstrated acceptable person-separation reliability (index = 2.04) according to the set criterion of >2.0. The 15-item version had slightly lower person-separation reliability (index = 1.90), which did not reach the set criterion. Both versions were able to [...]
Note. After initial evaluation of the original 20-item CES-D, a stepwise process was used whereby items failing to meet criteria were removed one at a time, and only those meeting criteria in earlier steps advanced to subsequent steps. If more than one item failed to meet a criterion, the item with the worst fit was removed and the step was repeated with the remaining items. The last column includes a [...]
[...] (see Tables 3 and 5). For DIF by gender, item 17 (crying) was more easily endorsed by female and transgender participants compared to male participants (i.e., females and transgender participants had higher scores on this item than what would be expected by the Rasch model, while males had lower scores than expected), and item 15 (unfriendly) was more easily endorsed by transgender compared to male participants. For DIF by race, items 20 (could not get going), 18 (sad), and 6 (depressed) were more easily endorsed by White participants compared to Black participants, whereas item 19 (felt disliked) was more easily endorsed by Black participants than White participants. In addition, item 16 (enjoyed life) was more easily endorsed by White participants than by participants in the Other race group. For DIF by antidepressant use, item 20 (could not get going) was more easily endorsed by those taking an antidepressant than by those who were not. Finally, for DIF by AIDS diagnosis, item 8 (hopeful) was more easily endorsed by those who had not been diagnosed with AIDS than by those who had. Somewhat fewer but similar patterns of DIF were evident in the 15-item version of the CES-D (see Table 5).
As shown in Table 3, two of the items with DIF in the original 20-item scale (items 8 and 16) also demonstrated poor item fit and were not included in the 15-item version, thus accounting for some of the DIF differences between the 15-item and 20-item versions. There was no DIF related to age, education, income, or illicit drug use in either version.

Differential Test Functioning.
To evaluate the impact of eliminating the five misfitting items, individual measures from the original 20-item CES-D and the 15-item version were compared by calculating the z-score of the difference between the scores. As only 6 participants (1.7%) demonstrated z-values exceeding ±1.96, we concluded that the 15-item version generates similar measures to the original CES-D for the majority of the sample, despite the lack of unidimensionality in both versions. Furthermore, the original 20-item CES-D and the 15-item measures were highly correlated (correlation coefficient = 0.93, p < 0.01), exceeding the criterion of a correlation >0.80 with p < 0.05.
In our final step, we also evaluated the 10-item version suggested by Zhang et al. [41] for use with adults living with HIV/AIDS (see Table 1). Even though the rating scale met the criteria in this version and differential test functioning was acceptable, one item (item #8) demonstrated misfit (10%) to the Rasch model, and the 10-item version also failed to meet the criteria for unidimensionality, as only 34.2% of the total variance in CES-D scores was explained by the first principal component. In addition, a higher-than-expected proportion of the sample (11.0%) demonstrated misfitting responses in this version. Most importantly, the 10-item version was not able to distinguish the sample into distinct levels of depressive symptom severity (separation index = 1.42).
Note. > indicates that the item was more easily endorsed by the first group than the second group. There was no differential item function related to age, education, income, duration of HIV diagnosis, or illicit drug use. * The transgender group was small (n = 23) and results should thus be interpreted with caution.
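The comparison of person measures between the 20-item and 15-item versions described in this section can be sketched as follows. The exact formula is not stated here, so the sketch assumes a common approach in which the difference between two measures is divided by the square root of the sum of their squared standard errors. The function names and the example measures are hypothetical.

```python
import math


def measure_difference_z(m_full: float, se_full: float,
                         m_short: float, se_short: float) -> float:
    """z-score for the difference between two person measures (assumed formula)."""
    return (m_full - m_short) / math.sqrt(se_full ** 2 + se_short ** 2)


def proportion_flagged(pairs: list[tuple[float, float, float, float]],
                       critical: float = 1.96) -> float:
    """Share of respondents whose two measures differ by more than the critical z."""
    flagged = sum(1 for pair in pairs if abs(measure_difference_z(*pair)) > critical)
    return flagged / len(pairs)


# Hypothetical (measure, SE, measure, SE) tuples in logits for four respondents.
sample = [
    (0.20, 0.30, 0.25, 0.40),
    (1.10, 0.30, 0.90, 0.40),
    (2.00, 0.30, 0.50, 0.40),   # large discrepancy between versions
    (-0.60, 0.30, -0.55, 0.40),
]
print(proportion_flagged(sample))

# The proportion reported in the text: 6 of 347 participants.
print(round(6 / 347 * 100, 1))
```

Because the two versions share 15 items, their measurement errors are not independent, so a z-score computed this way should be read as a rough screen rather than an exact test.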

Discussion
To our knowledge, this is the first psychometric evaluation of the CES-D using Rasch analysis in a sample of adults living with HIV/AIDS. The findings of this study indicate that there may be serious psychometric limitations to using the CES-D with this population. Five of the original 20 items demonstrate substantial item misfit to the Rasch model, but exclusion of these items resulted in little improvement of the scale's psychometric properties. In fact, both short forms we evaluated, one of which excluded all five of the misfitting items, demonstrated similar psychometric limitations to the full scale. Nonetheless, we recognize that the CES-D is widely used and its use will likely continue until an instrument with more robust psychometric properties is available. While there are other depression measures currently in use, to our knowledge, none have demonstrated robust psychometric properties for assessing depressive symptoms among adults with HIV/AIDS or other chronic illnesses. Thus, for those who use the CES-D to assess depressive symptoms among adults with HIV/AIDS, it is important that its psychometric limitations be considered, particularly when used in research settings and when comparing scores across groups, to avoid drawing invalid conclusions. Furthermore, we recommend that the 5 misfitting items be interpreted with particular caution, especially when their responses seem inconsistent with the rest of the scale, as they may not be measuring the same construct as other items.
Of the five misfitting items, three were positive affect items, which have been identified in other studies as being poorly correlated with the rest of the scale [34,35] and not useful for distinguishing depressed and nondepressed adults with HIV disease [29]. While it is possible that the positive affect items represent a separate construct, given the reversed scaling of these items, the possibility of response bias should also be considered. The other two misfitting items (poor appetite and restless sleep) are somatic in nature. These symptoms are also relatively common among adults with HIV/AIDS [48] and may be associated with aspects of HIV disease or chronic illness that are unrelated to depression.
The results of this study raise concerns about the unidimensionality of the construct measured by the CES-D. Even the subscales representing Beck's four components of depression failed to meet the standards of unidimensionality as defined for our study sample, thereby raising questions about the measure's structural validity. Studies in other populations have also reported a lack of unidimensionality due to misfitting items, although the issue could generally be corrected by excluding the misfitting items [24,30]. These results suggest that factors other than depressive symptoms might be influencing CES-D scores in this sample of adults with HIV/AIDS.
The gender- and race-related DIF identified in this study was similar to that reported in non-HIV samples, with women generally being more likely to report crying and Blacks being more likely than Whites to report feeling disliked and less likely to report feeling sad or depressed or that they could not get going, regardless of their underlying level of depression [21,27,57–59]. This study also included a small sample of transgender adults, and several items (crying, people were unfriendly, and lonely) demonstrated DIF for this understudied group as well. Two clinical variables, antidepressant use and AIDS diagnosis, each demonstrated DIF on a single item in the 20-item CES-D. Somewhat unexpectedly, there was no DIF related to illicit drug use, despite higher CES-D scores among those who tested positive for illicit drugs compared to those who did not. It may be reassuring to know that drug use did not compromise the validity of CES-D scores in this sample, as conducting urine tests is not always feasible.
In light of the number of items with poor fit to the Rasch model or demonstrating DIF, the appropriateness of the clinical cutoff warrants further evaluation in this population. Items that are poorly correlated with the rest of the scale and items that result in differential responding in various subgroups may cause scores to vary systematically across subgroups, and it would be important to determine whether those differences are clinically meaningful with respect to clinical diagnosis and whether a higher or lower cutoff may be appropriate for different subpopulations of adults with HIV/AIDS. Further research in this area is warranted if the CES-D is to be used as a valid screening tool for clinical depression among a diverse population of adults living with HIV/AIDS.
Although it might be recommended that the five misfitting items identified in this study be omitted from the CES-D when used to assess depressive symptoms in adults living with HIV/AIDS, the resulting 15-item version was only slightly better in terms of unidimensionality, person-response validity, and DIF and was slightly worse with respect to person-separation reliability. Therefore, the psychometric properties of the 15-item version are not sufficiently better than those of the original CES-D to warrant such a recommendation.
A number of CES-D short forms have already been developed for use with various populations, either to minimize participant burden [60,61] or to eliminate items that discriminate poorly between depressed and nondepressed samples [62]. A 10-item version of the CES-D has been previously recommended for use among adults with HIV/AIDS [41]. However, to our knowledge, it has only been evaluated in relation to its sensitivity and specificity to a 20-item CES-D score ≥16, and aspects of its internal scale validity (item fit or DIF) were not assessed. Thus, our findings provide additional evidence of the psychometric properties of this version of the CES-D in a similar sample of adults with HIV/AIDS. Even though the 10-item version omits three of the five misfitting items identified in our current study, as well as five of the eight items demonstrating DIF, the criteria for item fit, unidimensionality, and person fit were not met. Perhaps the most problematic finding was that the 10-item CES-D version was unable to separate the sample into even two distinct levels of depressive symptom severity (as would be indicated if the separation index was at least 1.5 [46]), as this raises serious concern regarding this short form as a useful measure of depressive symptoms among adults with HIV/AIDS.
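The separation indices reported for the three versions can be read alongside the more familiar reliability coefficient through the standard Rasch relationship R = G²/(1 + G²), where G is the separation index. The sketch below applies that conversion to the indices given in this article. The function names are illustrative, and the resulting reliabilities are derived here for orientation only; they are not values reported by the study.

```python
import math


def separation_to_reliability(separation: float) -> float:
    """Convert a Rasch separation index G to reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def reliability_to_separation(reliability: float) -> float:
    """Inverse conversion: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


# Separation indices taken from the text; reliabilities are derived, not reported.
reported = {"20-item": 2.04, "15-item": 1.90, "10-item": 1.42}
for version, g in reported.items():
    print(f"{version}: G = {g:.2f}, R = {separation_to_reliability(g):.2f}")

# Criteria mentioned in the text, expressed on the reliability scale.
for g in (1.5, 2.0):
    print(f"G = {g:.1f} corresponds to R = {separation_to_reliability(g):.2f}")
```

On this scale, the separation criterion of 2.0 corresponds to a reliability of 0.80, and a separation of 1.5 corresponds to roughly 0.69, which helps place the 10-item version's index of 1.42 in context.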
The findings of this study need to be considered in light of several limitations. First, information was not available regarding the participant's diagnostic status for depression other than the proxy of taking antidepressant medication, and therefore, the utility of individual items for distinguishing depressed from nondepressed respondents in this sample could not be determined. This study evaluated a number of demographic and clinical variables but was not large enough to examine DIF within groups, such as gender with differential effects by age or racial/ethnic group. Furthermore, the sample in this study was sociodemographically diverse but had insufficient numbers of Hispanic/Latino participants to specifically evaluate DIF for this group. Lastly, most of the adults in this sample had been diagnosed with HIV for many years, and therefore, our findings cannot be generalized to adults who have been newly diagnosed.

Conclusions
In conclusion, in our sample of adults living with HIV/AIDS, the CES-D lacked internal scale validity (i.e., unidimensionality), even after excluding items with poor fit to the Rasch model, and several items demonstrated significant DIF in relation to gender and race. In light of these issues, CES-D scores should be interpreted with caution in this and possibly other chronic illness populations. Further research is needed to determine appropriate clinical cutoff scores and identify brief measures of depression with better psychometric properties.