Reliability and Validity of the Greek QLQ-C30 and QLQ-MY20 for Measuring Quality of Life in Patients with Multiple Myeloma

Objectives. The aim of this study was to assess the psychometric properties of the Greek EORTC QLQ-C30 and QLQ-MY20 instruments. Method. A sample of myeloma patients (N = 89) from two tertiary hospitals was surveyed with the QLQ-C30, QLQ-MY20 and various demographic and disease-related questions. The previously validated Greek SF-36 instrument was used as a “gold standard” for health-related quality of life (HRQoL) comparisons. Hypothesized scale structure, internal consistency reliability (Cronbach's alpha) and various forms of construct validity (convergent, discriminative, concurrent and known-groups) were assessed. Results. Multitrait scaling confirmed scale structure of the QLQ-C30 and QLQ-MY20, with good item convergence (96% and 72%) and discrimination (78% and 58%) rates. Cronbach's α was >0.70 for all but one scale (cognitive functioning). Spearman's correlations between similar QLQ-C30 and SF-36 scales ranged from 0.35 to 0.80 (P < 0.001). Expected interscale correlations and known-groups comparisons supported construct validity. QLQ-MY20 scales showed comparatively lower correlations with QLQ-C30 functional scales, and higher correlations with conceptually related symptom scales. Conclusions. The observed psychometric properties of the two instruments imply suitability for assessing myeloma HRQoL in Greece. Future studies should focus on generalizability of the results, as well as on specific issues such as longitudinal validity and responsiveness.


Introduction
Multiple myeloma (MM) is the second most prevalent blood cancer (13%), after non-Hodgkin's lymphoma, and represents approximately 1% of all cancers [1]. The average age of onset is 70 years and it affects slightly more men than women. The precise etiology of MM has not yet been established, although the possible role of environmental and/or occupational factors [2] and low socioeconomic status [3] has been suggested. Myeloma is generally thought to be incurable with a median survival of five years [4], which might be prolonged in younger patients, due to less comorbidity. Recent years have seen advances in treatment with steroids, chemotherapy, radiation, thalidomide, stem cell transplants, and newer drugs such as lenalidomide and bortezomib, all of which aim to control the disease and prolong survival. However, these palliative and life-prolonging treatments are associated with various side effects which impair health-related quality of life (HRQoL). Typically affected dimensions are physical functioning, fatigue, and pain [5].
Today, HRQoL is considered an important endpoint in cancer clinical trials, and a better understanding of it may lead to enhanced care of cancer patients in the future [6]. However, in haematological research the number of studies reporting HRQoL, either as a primary or a secondary endpoint, is low compared to studies involving other cancer patients [7]. According to a recent review of randomized controlled trials in MM that also assessed HRQoL (among other endpoints), the methodology of collecting and reporting HRQoL must be improved before results can actually influence clinical decision making [8]. The same review identified the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) [9] as the most frequently used instrument for measuring patient-reported HRQoL in MM, as it has been previously demonstrated to be reliable and valid in this particular patient group [10].
The EORTC has developed a myeloma module referred to as QLQ-MY20, to be administered alongside the core QLQ-C30 [11]. The module has been validated and shown to measure additional aspects of HRQoL, such as body image and future perspective [12]. Both instruments have been translated into Greek according to procedures reported elsewhere [13]. In particular, the QLQ-C30 has gained popularity in Greece, in part because recent studies have demonstrated its psychometric properties in heterogeneous [14,15] and homogeneous [16] samples. We are, however, unaware of any Greek study having tested the reliability and validity of the QLQ-C30 in MM patients. Moreover, despite being the only myeloma-specific instrument available in Greek, the QLQ-MY20 has not had its psychometric properties tested. In this context, the present validation is expected to increase confidence in future results obtained from the Greek QLQ-C30 and QLQ-MY20 and to contribute to the existing international body of knowledge on the subject.

Instruments.
The EORTC QLQ-C30 consists of 30 items, and raw patient responses are transformed to produce 0-100 scores on five functional scales, nine symptom scales/items, and a scale representing global quality of life. Higher functional scale scores indicate better HRQoL, whereas higher symptom scale/item scores indicate a higher level of symptoms [17]. The scales have undergone psychometric testing, based on classical test theory, which has yielded favorable results. The QLQ-MY20 is a 20-item myeloma module intended for use among patients varying in disease stage and treatment modality. The hypothesized scale structure consists of two symptom scales (disease symptoms and side effects of treatment), one functional scale (future perspective) and one single item (body image). The scoring approach is identical in principle to that for the function and symptom scales/items of the QLQ-C30, that is, a higher score on the symptom scales reflects a higher level of symptoms, whereas a higher score on future perspective, social support, and body image indicates good functioning or support. The validated Greek version [18,19] of the generic SF-36 instrument [20] was used as the "gold standard" for HRQoL assessment in this study.
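The 0-100 transformation mentioned above can be sketched as follows. This is a minimal illustration rather than the scoring code used in the study: the function names are hypothetical, and the conventions assumed here (items coded 1-4, global quality-of-life items coded 1-7, a scale scored only when at least half of its items are answered) are taken from the general EORTC scoring approach, not from this text.

```python
from statistics import mean


def raw_score(items):
    """Mean of the answered items; None marks a missing answer."""
    answered = [x for x in items if x is not None]
    # Assumption: a scale is scored only if at least half its items are answered.
    if len(answered) * 2 < len(items):
        return None
    return mean(answered)


def functional_score(items, item_range=3):
    """Functional scales are reversed so that a higher score means better functioning."""
    rs = raw_score(items)
    return None if rs is None else (1 - (rs - 1) / item_range) * 100


def symptom_score(items, item_range=3):
    """Symptom scales/items: a higher score means more symptoms.

    The same non-reversed formula is assumed for the global quality-of-life
    scale, with item_range=6 because its items are coded 1-7.
    """
    rs = raw_score(items)
    return None if rs is None else (rs - 1) / item_range * 100
```

For example, a patient answering 2 and 3 on a two-item functional scale has a raw score of 2.5 and a transformed score of 50, while answers of 4 on every item of a symptom scale give a score of 100.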

Patients and Data Collection.
The data were collected between November 2009 and April 2010. Patients were recruited from two tertiary care hospitals in Athens: Evangelismos General Hospital and Alexandra General Hospital. It is noteworthy that the latter is a reference point in Greece for treating MM and has participated in many international research protocols [21–24]. Subjects were approached during a scheduled visit to one of the hospitals. None were suffering from cancer metastases or severe comorbid conditions which could further compromise HRQoL. The QLQ-C30, QLQ-MY20, and SF-36 were administered for self-completion followed by treatment-related and sociodemographic questions, in the presence of a trained study researcher to minimize missing data. Completion time was 20-30 minutes and all participants provided informed consent. The study was ethically approved by each hospital's review board.

Analysis.
Percentages of ceiling and floor scores were calculated as an indication of the instruments' ability to distinguish between subjects at the top and bottom parts of the scale, and a 50% threshold was adopted for both [25]. Scale internal consistency reliability was assessed via Cronbach's alpha and the 0.70 standard for group-level comparisons was adopted [26]. Item internal consistency, considered substantial when the correlation between an item and its hypothesized scale (corrected for overlap) is >0.40, and item discriminant validity, considered successful when the correlation between an item and its own scale is significantly higher (>2 standard errors) than its correlations with other scales, were used to examine the scale structure [27].
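The three checks described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, complete cases are assumed, and Pearson's coefficient is assumed for the item-scale correlation because the text does not state which coefficient was used.

```python
from statistics import mean, variance


def floor_ceiling(scores, lo=0.0, hi=100.0):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = 100 * sum(s == lo for s in scores) / n
    ceiling = 100 * sum(s == hi for s in scores) / n
    return floor, ceiling


def cronbach_alpha(rows):
    """rows: one list of item responses per respondent, all items answered."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_scale_r(rows, i):
    """Correlate item i with the sum of the other items in its scale,
    so that the item does not inflate its own item-scale correlation."""
    item = [r[i] for r in rows]
    rest = [sum(r) - r[i] for r in rows]
    return pearson(item, rest)
```

Under the criteria stated above, a scale would be flagged when either value from `floor_ceiling` exceeds 50, when `cronbach_alpha` falls below 0.70, or when `corrected_item_scale_r` does not exceed 0.40 for one of its items.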
Spearman's correlations between QLQ-C30 and SF-36 scales were used to assess convergent construct validity. Based on the literature [28,29], scales measuring similar HRQoL dimensions were hypothesized to be strongly correlated (>0.50 [30]). Construct validity was also examined via the interscale correlations between QLQ-MY20 and QLQ-C30, assuming that conceptually related scales would correlate substantially (>0.40), and that scales with less in common would show lower correlations [9,31]. Each SF-36 scale was separately regressed against the QLQ-C30 and QLQ-MY20 scales to identify aspects of the disease-specific instruments more closely linked to general HRQoL [32]. A Bonferroni correction was used to adjust all P values for multiple testing.
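As a minimal sketch of the correlation analysis, Spearman's coefficient can be computed as Pearson's correlation of tie-averaged ranks, and a Bonferroni adjustment multiplies each P value by the number of tests. The function names are hypothetical and this is not the SPSS routine used in the study; how the tests were grouped for the adjustment is not stated in the text, so the caller decides which P values belong to one family.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A pair of scales measuring similar dimensions would then meet the convergent validity hypothesis when the absolute value of `spearman` exceeds 0.50; the sign depends on whether a functional scale is paired with a symptom scale.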
Tests of "known groups" validity were performed by comparing scale scores across groups known to differ by sociodemographic data and clinical indicators. According to the literature, men were expected to report better functioning on all QLQ-C30 scales, higher global QOL, and fewer symptoms than women [33]. A decrease of functioning scores with increasing age was expected for both genders, whereas stem cell transplantation was expected to be associated with better scores. Mann-Whitney and Kruskal-Wallis tests were used to examine group differences. Age was treated as a continuous variable by examining its correlation with scale scores. Regarding the QLQ-MY20, we are unaware of any published stratification of scores according to gender and age and thus can only hypothesize that the direction of the differences should resemble those of the QLQ-C30. All statistical analyses were performed with SPSS version 15.0.
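A two-group known-groups comparison of the kind described above can be sketched with a Mann-Whitney U test. The sketch below uses the normal approximation with a tie correction and no continuity correction, which is an assumption made here for illustration; it is not necessarily what SPSS reported in the study, and the function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def mann_whitney(a, b):
    """Return (U for group a, z, two-sided P) using the normal approximation.

    Assumes the pooled scores are not all identical; otherwise the
    variance is zero and z is undefined.
    """
    n1, n2 = len(a), len(b)
    combined = list(a) + list(b)
    r = ranks(combined)
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: each group of t tied values contributes t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p
```

In use, `a` and `b` would hold the 0-100 scores of one scale for the two groups being compared, for example men and women. With very small groups an exact test would be preferable to this approximation.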

Results
Ninety patients from the two participating centers (33 and 57, respectively) were approached, and only one refused to participate (response rate 98.9%). The two subgroups did not differ significantly in gender and age-group distribution, or in mean QLQ-C30 and QLQ-MY20 scores, implying that treatment center would not confound the results. The majority of the participants (N = 89) were males (56.2%), and patients had been diagnosed with MM 3.9 ± 2.3 (mean ± SD) years earlier. Their age ranged between 36 and 87 (mean 64.2 ± 11.3, median 66), and most patients were married (80.9%). At diagnosis, 37 (41.6%) patients were classified as stage I, 29 (32.6%) as stage II, and 17 (19.1%) as stage III, according to the criteria set by the International Myeloma Working Group [34], whereas for 6 (6.7%) data were not available. Forty-one (46.1%) patients had previously undergone an autologous stem cell transplantation, 44 (49.4%) had not, whereas for 4 patients (4.5%) data were again unavailable.
Data on central tendency, variability, and reliability of QLQ-C30 and QLQ-MY20 scales are presented in Table 1. Five symptom scales/items of the core QLQ-C30, that is, nausea/vomiting, appetite loss, constipation, diarrhea, and financial difficulties, showed high (>50%) floor scores, which implies a lack of these symptoms in this sample but also suggests reduced discriminative ability. Conversely, no ceiling effects were observed on the core instrument, although three scales were close to the threshold value (role, cognitive, and social functioning). As for the myeloma-specific module, only one ceiling effect was observed, on the body image scale. Across both instruments, only one scale (cognitive functioning) did not meet the 0.70 internal consistency criterion.
Items correlated significantly more strongly with their hypothesized scales than with competing scales (Table 2). The 0.40 item-scale correlation criterion was satisfied, confirming item convergence, in 23/24 (95.8%) and 13/18 (72.2%) tests for QLQ-C30 and QLQ-MY20 scales, respectively. Accordingly, item discrimination was successful in 149/192 (77.6%) and 21/36 (58.3%) of the respective tests. The most obvious scaling failures occurred in the cognitive functioning and treatment side effects scales.
Seven out of eight hypothesized strong correlations (>0.50) between pairs of QLQ-C30 and SF-36 scales measuring similar dimensions were confirmed (Table 3). The only hypothesized correlation not confirmed was that between the general health scale of the SF-36 and the global health status scale of the QLQ-C30 (ρ = 0.35). Notably, strong correlations were observed between the QLQ-C30 and SF-36 physical functioning (ρ = 0.80) and pain scales (ρ = −0.73). All hypothesized strong correlations were statistically significant (P < 0.001).
Linear regression identified those QLQ-C30 and QLQ-MY20 scales most closely associated with specific domains of HRQoL (represented by SF-36 scales). In the core instrument, only one scale (nausea/vomiting) and two symptom items (constipation and financial difficulties) were not significant predictors of any SF-36 scale. Explanatory power was satisfactory for most models, with the physical functioning (68.7%) and mental health (61.8%) models demonstrating the highest. These results are presented in Table 5, in which only significant predictors are shown. Regarding the myeloma module, all scales were significantly related to at least one SF-36 scale, and once again the physical functioning and mental health models showed the highest explanatory power (32.4% and 45.3%, respectively).
Regarding the ability of the instruments to distinguish between known groups, men reported better functioning and fewer symptoms than women on all scales (P < 0.05 or better). The differences were in the expected directions for the single-item symptom scales as well, although they were not significant. Men also outscored women (P < 0.01 or better) on the QLQ-MY20 disease symptoms, side effects, and body image scales. Increasing age was associated with decreased physical and cognitive functioning and with more constipation and disease symptoms (P < 0.05). A previous transplant was associated with better scores on all scales; however, the only significant difference was in the QLQ-C30 physical functioning scale (P < 0.05).

Discussion
The present study aimed to assess the reliability and validity of the Greek versions of the QLQ-C30 and QLQ-MY20 and, in doing so, to increase confidence in results from future studies in Greece using these instruments. The suitability of the EORTC QLQ-C30 has been tested in MM patients [10,35], but there are no other validated myeloma-specific quality-of-life questionnaires. Interestingly, a recent review of myeloma RCTs published after 1990, in which HRQoL assessment was included [8], identified only one study employing the QLQ-MY20 module [36]. According to EORTC data [37], the QLQ-MY20 module has been translated into more than 40 languages; however, we are unaware of any published validation studies regarding non-English versions of the instrument. Findings from the current study help fill an existing gap in Greece and contribute to the international literature on the subject.
The mean QLQ-C30 scores obtained from this sample of Greek patients are comparable to the EORTC reference values from a large sample of patients (N = 944) from a number of countries participating in various EORTC consortia [38]. The overall results for the psychometric properties of the two instruments, which was the main issue in this study, were satisfactory. Generally low floor and ceiling effects were observed, implying good discriminative ability and, perhaps, good responsiveness as well, although the latter could not be assessed in the present study due to its cross-sectional design. Although the high floor effects on five symptom scales of the core instrument might appear to undermine this assertion, QLQ-C30 reference values reported by the EORTC show quite similar results [38].
Internal consistency reliability was acceptable, thus providing evidence that each scale is measuring a distinct construct. The only noteworthy exception was the QLQ-C30 two-item cognitive functioning scale (α = 0.568), which has also demonstrated low reliability (α < 0.70) in QLQ-C30 validation studies in Greece [16] and elsewhere [29,39–41]. The two aspects of cognitive functioning assessed, that is, concentration and memory, are not necessarily strongly associated with each other. For example, a patient who cannot concentrate well due to pain or fatigue may actually have good memory [29]. However, poor internal consistency reliability may be balanced by clinical relevance, which is an equally important consideration in HRQoL instrument development [42].
Good results for multitrait scaling confirmed the hypothesized scale structure, implying that the translation of the items and the response choices are appropriate and that scale scores derived from the Greek version could contribute to cross-cultural comparisons. Cognitive functioning and treatment side effects were the only exceptions to overall scaling success for the QLQ-C30 and QLQ-MY20, respectively. The former was expected as a result of the scale's low internal consistency reliability mentioned previously. On the other hand, most of the items of the side effects scale showed strong correlations with the disease symptoms scale as well, which explains its compromised item-scale discriminant validity. As expected, correlations between QLQ-MY20 scales and QLQ-C30 functional scales were comparatively lower than those between QLQ-MY20 scales and conceptually related QLQ-C30 symptom scales. Interscale correlations within the myeloma-specific module showed a strong correlation between disease symptoms and treatment side effects (ρ = 0.50, P < 0.001), which might imply some overlap in the constructs measured.
Criterion-related construct validity was supported by confirmation of hypothesized strong correlations between QLQ-C30 and SF-36 scales measuring similar dimensions. Furthermore, most QLQ-C30 scales correlated more strongly with SF-36 scales measuring similar HRQoL dimensions than with those measuring different ones, providing evidence that the scale names reflect conceptual similarities present at the individual item level. An exception was observed for the QLQ-C30 global QoL scale, which demonstrated a stronger correlation with the SF-36 vitality scale than with the general health scale. Similar results have been reported elsewhere [16–29]. Possible explanations might be that the two instruments operationalize the HRQoL construct differently [43], or that artefacts arose in the translation from English to Greek.
The OLS regression models showed relatively high proportions of HRQoL variance explained. The QLQ-C30 physical functioning scale was a significant predictor of four SF-36 scales, namely physical functioning, pain, general health, and vitality. Similarly, all QLQ-MY20 scales were significant predictors of at least two SF-36 scales, with body image contributing to five. These results provide evidence that the Greek versions of the instruments are indeed measuring standard quality of life aspects. Although disease symptoms are usually uncontrollable, aspects such as body image and future perspective are perhaps more amenable to intervention, and a systematic approach to their improvement may prove beneficial in terms of HRQoL for the MM population. Portions of the variability of each SF-36 scale remained unexplained, since other demographic or health-related variables were not used in the regression models, and this could trigger further research.

Known-groups comparisons (quantitative results not shown for parsimony) generated expected results in terms of score differences and the direction of these differences. On the other hand, the failure to reach statistical significance in some known-groups comparisons might be an artifact of confounding clinical and treatment factors which could not be controlled for, mostly due to sample size [16]. Comparisons of HRQoL according to disease stage were not performed because, unfortunately, we had access to the relevant data only at diagnosis (up to four years back for some patients) and stage has likely changed since then. It should be noted that all other variables are contemporaneous with questionnaire administration.
The nature and design of this relatively small, cross-sectional study obviously limit its generalizability, but the work lays the foundation for future studies. Although internal consistency reliability and cross-sectional construct validity of the Greek QLQ-C30 and QLQ-MY20 were satisfactorily demonstrated, test-retest reliability, longitudinal construct validity, and responsiveness were not addressed. These properties are particularly important as cancer patients' health status changes. Thus, a longitudinal study design could be considered to overcome these limitations. Overall, the psychometric properties of the Greek versions were similar to those of the original English version for the QLQ-MY20 and of other language versions for the QLQ-C30 [12,29,31,39–42,44]. For physicians seeking to choose better therapy protocols and adopt better care strategies for patients, these two instruments appear to be suitable for HRQoL measurement in Greek myeloma patients.