Establishing the Psychometric Properties of the ICOAP Questionnaire through Intra-Articular Treatment of Osteoarthritic Pain: Implementation for the Greek Version

Objectives. In this prospective study, we aimed to establish the psychometric properties of ICOAP for use in studies involving the Hellenic population. Methods. The SF-36 Health Survey was used as a standard against ICOAP scores from a sample of 89 patients (mean age: 71.07, 69 females) with hip and knee OA pain who underwent 2 treatment cycles of 4 intra-articular injections of sodium hyaluronate, separated by a 12-week medication-free interval. Both questionnaires were completed twice, with no missing data during follow-up. Results. ROC analysis established ICOAP's criterion-related validity. The Wilcoxon Signed-Rank Test and paired samples t-test supported ICOAP's responsiveness, along with Effect Size values, Standardized Response Mean, and Relative Efficiency. Comparisons between the areas under the curve (AUC) on ROC plots established external responsiveness. The Cronbach's-alpha value supported ICOAP's internal consistency. This, along with the intraclass correlation results, supported both reliability and content validity. Interitem discrimination was demonstrated by the ease of completion of ICOAP as well as the degree of familiarity with it. These findings, together with Spearman's and One-Way ANOVA results, established construct validity. Conclusions. ICOAP is a valid, reliable, and responsive QoL instrument suitable for studies of osteoarthritic joint pain in the Greek setting.


Introduction
During recent decades, the ongoing increase in life expectancy has shifted the interest of health professionals towards new ways of disease management. In the case of osteoarthritis (OA), this interest addresses the most important need of every arthritic patient: to live a pain-free life at the lowest functional compromise.
Given the variety of conservative and surgical treatment methods still in use, it becomes obvious that osteoarthritic pain is a major health-related quality of life (HRQoL) determinant that challenges the efficiency of any country's health system in terms of burden of disease.
Among the HRQoL questionnaires developed for osteoarthritic pain, the Intermittent and Constant Osteoarthritis Pain (ICOAP) questionnaire for hip and knee osteoarthritis, a relatively new assessment tool, is the first to divide OA hip and knee pain into two components: constant (ICOAP-CP) and intermittent (ICOAP-IP) pain. This distinction provides detailed information for each of these two kinds of pain separately as well as for total pain, thus forming a global view which differs from "pain on activity" as measured by all preexisting questionnaires [1].
In this study, we attempt to establish the psychometric properties, namely, validity, reliability, and responsiveness, of the Greek version of the ICOAP questionnaire from a sample of OA patients following a specific treatment protocol applied for hip and knee OA by injecting intra-articular Sodium Hyaluronate (HYNa). The whole process was guided by the attributes and criteria set by the Scientific Advisory Committee of the Medical Outcomes Trust [2].

Eligibility Criteria.
Eligible participants were individuals diagnosed with single joint (hip or knee) OA-related pain lasting for 3 months or more and meeting the clinical and radiographic criteria established by the American College of Rheumatology [3], along with Kellgren and Lawrence radiographic OA classification [4]. They also underwent lab tests to rule out infection or rheumatic/metabolic disorders.
All participants were native Greek language speakers and they provided informed consent to participate.
Ethical approval for the study was granted by the hospital's ethics committee in accordance with the principles of the Declaration of Helsinki [5].

Sampling.
This is an experimental study with a pharmacological intervention and no control group. A total of 89 patients with chronic hip or knee osteoarthritic pain in a single joint, regardless of the existence of bilateral OA, were enrolled in the study between August 2011 and February 2012. A subsample (part of the original sample) of 25 people with unilateral knee OA of the same severity was formed exclusively for test-retest reliability testing.

Treatment Protocol.
All patients followed an 18-week therapeutic scheme consisting of two treatment cycles (phases A and C, resp.). Each phase lasted 3 weeks. Between these two phases, there was a 12-week period (phase B) without treatment. Each treatment cycle consisted of 4 intra-articular injections of Sodium Hyaluronate (HYNa) administered once weekly.
The beneficial clinical effect of HYNa has been known for at least two decades [9]. This kind of treatment is recommended among a number of applicable nonsurgical therapies for OA by the European League Against Rheumatism (EULAR) [10], while OARSI highlights its significant efficacy despite the conflicting conclusions among several studies [11,12]. In general, these guidelines demonstrate good structure and established criterion validity. Furthermore, they address degenerative and functional alterations in joint cartilage and are related to satisfactory outcome "discrimination" among the applied treatment modalities [13].
Patients completed both SF-36 and ICOAP questionnaires twice: first at the beginning (pretreatment) and then just after the last injection. Questionnaires were collected immediately after their completion. Investigators were blinded throughout the study with regard to both the identity of the participants and the answers given. ICOAP evaluates both constant pain and pain that "comes and goes." It consists of 11 questions in its final form; the first five refer to constant pain and the remaining six to "intermittent" pain. Preliminary psychometric testing has shown the ICOAP to be reliable and valid [1] as well as responsive [14].

Research
The main advantage of this assessment tool lies in the finding that "intermittent pain" significantly impacts quality of life (QoL), especially when pain is intense and unpredictable. Among the 12 questions in the initial design of the instrument, one item, predictability of pain, was removed from subsequent analyses because its correlations with other items and its item-total correlations were low. This was attributed to strong evidence of its subjectivity and the likelihood of degrading the questionnaire's psychometric properties [1].
Each question is scored from 0 to 4, and the sum of the 11 responses, as suggested by the developers, is further standardized on a scale between 0 (no pain) and 100 (worst outcome) [1].
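The scoring rule described above can be illustrated with a minimal sketch. It assumes that the first five responses form the constant pain subscale and the remaining six the intermittent pain subscale, as stated earlier, and that each subscale is rescaled by its own maximum in the same way as the total; the function names are hypothetical and are not taken from the instrument's user guide.

```python
from typing import Dict, Sequence

CP_ITEMS = 5        # items 1-5: constant pain
IP_ITEMS = 6        # items 6-11: intermittent pain
MAX_ITEM_SCORE = 4  # each item is scored 0-4


def normalize(items: Sequence[int]) -> float:
    """Rescale the sum of 0-4 item scores to 0 (no pain) .. 100 (worst outcome)."""
    if any(not 0 <= x <= MAX_ITEM_SCORE for x in items):
        raise ValueError("each item must be scored between 0 and 4")
    return 100.0 * sum(items) / (MAX_ITEM_SCORE * len(items))


def icoap_scores(responses: Sequence[int]) -> Dict[str, float]:
    """Return normalized constant (CP), intermittent (IP), and total (TP) pain scores."""
    if len(responses) != CP_ITEMS + IP_ITEMS:
        raise ValueError("ICOAP has 11 items")
    return {
        "CP": normalize(responses[:CP_ITEMS]),
        "IP": normalize(responses[CP_ITEMS:]),
        "TP": normalize(responses),
    }
```

For example, a respondent who answers 2 on every constant pain item and 0 on every intermittent pain item would score 50 on CP, 0 on IP, and about 22.7 on TP under these assumptions.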
Another important advantage that differentiates ICOAP from similar research tools is the fact that it raises questions about the distress and the effect of a painful state on a person's quality of life. These novelties have made it quite attractive for use [15].
ICOAP has been translated and cross-culturally adapted in parallel, using a common protocol, into 9 different languages other than English in order to test its adaptability to the specific cultural pattern of each society/nation, after which it became available for use in international multicenter studies [16]. Although it has also been available in Greek since 2007, it has not yet been validated for the Greek population.

SF-36.
The Health Survey SF-36 (Medical Outcomes Trust, Boston, MA) is a well-known multipurpose quality of life questionnaire having been used in numerous studies that measure the effects of various diseases [17].
It contains 36 questions which are followed by 2 to 6 possible answers. Each interviewee is asked to choose the response which best matches his/her actual health state.
Both structure and content are designed to cover at least a minimum set of the psychometric standards required for comparisons between different assessment groups. The Medical Outcome Studies (MOS) team gave SF-36 its final form, focusing on two main health aspects, the physical and the mental one, thus forming two summary indexes, the Physical Component Score (PCS) and the Mental Component Score (MCS), respectively. These 2 indexes were formed from 8 preexisting scales that represent 8 different health components: Physical Functioning (PF), Role Physical (RP), Bodily Pain (BP), General Health (GH), Vitality (VT), Social Functioning (SF), Role Emotional (RE), and Mental Health (MH). Ratings range between 0 and 100. Practically, the greater the score, the better the health status [18,19].
The contribution of the Greek version of SF-36 research tool was essential for the study. Its translation procedure accommodated the guidelines of the International Quality of Life Assessment (IQOLA) Project followed by validation and reliability testing in the Greek population [18,20,21].
Although ICOAP seemed at first glance to be a disease-specific tool, it also showed quality of life features, as stated above. For that reason, we selected SF-36, as it can effectively measure both pain and QoL characteristics. The paired samples t-test, along with One-Way ANOVA, tested for differences between mean scores across treatment phases. The latter also investigated discriminant validity [22].
To test ICOAP's responsiveness, we used both the Wilcoxon Signed-Rank Test and the paired samples t-test. Estimations of the Effect Size (ES), Standardized Response Mean (SRM), and Relative Efficiency (RE) were considered essential in a study design like this, where repeated measurements provide continuous data. We took these indices into account along with Spearman's correlation coefficients. Effect Size was estimated by dividing the Mean Difference by the SD of the baseline scores, while in SRM the denominator was the SD of the differences. ES and SRM values less than or equal to 0.3 and equal to or greater than 0.8 indicated a low or large effect, respectively, while those within the intermediate interval indicated a moderate effect [23].
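As a rough illustration of the two indices defined above, the following sketch computes ES and SRM from paired baseline and follow-up scores and labels their magnitude with the 0.3 and 0.8 thresholds. The function names are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and the sign convention (follow-up minus baseline, so that pain reduction gives negative values) is chosen here only for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by the SD of the baseline scores."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    return mean(diffs) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by the SD of the individual changes."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    return mean(diffs) / stdev(diffs)


def magnitude(value: float) -> str:
    """Classify an ES or SRM value by its absolute size."""
    v = abs(value)
    if v <= 0.3:
        return "low"
    if v >= 0.8:
        return "large"
    return "moderate"
```

With hypothetical baseline scores of 60, 50, 70, and 40 and follow-up scores of 30, 35, 40, and 30, the mean change is -21.25, giving an ES of about -1.65 and an SRM of about -2.06, both "large" in absolute terms.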
ICOAP's Relative Efficiency was assessed by SRM ratios of ICOAP subscales over those of SF-36, while criterion-related validity was determined by ROC analysis, which compared predictors of pain measured by ICOAP and SF-36's BP scales. The same analysis was carried out for SF-PF and VT scales. With the interest focused on an adequate combination of sensitivity and specificity, a dichotomous external outcome criterion (cut-off point) which best discriminates improved from unimproved conditions had to be defined [24]. While for some variables "zero" (absence of pain) served as the cut-off point, for others the median value (or the lower limit of its 95% CI) was used, so that all scores greater than this corresponded to a positive outcome/improvement.
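The Relative Efficiency described above reduces to a simple ratio. The sketch below assumes that absolute SRM values are compared, because ICOAP and SF-36 are scored in opposite directions (lower is better for ICOAP, higher is better for SF-36); that choice and the function name are illustrative assumptions rather than the procedure documented in the study.

```python
def relative_efficiency(srm_icoap: float, srm_sf36: float) -> float:
    """Ratio of the ICOAP subscale SRM to the SF-36 scale SRM (absolute values).

    Values above 1 suggest the ICOAP subscale responded more strongly
    to change than the SF-36 scale it is compared with.
    """
    if srm_sf36 == 0:
        raise ValueError("reference SRM must be non-zero")
    return abs(srm_icoap) / abs(srm_sf36)
```

For instance, a hypothetical ICOAP SRM of -1.2 against an SF-36 SRM of 0.6 gives a Relative Efficiency of 2.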
As the Area under Curve (AUC) in ROC plot depicts the magnitude of accuracy, AUC values of 0.5 (i.e., the area under the diagonal) or greater were considered to be of importance. The 95% CI lower limit served as criterion of statistical significance.
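To make the ROC step concrete, the sketch below splits a predictor score by a dichotomous external criterion and estimates the AUC as the probability that a randomly chosen "improved" case scores higher than a randomly chosen "unimproved" one (the Mann-Whitney formulation, with ties counted as one half). The function names, the direction of the comparison, and the strict "greater than cut-off" rule are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def split_by_criterion(
    predictor: Sequence[float], criterion: Sequence[float], cutoff: float
) -> Tuple[List[float], List[float]]:
    """Group predictor scores by whether the external criterion exceeds the cut-off."""
    improved = [p for p, c in zip(predictor, criterion) if c > cutoff]
    unimproved = [p for p, c in zip(predictor, criterion) if c <= cutoff]
    return improved, unimproved


def auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 from this function corresponds to the diagonal of the ROC plot, that is, no discrimination between improved and unimproved cases.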
It is noteworthy that ROC analysis provides useful evidence of responsiveness along with the paired samples t-test and Effect Size, as proposed by Deyo et al. [25] (Table 3).
Convergent validation looked for high correlations between scales that measure the same or similar constructs, while divergent validation examined scales that differ in the health aspect they measure, such as SF-VT and MH against those of ICOAP [26].
Based on ICOAP's Likert-pattern structure, its internal consistency was tested by Cronbach's-alpha coefficient [27] and its test-retest reliability by the intraclass correlation coefficient.
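For reference, Cronbach's alpha for k items is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows, assuming one row per respondent and one column per item; the function name is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from a respondents-by-items table of scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents (sample variance).
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

When all items rise and fall together across respondents, the item variances account for only 1/k of the total variance and alpha approaches 1.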

Results
Immediately after the export of descriptive statistics (Table 1), we proceeded to normality testing of variables which suggested the use of nonparametric statistical tests.
Statistically significant correlations were found in treatment phase A between all ICOAP scales and SF-PF, SF-BP, and SF-PCS, as well as between SF-VT and the ICOAP-CP (Constant Pain) and ICOAP-TP (Total Pain) scores. In phase C, all ICOAP components showed strong relationships with all SF-36 scores except SF-RE and SF-MH. Strong relationships were also detected between all ICOAP scores in both treatment phases (Table 2).
One-Way ANOVA revealed for phase A significantly high values for both ICOAP subscales against SF-36 PF, BP, VT, and PCS scores (p value < 0.01) and moderate values for SF-VT (p value < 0.05), with ICOAP-CP and ICOAP-IP demonstrating between 1.95 and 2.95 times higher values as compared to SF-36 scales. The same was observed for phase C, where the value was 2.06 times higher for ICOAP-CP and 6.31 times higher for ICOAP-IP than for each SF-36 scale (p value < 0.01) (Table 5). Among  (Table 3).
Relative Efficiency outcomes less than 1 were detected only in the BP scale and for ICOAP-CP with SF, MH, and PCS, whereas all the rest were >1, and in some cases >2 (e.g., RE with ICOAP-TP) (Table 6).
Comparing the areas under the curve (AUC) in ROC plots, ICOAP-CP demonstrated the best AUC, while both SF-36 BP and PF demonstrated the worst, with ICOAP-IP lying between them. Specifically, ICOAP-IP presented acceptable results while ICOAP-CP presented good results (according to the positive actual state that was used) against SF-BP and PF. It should be noted that we set the value 0.5 as the practical lower limit for each AUC (Table 7 and Figure 1).
Cronbach's-alpha coefficient showed in phase A excellent (>0.9) internal consistency for ICOAP CP and IP and almost excellent (0.878) internal consistency for ICOAP-TP, while showing excellent internal consistency in all scales for phase C (Table 8).
Test-retest reliability in a subsample of 25 patients with unilateral Kellgren-Lawrence III knee OA and during the same time frame showed excellent (>0.75) degree of approximation between them in terms of pain severity as registered by ICOAP scores (Table 8).
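A test-retest intraclass correlation can be sketched as follows. The text does not state which ICC form was used, so the one-way random-effects, single-measurement form, ICC(1,1), is assumed here purely for illustration; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def icc_oneway(rows: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1); one row per subject, one column per administration."""
    n = len(rows)      # subjects
    k = len(rows[0])   # repeated measurements per subject
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 measurements")
    grand = mean(x for row in rows for x in row)
    subject_means = [mean(row) for row in rows]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(rows, subject_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Identical scores on both administrations give an ICC of 1, while disagreement within subjects that is as large as the spread between subjects pulls the value towards 0 or below.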

Discussion
This study attempts to validate ICOAP's Greek version through a specific conservative treatment intervention for osteoarthritis in patients with hip or knee joint OA pain. The use of SF-36 is deemed essential because it is a popular, well-established HRQoL research tool which meets reliability and validity criteria [28].

Face Validity.
Face validity was established in a more empirical fashion, as it was based on the questionnaire's overall ease of use. ICOAP's interitem discrimination was established as easily as that of SF-36. Participants clearly understood the purpose of each questionnaire. All questions were answered with the same degree of convenience, and no cases of inability to complete a questionnaire due to lack of understanding were reported.

Criterion-Related Validity.
Criterion-related validity analysis explored evidence of the extent to which ICOAP scores are related to OA-pain, that is, its accuracy in "discovering" that kind of pain, as well as in quantifying it during the course of time, regardless of the intervention applied.
In an attempt to study the dichotomous nature of a clinical outcome, one has to select the appropriate boundary. In ROC analysis, the definition of the proper cut-off level is of the highest importance because as that level decreases, sensitivity increases while specificity decreases and vice versa [29].
Besides the existence/absence of pain (zero cut-off point), we also took into account the assumption that a reliable questionnaire should capture changes in at least 50% of cases. For that reason, the lower limit of the 95% Confidence Interval was chosen as the alternative cut-off point.
Although power analysis showed that 52 was the minimum required sample size, bootstrapping to 1000 patients prior to ROC plotting upgraded the precision of results.
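One way to picture the bootstrapping step is a percentile bootstrap of the AUC. The sketch below reads "1000" as the number of resamples, resamples the improved and unimproved groups separately, and uses a fixed random seed; all three choices, together with the function names, are assumptions made for illustration and may differ from the procedure actually used.

```python
import random
from typing import Sequence, Tuple


def auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(
    positives: Sequence[float],
    negatives: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile 95% confidence interval for the AUC from stratified resampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        pos = rng.choices(positives, k=len(positives))
        neg = rng.choices(negatives, k=len(negatives))
        estimates.append(auc(pos, neg))
    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper
```

Under this reading, an AUC would be taken as significant when the lower bound returned by `bootstrap_auc_ci` lies above 0.5.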

Construct Validity.
Construct validation that was performed throughout the first treatment cycle explored the potential of ICOAP's overall construction to provide measurement results that warrant its task [30].
The absence of a control group prevents strict construct validation according to Cronbach and Meehl, though ICOAP meets the criteria of the nomological network they have set [31].
As content and criterion-related validity along with interitem correlations are relevant to construct validity, the latter was demonstrated through these types of evidence, as shown by the associated tests and especially by high associations (Spearman correlation coefficients) or One-Way ANOVA results among the ICOAP components and SF-36 scales [32]. SF-BP seemed to be the scale that corresponds most closely to ICOAP scores because it measures pain and its consequences in a person's daily activities. In any case, construct validation requires numerous studies, not a single one [30].
Regarding divergent construct validation, we chose Pearson's correlation coefficient for SF-VT against ICOAP-TP because they both follow normal distributions.

Responsiveness.
The magnitude of the SRM values, in agreement with that of the ES values, indicates that ICOAP-IP and TP display higher responsiveness than ICOAP-CP, while SF-BP displays the highest. Nevertheless, ICOAP's responsiveness was established by paired samples t-test results in conjunction with SRM, ES, and RE outcomes.
Values of RE > 1 in most scales provided evidence that the sample size was adequate for the detection of a specified ES [14].
It is meaningful that ES and SRM results were further confirmed by the highly significant "Wilcoxon Matched-Pair Signed-Rank Test" output. Practically, that high (in absolute value) ranked difference confirmed ICOAP's efficiency in reflecting posttreatment alterations in osteoarthritic pain (Table 3).
Attempting to confirm internal responsiveness, we estimated the magnitude of the test-statistic value for each ICOAP subscale. Recorded absolute values greater than 1.96 provided evidence of a difference between two sequential measurements [33].
ROC analysis demonstrates an advantage over simple pre- and posttreatment comparisons in assessing scale responsiveness [27,33]. Considering an AUC of at least 0.70 to be adequate, only ICOAP-CP and (almost) ICOAP-IP met that threshold [34]. ICOAP subscales seemed to be slightly better pain discriminators than those of SF-36, which had scores lower than that threshold [35] (Table 7).
This study showed almost the same results as preceding validations, especially that of Bond et al. [23], which gave ES and SRM values within similar ranges. However, the type of treatment applied in each of these studies is of great significance, as in treatment modalities with a higher strength of recommendation (i.e., surgeries) one may expect better scores like those reported by Davis et al. [14].
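The comparison against 1.96 can be illustrated with the normal approximation to the Wilcoxon signed-rank statistic. This is a sketch under stated assumptions: zero differences are dropped, tied absolute differences receive average ranks, and neither a tie correction nor a continuity correction is applied. It is not necessarily the exact statistic produced by the software used in the study, and the function name is hypothetical.

```python
import math
from typing import Sequence


def wilcoxon_z(before: Sequence[float], after: Sequence[float]) -> float:
    """z statistic for paired scores using the normal approximation to W+."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - expected) / sd
```

With ten hypothetical patients who all improve (every follow-up score lower than baseline), the positive rank sum is 0 and z is about -2.80, which exceeds 1.96 in absolute value.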

Ability of ICOAP to Respond to Changes.
To estimate the depth of health outcome that ICOAP measured, we adopted the limit of 15% of patients with the lowest and highest scores, as proposed by McHorney and Tarlov [36]. With the data available, we noticed the following effects: floor in ICOAP-CP, ceiling in the SF-Social Functioning scale (phase C), and both ceiling and floor in SF-RP and RE. These findings for SF-36 match those from a previous study in which the responsiveness of WOMAC and SF-36 was tested in patients who had undergone hip replacement surgery [37].
ICOAP's floor effect was also reported in a study on patients with knee OA after treatment with physical therapy [22]. It is essential to underline that it is in line with the SF ceiling effects, because lower values in ICOAP correspond to a better outcome (less pain), unlike in SF-36. So, the term "ceiling effect" seemed more appropriate for ICOAP than "floor effect." In an attempt to interpret these outcomes, one may hypothesize that extreme items may be missing at the lower end of the ICOAP-CP scale, with subsequently diminished content validity. The same hypothesis could be made for reliability as well as for responsiveness, either because patients who scored at the boundaries could not be further distinguished from one another or because changes between them could not be measured [34]. Such issues require a considerable number of studies before a definite, documented answer can be given.
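The 15% criterion can be checked with a few lines of code. The sketch below assumes scores on a 0 to 100 scale and flags an effect when more than 15% of patients sit exactly at the lowest or highest possible score; the strict inequality and the function name are illustrative assumptions.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(
    scores: Sequence[float],
    lowest: float = 0.0,
    highest: float = 100.0,
    threshold: float = 0.15,
) -> Dict[str, Union[float, bool]]:
    """Share of respondents at each scale boundary and whether it exceeds the threshold."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

For a hypothetical sample of ten patients in which three score 0 and one scores 100, the floor share is 30% (flagged) and the ceiling share is 10% (not flagged).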
Nevertheless, ICOAP shows comparable ability as compared to SF-36 in the detection of improvement after application of intra-articular treatment with HYNa.

Reliability.
Since we explored ICOAP's ability to yield the same score on each administration to a given person, with that score reflecting the person's true ICOAP outcome, we addressed the issue of reliability [38]. Cronbach's-alpha and the ICC both provided strong support for it. The "homogenous" subsample of 25 patients with unilateral K/L III knee OA served the test-retest reliability process, which also reported excellent interrater agreement with respect to the consistency/reproducibility of ICOAP measurements made by different observers.

Internal Consistency.
According to the "rule of thumb" of George and Mallery, ICOAP demonstrated from the initial assessment excellent internal consistency for constant and intermittent pain (Cronbach's-alpha > 0.9) and almost excellent (0.878) internal consistency for Total Pain, while demonstrating excellent internal consistency for all scales at the end of treatment, revealing effective distinguishing of both types of osteoarthritic pain [39].

Management of Measurement Error.
This "experimental" study entails some degree of measurement error. Notwithstanding, the theory of reliability is based on the measurement of random error. Such a great proportion of common variance that is included in each item among paired observations (i.e., high Cronbach's-alpha values) means increased effectiveness in the management of measurement error [40].
Having already provided evidence of high construct validity, we can state that we have perhaps overcome the constant error issue. Indeed, ICOAP's structure contributed to versatile and well-rehearsed responses. Furthermore, the constant number of patients (no follow-up loss) throughout the study accomplished paired observations of the same parameter in the same sample. Lastly, high intraclass correlation coefficient outputs diminished the measurement error, as ICC estimates the average correlation among all possible orderings of pairs independently of the order of measurement [41].
It is generally admitted that larger correlation coefficients are associated with greater differences between measurement outputs (i.e., the initial and the final stage of treatment). This comes into agreement with the overall beneficial therapeutic outcome.

Content Validity.
Construct validity subsumes all categories of validity [42] while content validation provides evidence about the construct validity of an assessment instrument [43]. Therefore, there's a reciprocal relationship between those two terms.
Cronbach's-alpha and ICC results provide extra evidence about the presence of Content Validity [44]. These outputs are in accordance with those that were reported for pain after knee replacement surgery [45].

ICOAP as HRQoL Instrument.
During validation procedure, ICOAP revealed several QoL characteristics. Apart from the first 2 items of each subscale, all the rest provide QoL information on some aspects of patient's life that the pain potentially could affect, for example, individual's concerns or mood. Compared with SF-36, there are similarities among ICOAP's 4th and 10th questions and SF-36's 24th and 5th questions along with 11th and 26th questions, respectively. Any omission of a specific item barely affects the high Cronbach's-alpha value, providing evidence that each question contributes equally to the overall power of the questionnaire. Furthermore, the AUC results of ROC analysis demonstrated an adequate prediction level assuming that ICOAP provides additional information about OA pain-related quality of life [1,44].
Despite the effort to identify some QoL characteristics of ICOAP, these are far from the original QoL character of SF-36 which still remains a benchmark for many researchers. This is confirmed by the associations made between ICOAP subscales and those of SF-36 which are not directly related to pain.

Limitations.
During the study, we faced obstacles and dilemmas that should be reported. As stated previously, this study was focused not on a treatment's clinical effect but on a questionnaire's validation. That is the reason why we did not form a control group. Nevertheless, the lack of a control group prevents the strict application of Cronbach and Meehl's guidelines for construct validation, and although the latter was achieved indirectly, we recommend that these guidelines be applied in any similar case.
Absence of a specific questionnaire for rating the significance of each ICOAP question can potentially exclude direct Content Validation; it is important to take this necessity into account for the future.
ICOAP-CP's "ceiling" effect could be attributed to the severity of OA in patients because 37% of knees and 64% of hips were rated minimal or mild OA (Kellgren/Lawrence I and II), where pain is not as intense as in advanced (K/L III and IV) stages. It is noteworthy that the therapeutic outcome was strongly influenced by HYNa's optimal effect, as numerically shown by paired samples t-test mean scores or the percentage of patients who improved their ICOAP scores, as well as by One-Way ANOVA values.
However, neither Relative Efficiency nor the ceiling effect affected ICOAP's responsiveness which was further confirmed by other statistical tests as described above.

Conclusion
The ICOAP demonstrated strong agreement between the actual and the theoretically expected measurement of the constant and intermittent osteoarthritic hip/knee joint pain. Indeed, ICOAP can effectively introduce both sides of the same coin and it can also accurately quantify any possible variation in each pain subscale, displaying higher predictive ability than the most relevant (to pain) SF-36 scales.
As compared to SF-36, ICOAP shows comparable ability in detecting OA-pain and a discrete preponderance in recognizing any possible shift in its characteristics during interventions.
Based on the above-mentioned results, ICOAP fulfills its objective and displays a high comparability grade as well with other "similar" assessment tools. Its application for evaluation and management of both OA-pain types provides valid and comparable data.
In conclusion, although ICOAP lacks standard QoL features, it is a valid, reliable, and responsive OA-pain instrument for use in studies related to hip and knee osteoarthritis in the Greek clinical setting.