The McGill Pain Questionnaire, Japanese version, reconsidered: Confirming the theoretical structure

Department of Anesthesiology, Kanagawa Cancer Center, Kanagawa, Japan; Department of Psychiatry, Gunma University School of Medicine, Gunma, Japan; Department of Anesthesiology, Saitama Medical School, Saitama, Japan
Correspondence and reprints: Dr Mamoru Hasegawa, Department of Anesthesiology, Kanagawa Prefectural Cancer Center, 1-1-2 Nakao, Asahi-ku, Yokohama, Kanagawa 241-0815, Japan. Telephone +81-45-391-5761, fax +81-45-361-4692, e-mail m-hasegawa@cam.hi-ho.ne.jp
Received for publication August 28, 2000. Accepted December 10, 2000.
M Hasegawa, S Hattori, M Mishima, et al. The McGill Pain Questionnaire, Japanese version, reconsidered: Confirming the theoretical structure. Pain Res Manage 2001;6(4):173-180.

The measurement of pain, particularly chronic pain, is difficult due to its complexity. One of the most important roles of clinicians is to evaluate, precisely and objectively, the subjective experience of pain. It also has become important to use universally accepted scales to evaluate pain so that data can be compared internationally.
The McGill Pain Questionnaire (MPQ) is one of the most widely used and sensitive clinical tools for the verbal assessment of pain (1). The MPQ is theoretically grounded in the gate control theory of pain (2) and is designed to assess the three dimensions of the pain experience (sensory, affective and evaluative) hypothesized by this theory. The MPQ comprises 20 subclasses of qualitatively and quantitatively ordered pain descriptors. The three interrelated, yet distinct, dimensions of the pain experience (sensory, affective and evaluative) are derived from the first 16 subclasses of these descriptors. These, as well as the remaining four sets of descriptors, which comprise a miscellaneous dimension, are summed to yield a total Pain Rating Index (PRI) based on the rank values of words chosen. The descriptors describe sensory (subclasses 1 to 10), affective (subclasses 11 to 15), evaluative (subclass 16), and mixed sensory, affective and evaluative (subclasses 17 to 20) aspects of pain. Previous studies using factor analysis revealed that these descriptors can be classified into one of three main subscales (sensory, affective and evaluative descriptors) or a miscellaneous fourth subscale (3). Each subclass has several descriptors ranked in order of intensity (rank value), beginning at the value 1 and increasing by one for each successive descriptor. The PRI is the sum of the rank values of the descriptors selected by the patient across the subclasses. The subscale scores are referred to as the sensory score of the PRI, the affective score of the PRI, the evaluative score of the PRI (PRI-E) and the miscellaneous score of the PRI. The total score of the PRI is the sum of the subscale scores. The purpose of the MPQ is to assess not only pain intensity, but also the multiple dimensions of the pain experience. Despite widespread acceptance of the MPQ in the field, results from studies that have tested the construct validity of this instrument have often been inconsistent.
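The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring software: the function name `score_pri`, the dictionary layout and the example patient are assumptions made here, while the subclass ranges (1 to 10, 11 to 15, 16, 17 to 20) are those given in the text.

```python
# Subclass ranges as given in the text: sensory 1-10, affective 11-15,
# evaluative 16, miscellaneous 17-20.
SUBSCALES = {
    "sensory": range(1, 11),
    "affective": range(11, 16),
    "evaluative": range(16, 17),
    "miscellaneous": range(17, 21),
}


def score_pri(chosen_ranks):
    """Return PRI subscale scores and the total PRI.

    chosen_ranks maps a subclass number (1-20) to the rank value of the
    descriptor the patient chose in that subclass. Subclasses in which no
    descriptor was chosen are simply omitted and contribute 0.
    (Hypothetical helper, written for illustration only.)
    """
    for subclass, rank in chosen_ranks.items():
        if not 1 <= subclass <= 20:
            raise ValueError(f"unknown subclass: {subclass}")
        if rank < 1:
            raise ValueError("rank values start at 1")
    scores = {
        name: sum(chosen_ranks.get(s, 0) for s in subclasses)
        for name, subclasses in SUBSCALES.items()
    }
    # The total PRI is the sum of the four subscale scores.
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical patient: descriptors chosen in five of the 20 subclasses.
example = score_pri({1: 3, 4: 2, 11: 1, 16: 4, 18: 2})
print(example)
```

Because the evaluative subscale consists of subclass 16 alone, its score is simply the rank value chosen in that subclass.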
Exploratory factor analyses have not consistently supported the presence of the three nonredundant factors predicted by Melzack's postulated model. Indeed, some factor analysis procedures identified two factors (4); others identified four (5,6), five (7), six (8,9) and seven factors (10). However, exploratory factor analyses are poorly suited to test a priori postulated factor structures. On the other hand, confirmatory factor analysis (CFA) enables the direct testing of factorial structures (11-13). Following a confirmatory methodology, Turk et al (11) and Lowe et al (12) confirmed Melzack's postulated tricomponent structure. In a recent study, Donaldson (13) reported that the a priori three-factor structure provides an acceptable fit to the data but that other models may fit as well or better. Donaldson (13) found three underlying dimensions that differ somewhat from the theoretical organization implied by the gate control theory (2); this alternative structure was called the semantic model. This model distinguishes three factors: sensory action, sensory evaluation and affective evaluation. Examination of the descriptors composing the PRI subclasses suggests that they may cluster about three underlying semantic dimensions that differ somewhat from the theoretical organization implied by the gate control theory. Specifically, the temporal, spatial, punctate pressure, incisive pressure, constrictive pressure and traction pressure descriptors express semantic content of sensory action. The thermal, brightness, dullness, sensory miscellaneous, tension and evaluative descriptors connote sensory evaluation. The autonomic, fear, punishment and affective miscellaneous subclasses represent affective evaluation. (It is helpful when examining these groupings to attend to the descriptors themselves, rather than the subclass names, which are useful but not infallible summaries of their semantic content.) Generally, the distinction between the sensory and affective dimensions has held up extremely well, but there is still considerable debate about the separation of the affective and evaluative dimensions. The semantic model does not assume a separate evaluative factor.
Many investigators have reported moderately high correlations between the sensory, affective and evaluative factors (3,5,6,11-17). Turk et al (11) found that cross-construct correlations were higher than within-construct correlations (reliability indexes). It was concluded that the subscales are, therefore, not different and that only the total score should be used. The MPQ has been translated and reconstructed into many languages worldwide. However, differences in languages and cultural backgrounds have hindered the wide use and standardization of a Japanese version of the MPQ (JMPQ) in Japan. In each translation, alterations have been made, where necessary, to make the questionnaire most relevant to the language. In the cross-cultural translation of psychometric tests, the 'decentring' concept of the back-translation technique is important (18). Decentring, a translation concept first outlined by Werner and Campbell (18), is the process by which one set of materials is translated into another language not with as little change as possible, but so as to produce a smooth, natural-sounding version in the second language. Thus, using the back-translation technique, the language is translated more freely, so that the final image depicted in the translated version becomes as close as possible to that of the original. The major advantage of back-translation is that it gives researchers some control over the instrument development stage, because they can examine original and back-translated versions and make inferences about the quality of the translation. Because of such advantages, the back-translation technique was used in this study to create the JMPQ, which followed a format similar to that of the original MPQ (18). The authors previously used this JMPQ in patients with chronic pain to demonstrate the reliability and validity of the JMPQ as a pain rating scale (19). The original MPQ employed four mutually independent subscales to assess different qualitative aspects of pain. Judging from the results of our previous paper, the subscales of the JMPQ did not display adequate discriminant validity compared with the original MPQ (1,19). The four PRI components had relatively high intercorrelations within the JMPQ, similar to the English version (11). Therefore, the JMPQ had the same problems as the English version in assessing the various qualitative aspects of pain. These problems are likely a structural flaw of the original MPQ as opposed to structural changes caused by translation. As mentioned previously, a number of exploratory factor analysis studies have not consistently supported the theoretical structure of the MPQ; however, a few previous CFA studies of chronic and acute pain statistically supported the a priori model (11-13,16,20).

[...] the validity of the empirical model, from a statistical standpoint. OBJECTIVE: To confirm, through confirmatory factor analysis (CFA), the theoretical structure of the Japanese version of the questionnaire, which resembles that of the original questionnaire. DESIGN: The authors applied CFA to prospective data, collected at a university hospital from 199 consecutive outpatients with chronic pain, to confirm the theoretical structure of the Japanese version of the questionnaire. RESULTS AND CONCLUSION: CFA was applied to the first 16 subclass ratings of the index; the study yielded a well fitting final model that could explain 92% of the covariance observed in the data collected. The results support the hypothesis that the sensory, affective and evaluative subscales of the index are representative of the multidimensionality of pain, with minimal overlap. It is therefore proposed to maintain the theoretical structure of the questionnaire in its Japanese version. The study is the first step toward standardization of the Japanese version of the questionnaire and serves as a cultural bridge in the field of pain medicine between Japan and English-speaking countries such as Canada.
The aim of this study was to confirm, through CFA, the theoretical structure of the JMPQ, which followed a format similar to that of the original MPQ.

PATIENTS AND METHODS
The present study evaluated 199 consecutive Japanese patients with chronic pain (96 male and 103 female; mean age 54.7±14.1 years) who consulted the pain clinic at Saitama Medical School Hospital (Saitama, Japan) from January to May 2000. Three per cent had less than secondary education, 7% had some secondary education, 43% completed high school, 25% had some postsecondary education and 22% had completed postsecondary education. The average duration of pain was 15.3±7.2 months. To minimize the effects of treatment, patients were recruited at their initial visit so that only those without any previous treatment experience were included. Regions of pain described by patients were the lower back (n=85), neck (n=58), face and/or head (n=24), shoulder (n=17) and other (n=15). Informed consent was obtained from each subject before data collection.
The JMPQ and other pain rating scales, such as the visual analogue scale (VAS), the verbal rating scale (VRS) and the numerical rating scale (NRS), were administered, and history (concerning previous treatment, regions of pain, psychological state, duration and intensity of pain, past history, present illness and educational level, etc) was recorded in a private interview of patients awaiting treatment. There was no variation in the patients' understanding of the JMPQ according to educational attainment. Therefore, no patients were excluded from the study because of their inability to comprehend the questionnaire. Patients were allowed to choose no more than one descriptor in each subclass. The JMPQ used in this study, which followed a format similar to that of the original MPQ, comprised 78 pain descriptors in 20 subclasses and a five-point intensity scale (the present pain intensity scale) (1). Pain descriptors adopted in this study had been translated by Satow et al (21). Back-translation procedures were completed by a bilingual English linguist to ensure the adequacy of the Japanese version (18). Back-translation means 'backward and forward translation' by bilinguists. As in the Turk et al study (11), only the first 16 subclasses of the PRI were included in the CFA. Subclasses 17 through 20 were excluded from analysis, because they were labelled as miscellaneous items and were not classified according to the theoretical conceptualization of pain as a sensory, affective or evaluative phenomenon.
To test the reliability of the JMPQ, internal consistency was evaluated for each subscale using the alpha coefficient. To test the validity of the JMPQ, concurrent validity was assessed for each subscale as follows: the JMPQ subscale and total scores for all patients were correlated with scores on three well known and established pain rating scales (the 100 mm VAS, the five-point VRS and the 0 to 100 NRS) by using the correlation matrix derived from the above measures and the JMPQ scales.
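For readers unfamiliar with the alpha coefficient, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the summed score), where k is the number of items. The function name and the small data set are hypothetical and are not taken from the study; the sketch only shows how internal consistency is computed from item scores.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient item-score lists.

    Each inner list holds one patient's scores on the k items (subclasses)
    of a subscale. Alpha is undefined for a single-item scale.
    (Hypothetical helper, written for illustration only.)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha is undefined for a single-item scale")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores of four patients on a three-item subscale.
made_up_scores = [[2, 3, 3], [1, 1, 2], [4, 3, 4], [0, 1, 1]]
alpha = cronbach_alpha(made_up_scores)
print(round(alpha, 3))
```

The guard against a single item reflects a property of the formula itself: with k=1 the factor k/(k-1) is undefined, so no alpha can be reported for a one-item subscale.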
Data obtained in this study were analyzed using the program package Amos 4.0 (SPSS Inc, USA) (22). Using Amos, a CFA was conducted to test the hypothesized three-factor structure (Figure 1). For the purpose of identification, the first element of each congeneric set was fixed to 1.0.
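In conventional CFA notation, the hypothesized model can be written as below. The notation is the standard textbook form and is an assumption of this sketch rather than something printed in the article: x is the vector of the 16 observed subclass scores, ξ the three latent factors (S, A and E for sensory, affective and evaluative), Λ the 16×3 loading matrix in which each subclass loads on one factor only, Φ the factor covariance matrix and Θ_δ the diagonal matrix of unique (error) variances. The last line states the identification constraints: the first loading of each congeneric set is fixed to 1.0, and the single evaluative indicator is modelled without an error term, as in Figure 1.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\boldsymbol{\xi} + \boldsymbol{\delta}, \\
\boldsymbol{\Sigma}(\theta) &= \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}_{\delta}, \\
\lambda_{1,\mathrm{S}} &= \lambda_{11,\mathrm{A}} = \lambda_{16,\mathrm{E}} = 1, \qquad \theta_{16,16} = 0 .
\end{aligned}
```

Model fit is then a statement about how closely the implied covariance matrix Σ(θ) reproduces the sample covariance matrix of the 16 subclass scores.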
The primary statistical goal in CFA is to examine how well the observed sample data fit the hypothesized model. The assessment of the overall fit of the model is based on multiple criteria that reflect the substantive meaningfulness of the model, statistical criteria (eg, the amount of variances and covariances jointly explained by the model) and practical criteria (eg, the percentage of covariance explained by the model).
The second stage of the analysis began upon findings of a less than adequate fit of the sample data for the initial hypothesized model. Thus, sensitivity analysis was completed in an exploratory fashion to delineate the cause of the model misfit. All analyses were based on covariance matrices.
Additionally, intercorrelations between subscales were compared with the reliability indexes of the subscales. Intercorrelations between subclasses (items) were also calculated to clarify the item correlations.

RESULTS
In the CFA, the 16 items of the PRI were considered separate observed variables from which the latent (sensory, affective and evaluative) variables were estimated. Table 1 presents the descriptive statistics for each of the 16 items in the sample. On the present pain intensity scale, 50 patients (25.1%) described their pain as 'mild', 62 (31.2%) as 'discomforting', 48 (24.1%) as 'distressing', 25 (12.6%) as 'horrible' and 14 (7.0%) as 'excruciating'.
The CFA model evaluated using Amos can be best illustrated by a path diagram (Figure 1) of the theoretical model of the JMPQ. According to the model, the sensory dimension (a latent variable) is measured by the first 10 PRI subclasses (observed variables), the affective dimension by the next five subclasses and the evaluative dimension by the 16th subclass. Each of these relationships is illustrated in Figure 1 by an arrow linking the observed variable to the latent variable and is represented statistically by the corresponding factor loading, to be explained below. The three dimensions of pain are theoretically intercorrelated, as indicated by the double-headed arrows in Figure 1 and statistically represented by correlation coefficients among the latent variables. The single-headed arrows leading to the observed variables in the diagram represent the 'unique' (error) portion of each measured variable, ie, that portion not accounted for by the corresponding latent variable. The variance of each of these unique terms is a parameter also estimated by Amos. In the model, the observed 'evaluative' variable does not have a unique or error term because it is the only term designed to measure the evaluative dimension. All of its variance, therefore, is accounted for by the latent variable of the evaluative factor.
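The model just described fixes how many parameters are estimated, and therefore how many degrees of freedom the χ² test of the initial model has. The sketch below counts them under two assumptions that go slightly beyond the text: the three factor variances and three factor covariances are freely estimated, and the first loading of each factor is fixed to 1.0 for identification. The function name is hypothetical. Under these assumptions the count is 102, which agrees with the degrees of freedom reported for the initial model.

```python
def cfa_degrees_of_freedom(indicators_per_factor):
    """Degrees of freedom of a simple-structure CFA with correlated factors.

    Assumptions (illustrative, not taken from the Amos output):
    - each observed variable loads on exactly one factor;
    - the first loading of every factor is fixed to 1.0;
    - factor variances and covariances are free;
    - a factor with a single indicator has no error term for it.
    """
    n_observed = sum(indicators_per_factor)
    n_factors = len(indicators_per_factor)
    # Distinct variances and covariances among the observed variables.
    n_moments = n_observed * (n_observed + 1) // 2
    free_loadings = sum(n - 1 for n in indicators_per_factor)
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    error_variances = sum(n for n in indicators_per_factor if n > 1)
    n_parameters = (free_loadings + factor_variances
                    + factor_covariances + error_variances)
    return n_moments - n_parameters


# Sensory (10 subclasses), affective (5) and evaluative (1).
df_initial = cfa_degrees_of_freedom([10, 5, 1])
print(df_initial)
```

Each error covariance that is later freed uses up one of these degrees of freedom, which is why model respecification is tested one parameter at a time.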
As recommended by Byrne et al (23), assessment of fit was based on multiple criteria that reflected statistical, practical and theoretical considerations. Global assessment of fit was based on the likelihood ratio (the χ² to degrees of freedom ratio), the Tucker-Lewis Index (TLI) (24), the goodness of fit index (GFI) (25), and the Bentler revised normed comparative fit index (CFI) (26). Interpretations based on the TLI are indicative of the percentage of covariance explained by the hypothesized model, in which values less than 0.90 indicate that the model can be improved substantially (27). The GFI should range between 0 and 1, and high values (greater than 0.9) are associated with a good fit of the model. For an adequate fit, CFI values should be greater than 0.90 (26).
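Two of these indexes, the TLI and the CFI, compare the χ² of the hypothesized model with that of a baseline (independence) model, and the likelihood ratio is a simple quotient. The sketch below uses the commonly cited formulas; the function names are hypothetical, and because the baseline χ² is not given in the text, the TLI and CFI calls use made-up input values purely to show the arithmetic. Only the χ²/df ratio uses figures from the study (χ²=258.16 with 102 degrees of freedom for the initial model). The GFI is omitted because it requires the sample and implied covariance matrices.

```python
def chi2_df_ratio(chi2, df):
    """Likelihood ratio: model chi-square divided by its degrees of freedom."""
    return chi2 / df


def tucker_lewis_index(chi2_model, df_model, chi2_null, df_null):
    """TLI computed from the chi-square/df ratios of the null and tested models."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


def comparative_fit_index(chi2_model, df_model, chi2_null, df_null):
    """CFI: one minus the ratio of model to null noncentrality, floored at zero."""
    model_excess = max(chi2_model - df_model, 0.0)
    null_excess = max(chi2_null - df_null, model_excess)
    if null_excess == 0.0:
        return 1.0
    return 1.0 - model_excess / null_excess


# Initial model of the study: chi-square 258.16 on 102 degrees of freedom.
ratio_model_1 = chi2_df_ratio(258.16, 102)

# Hypothetical values, chosen only to illustrate the formulas.
example_tli = tucker_lewis_index(200.0, 100, 1000.0, 120)
example_cfi = comparative_fit_index(200.0, 100, 1000.0, 120)
print(round(ratio_model_1, 2), round(example_tli, 3), round(example_cfi, 3))
```

Both indexes approach 1 as the model χ² approaches its degrees of freedom, which is why values above 0.90 are read as adequate fit.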

CFA
As shown in Table 2, the fit of the hypothesized model was poor from a statistical perspective (χ²[102]=258.16) and not acceptable from a practical perspective (CFI=0.773, GFI=0.852). In this regard, Byrne et al (23) and Bentler (28) have noted that researchers have been urged not to judge model fit solely on the basis of χ² values. Although the coefficient of determination for the first model was excellent (0.996), indicating that the combination of the 16 scales served to measure the factors adequately, the reliability of each observed measure (R²) with respect to its underlying latent construct ranged from excellent to poor. Specifically, the most reliable scale in the first model was the PRI-3 (R²=0.59), and the PRI-9 was the least reliable (R²=0.01).

SENSITIVITY ANALYSIS
As shown in Figure 1, all factor loadings in the first model (model 1) were statistically significant, with the exception of the PRI-9 subclass scale. Taking into consideration the lack of fit of this model with the observed data, this model was rejected, and the analysis proceeded in an exploratory fashion to specify the sources of misfit in the current model and a series of alternative models. The PRI-9 subclass was retained in post hoc analyses to determine the possible existence of correlated errors with other subclasses and subscales.
To generate alternative models, the constraints for each specified model were relaxed one at a time using the modification indexes as a guide for those parameters in which it made substantive sense to do so. The modification index represents the expected drop in χ² if a particular parameter were freely estimated, in which values less than 5.00 show little improvement in fit (23). Previous research with psychological constructs has demonstrated that to generate a well fitting model, it is frequently necessary to allow for correlated errors (23,29). This strategy resulted in a final model that allowed error uniqueness between two subclasses or between a subclass and subscale of the same measure to covary. The final model is presented schematically in Figure 2.
To assess the extent to which each newly specified model improves over the previous model, the difference in χ² between the two models is estimated. Because this differential is itself χ²-distributed, with the degrees of freedom being equal to the difference in degrees of freedom between the two models, the significance of this change can be tested statistically. An improvement in model fit is indicated by a large difference in χ² value. Reviewing results related to models 2, 3, 4, 5, 6, 7, 8 and 9 (Table 2), the estimation of each model yielded a significant improvement in fit over its predecessor. Each error covariance, when relaxed, resulted in a statistically significant change in χ². The final model was acceptable from a practical perspective, explaining 92% (TLI=0.92) of the covariance in the observed data. As noted in Table 2, multiple indicators of fit consistently indicated the adequacy of the final model (eg, GFI=0.92, CFI=0.93, TLI=0.92). As such, model 9 resulted in satisfactory measurement of the pain experience with minimal overlap.
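The χ² difference test described above can be sketched as follows, using only the Python standard library. The helper names are hypothetical, and the χ² of the relaxed model in the example is a made-up value (the per-model values of Table 2 are not reproduced in the text); only the initial model's χ² of 258.16 on 102 degrees of freedom comes from the study. The survival function uses the closed forms for 1 and 2 degrees of freedom and a standard recurrence for higher integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df a positive integer")
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1  # closed form for df = 1
    else:
        q, k = math.exp(-x / 2.0), 2             # closed form for df = 2
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_relaxed, df_relaxed):
    """Return (delta chi-square, delta df, P value) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_relaxed
    delta_df = df_restricted - df_relaxed
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Model 1 from the study versus a hypothetical model with one freed error covariance.
delta_chi2, delta_df, p_value = chi2_difference_test(258.16, 102, 230.00, 101)
print(round(delta_chi2, 2), delta_df, p_value < 0.05)
```

With one parameter freed per step, each comparison has a single degree of freedom, so a drop in χ² larger than about 3.84 is significant at P<0.05.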

Subscale intercorrelations, discriminant evidence and item correlations
Correlations between subscales and reliability indexes of the total scale and each subscale are shown in Table 3. Subscale intercorrelations were moderate and smaller than reliability indexes.
Regarding the reliability of the JMPQ, internal consistency values (Cronbach's alpha) for the subscale scores and the total score on the JMPQ ranged from 0.58 to 0.80 and are listed in Table 3. The reliability of the total score of the PRI appeared to be satisfactory. The reliability index for the PRI-E cannot be computed because it has only one item.
Significant intercorrelations among the JMPQ scores and other pain rating scale scores (VAS, VRS and NRS), ranging from 0.58 to 0.82, indicate the validity of the JMPQ as a tool for assessing pain (P<0.05).
Intercorrelations between subclasses (the matrix of the item correlations) are also shown in Table 4. Relatively high intercorrelations were demonstrated between subclasses (items).

DISCUSSION
In 1992, Gracely (30) argued in an editorial in Pain that exploratory factor analysis methods do not reveal much about the structure of a pain questionnaire when patients are asked to choose items from the scale to describe their pain. Other methods can be used to assess the structure of the questionnaire, but when the questionnaire is administered as a symptom checklist, factor analysis tells us more about the characteristics of the patients than about the characteristics of the scale. This has been attributed to confusion between semantic and associative meanings. Depending on the studies, between two and seven factors have been reported (4-10). Had exploratory factor analysis been used in the present study, there would be no denying that a three-factor solution may say more about the similarities among the patients studied than about the 'true' factor structure underlying the MPQ. However, CFA was used, which is suited to testing an a priori postulated factor structure. A well fitting final model was obtained; therefore, theoretically, the results at least supported the hypothesis that the sensory, affective and evaluative subscales of the PRI were representative of the multidimensionality of the pain experience with minimal overlap. Nevertheless, it cannot be denied completely that the a priori three-factor structure merely provides an acceptable fit to the data; other models, such as semantic models, may fit as well or better, as Donaldson (13) pointed out. Additionally, the MPQ is theoretically grounded in the gate control theory of pain (2) and is designed to assess the three dimensions of the pain experience (sensory, affective and evaluative) hypothesized by this theory. Therefore, the three subscales of the MPQ were basically assumed to be not completely independent, but mutually intercorrelated and roughly differentiated scales. It is inevitable, therefore, that there are correlated errors among the items (subclasses) of the MPQ.
Using data based on the responses of the chronic pain patients on the first three subscales of the JMPQ, a well fitting three-factor model was obtained through CFA. This model was established post hoc, by conducting a sensitivity analysis to identify the source of misfit in the initial hypothesized model. Improvement of fit was linked primarily to correlated errors among subclasses for the reasons stated above.
Turk et al (11) concluded that using only the total score of the MPQ was appropriate for pain assessment because, as in the present results, the three subscales (sensory, affective and evaluative) were found to be highly intercorrelated. Moreover, in their study, reliability indexes were lower than cross-construct correlations. The conclusion of Turk et al (11) was supported by the present results, because contrary to Lowe et al (12), Donaldson (13) and Masedo and Esteve (20), the affective and sensory subscales did not seem to constitute completely independent dimensions of the same construct, pain. As mentioned above and as Holroyd et al (16) pointed out, it cannot be denied that there are relatively high intercorrelations among the three subscales in the Japanese version, similar to the original English version, in view of the process of specifying the final model shown in Table 2 and the intercorrelations shown in Tables 3 and 4. Therefore, there is some doubt as to whether the three subscales of the JMPQ can actually measure different qualitative aspects of pain at the present time. Indeed, there is skepticism about the discriminant validity of the JMPQ, but one can say that the three subscales of the JMPQ (sensory, affective and evaluative) are theoretically differentiated based on the results of the CFA, but that they are also mutually intercorrelated and not completely independent scales.
To the authors' knowledge, the structure of the JMPQ has not yet been confirmed to be similar to the original format based on back-translation methodology. The present results confirmed the value of the current questionnaire in the field of pain medicine. The authors suggest that a rigorous translation of the MPQ based on back-translation methodology permits the comparison of pain descriptors internationally, despite different languages, thus facilitating international research exchange and communications. The authors expect this study to be the first step toward standardization of the JMPQ and to serve as a cultural bridge in the field of pain medicine between Japan and Canada.

CONCLUSIONS
The results supported the hypothesis that the sensory, affective and evaluative subscales of the PRI are representative of the multidimensionality of the pain experience with minimal overlap. It is suggested that the theoretical structure of the MPQ was preserved in the JMPQ used in this study. Therefore, this study may become the first step toward standardization of the Japanese version of the MPQ and serve as a cultural bridge in the field of pain medicine between Japan and English-speaking nations such as Canada.
Theoretical structure of the JMPQ Pain Res Manage Vol 6 No 4 Winter 2001

Figure 1) Standardized estimates for three-factor structure (initial model) underlying the first 16 subclasses of the Japanese version of the McGill Pain Questionnaire (MPQ). Values in parentheses are t values of the parameter estimate. Values greater than 1.96 indicate statistical significance (P<0.05). MPQ subscales are sensory, affective and evaluative, and MPQ subclasses are Pain Rating Index (PRI)-1 to PRI-16. Regular boxes represent observed variables and ovals represent latent (unobserved) variables; double-headed arrows represent a pattern of intercorrelation and single-headed arrows leading from the latent constructs to the boxes are regression paths that represent the link from the factors to their respective set of observed variables. *Parameter is fixed to 1.0 for purposes of statistical identification.
1 subclasses (MI=37.25), the PRI-15 and the PRI-12 subclasses

Figure 2) Standardized estimates for three-factor structure of final respecified model underlying the first 16 subclasses of the Japanese version of the McGill Pain Questionnaire (MPQ). Values in parentheses are t values of the parameter estimate. Values greater than 1.96 indicate statistical significance (P<0.05). MPQ subscales are sensory, affective and evaluative, and MPQ subclasses are Pain Rating Index (PRI)-1 to PRI-16. Regular boxes represent observed variables and ovals represent latent (unobserved) variables; double-headed arrows represent a pattern of intercorrelation and single-headed arrows leading from the latent constructs to the boxes are regression paths that represent the link from the factors to their respective set of observed variables. *Parameter is fixed to 1.0 for purposes of statistical identification.