Interpreting the meaning of pain severity scores

Faculty of Nursing, University of New Brunswick, Fredericton, New Brunswick
Correspondence and reprints: Dr Marilyn J Hodgins, Faculty of Nursing, University of New Brunswick, PO Box 4400, Fredericton, New Brunswick E3B 5A3. Telephone 506-458-7628, fax 506-453-4519, e-mail mhodgins@unb.ca
Received for publication January 21, 2002. Revised and accepted June 25, 2002

Poor pain management practices are generally discussed in terms of barriers associated with the patient, clinician and/or health care organization. The impact of deficiencies in the tools that are used to measure pain is seldom addressed. Three factors are discussed that complicate the measurement of pain: the nature of pain, the lack of meaning associated with scores generated by pain scales, and treatment goals that lack specificity and are not linked to patients' pain scores. The major premise presented in the present article is that the utility of pain measurement is limited because health care professionals do not have a common understanding of the meaning of scores generated by pain measurement tools, especially within the acute care setting. To address this issue, approaches to establishing instrument validity need to be broadened to include the examination of the meaning and consequences of these measurements within a specific context. Substantive improvements in pain management are unlikely to occur until criteria are identified to link explicitly the scores generated by pain measurement tools to treatment goals.

Effective pain management is increasingly recognized as a desired goal or outcome of health care interventions. Societal expectations for effective pain management practices have increased significantly since the release of documents such as the Canadian Pain Society’s Patient Pain Manifesto (1) and the Joint Commission on Accreditation of Healthcare Organizations’ Pain Standards for 2001 (2). However, the attainment of this goal is hampered by the absence of explicit guidelines for interpreting what constitutes ‘effective’ pain management. Without this information, it is difficult for clinicians to make informed decisions regarding what constitutes best practice.
Three factors complicate the measurement and subsequent management of pain. (In this paper, the terms 'assessment' and 'measurement' are used interchangeably.) The first and perhaps most challenging of these factors is the nature of pain itself. Pain is a multidimensional, subjective phenomenon that varies greatly among individuals, situations and time. A second complicating factor arises from deficiencies in the instruments that are routinely used to measure pain, especially within the acute care environment. The scores generated by these instruments lack meaning and offer little direction for decision making. Finally, treatment goals for the management of pain lack specificity. Goals are needed that clearly define the desired end point or outcome of an intervention, or that provide direction for setting goals in a particular clinical situation. The significance of each of these factors for pain management is discussed within the present paper. In addition, it is argued that substantive improvements in the management of pain will not occur until criteria are established that explicitly link the scores on pain measurement tools to treatment goals.

THE NATURE OF PAIN
Pain is a personal experience. Realization of this fact led to the adage that "pain is whatever the person says it is and exists whenever the person says it does" (3). Few attempts have been made, however, to unravel how patients interpret the experience of pain, especially acute pain. In 1956, Beecher (4) wrote about the importance of meaning in the context of the pain experiences of soldiers wounded in battle. Beecher found no dependable relation between the extent of soldiers' wounds and their pain. Many soldiers with massive injuries expressed little pain until they were safely removed from the dangers of the battlefield. However, these soldiers appeared to react with normal or even heightened responses to the pain evoked by medical procedures during their treatment at first aid units. Beecher (4) concluded that the intensity of the pain experience was largely determined by the emotional meaning and significance of the noxious event to the person. Beecher (4,5) labelled this the "reaction to pain". However, from his writings, it is not clear that Beecher generalized these findings to the context of his surgical practice with civilian patients (personal communication, H Merskey, June 25, 2002).
The introduction of the gate control theory by Melzack and Wall (6) radically altered approaches to the investigation and management of pain. Melzack and Wall (6) reiterated Beecher's belief that the perception of pain is not simply a function of the amount of physical injury. They proposed that the intensity and quality of pain are influenced by factors such as past experience, attention, expectation and anxiety, as well as the meaning of the situation in which pain occurs. Melzack and Casey (7,8) later extended the gate control theory to emphasize the multidimensional nature of pain perception and how it shapes the pain response. In this revised model, cerebral processes were categorized as sensory-discriminative, motivational-affective and cognitive-evaluative. The sensory-discriminative domain encompasses factors concerning the temporal pattern, location and intensity of the pain. The aversive nature of the pain experience and the emotions evoked by pain are represented by the motivational-affective domain. Finally, the cognitive-evaluative dimension reflects how the person interprets or evaluates pain using factors such as past experience, probable outcome and the meaning attached to the situation (9,10). Although not clearly depicted in the gate control theory, pain also has a social dimension (11)(12)(13). Pain "does not exist in isolation from the social and cultural milieu in which it occurs" (14).
Without discrediting the work of Melzack and Wall (6), Cleeland (15) reported that, in his research, two dimensions accounted for most of the variance in patients' pain scores. He labelled these 'sensory' (ie, severity) and 'reactive' dimensions of pain, reflecting Beecher's earlier work (4,5). Cleeland (15) went on to suggest that understanding the meaning of pain severity scores might be enhanced if information on the reactive dimension of pain were collected simultaneously.

MEASUREMENT AND INTERPRETATION OF PAIN
A prerequisite for effective pain management is accurate measurement of the phenomenon. Measurement is the process of assigning numbers or labels to a phenomenon to depict the kind or amount of an attribute that is present at a given point in time (16)(17)(18). Measurement tools are created to quantify expeditiously and accurately the kind or amount of an attribute present at a given point in time.
(The terms 'measurement', 'tool' and 'scale' are used interchangeably throughout the present article.) The information obtained using a measurement tool can assist in clinical decision-making if it is trustworthy, accurate and meaningful. For example, the sphygmomanometer is one of the most commonly used tools in health care. This tool measures blood pressure or the pressure exerted by blood on the walls of the arteries. An acceptable blood pressure for adults ranges from 100/60 to 140/90 mmHg. Values that fall outside this range are generally deemed unacceptable or indicative of a need for further investigation and intervention. Measurements on this tool have clinical utility because health care professionals have a common understanding of how blood pressure scores are obtained and interpreted.
The simplest and most frequent approach to measuring pain is to quantify its intensity or severity using a single-item pain scale such as the 11-point numerical rating scale, word categorical scale or the visual analogue scale (19). Unfortunately, scores generated by these pain measurement tools do not possess a level of common understanding. Even though these pain measurement tools have been extensively used in research and their use in clinical practice is increasing, there is little agreement in terms of the clinical meaning or importance of their scores. Consequently, it is difficult for clinicians or researchers to interpret which scores warrant intervention and which indicate the effective management of pain. For example, do all patients who score their pain as 'mild' on a categorical scale or less than 30 on a 100 mm visual analogue scale have acceptable pain control? Alternatively, should pain scores be interpreted on the basis of patients' reports of satisfaction with treatment or willingness to accept their current status? Perhaps pain scores should be interpreted differently depending on the clinical situation or type of pain. If so, how should this be determined? There are few answers to these important questions. However, until these questions are answered, the clinical utility and relevance of pain measurement tools are significantly reduced.

RELIABILITY AND VALIDITY OF PAIN MEASUREMENTS
Establishing the reliability (consistency) and validity (accuracy) of the scores generated by a measurement tool is an essential component of tool development. Due to the dynamic and subjective nature of pain and the popularity of single-item pain scales, much of the psychometric testing in this area has focussed on the issue of validity rather than reliability (20). Instrument validity is often inferred when the scores generated by a measurement tool fall within an anticipated or reasonable range (21)(22)(23)(24). For example, it has been observed that patients' pain scores tend to follow a predictable pattern of gradual decline during the postoperative recovery period, despite significant interindividual variability. Other researchers have concluded that the various pain measurement tools produce comparable measures due to the moderate to strong intercorrelations among subjects' scores on these tools (25)(26)(27)(28)(29)(30)(31). Finally, some researchers have asserted the utility of these tools due to their ability to discriminate between pain and similar but distinct concepts such as anxiety (32), fear (33), coping (34) and depression (35). Although such evidence is an essential piece of the validation process, it is not sufficient to interpret the meaning of scores generated by these measurement tools. The significance of this deficiency was highlighted by Messick (36,37) in his writings on validity.
Messick (36,37) believed that the focus of validity testing should be broadened to include the meaning of subjects' scores on a measurement tool. According to Messick, a knowledge base should be established that not only guides the use of a measurement tool, but also advances understanding of the meaning of scores on the tool. Guidelines are needed that outline the meaning, relevance and utility of respondents' scores on a measurement instrument for a particular purpose; the implications of these scores for decision-making and action; and the functional worth of these scores as evidenced by the consequences of their use. This conceptualization of validity was depicted in the form of a progressive matrix (Table 1).
In this matrix, validity is conceptualized as a unified concept, and validation as a continually evolving process.
Evidence pertaining to the content, and substantive and/or structural validity of a measurement tool is viewed as part of construct validity, supporting the trustworthiness of score interpretation. Messick (36,37) asserted that such evidence is inadequate on its own, however, because it does not provide sufficient information about how subjects' scores on a tool should be used. The use and interpretation of scores on a measurement instrument can be seriously confounded by contextual factors. Identical scores on a measurement tool may be treated very differently depending on the situation. For example, higher scores on a pain measurement tool may be interpreted as acceptable immediately following a traumatic injury but unacceptable if they persist over time. It is important, therefore, to consider the relevance and utility of scores on a measurement tool in specific situations with various population groups.
The consequential basis of validity testing addresses the value implications and outcomes that occur as a result of interpreting and using measurement scores. Messick (36,37) believed that validity and values could not be separated. The value systems of the researcher and/or clinician who uses the measurement tool inevitably bias the inferences derived and actions taken. For example, a clinician who believes that pain builds character is more likely to interpret higher pain scores as acceptable than someone who considers this to be a myth. Consequently, it is important to uncover the underlying value system(s) operating in a specific context, and to determine their potential impact on the interpretation and use of measurement scores.
The last cell in Messick's validity matrix addresses the actual and potential social consequences incurred by the use of a measurement instrument. Because measurement is conducted for a specific purpose, it is important to examine the extent to which this purpose is realized. Consideration should be given to the various costs, both material (eg, financial, human resources and time) and those less tangible (eg, stress and stigmatization), incurred with the measurement process. According to Messick (36,37), the best ways to prevent or minimize negative consequences that could be attributed to instrument invalidity are to eliminate irrelevant content from the measurement tool and to maximize the empirical basis for score interpretation and use. For example, what pain scores will be interpreted as warranting intervention in a specific context?

INTERPRETING THE MEANING OF PAIN SCORES
Attaching meaning to patients' scores on pain measurement tools poses challenges. The first challenge is simply the necessity for patients to convert a complex, subjective experience into an objective number or label (29,38,39), especially when single-item tools are used. Relevant questions include, "What personal and situational factors affect patients' ability to perform this task?" and "What factors do patients consider when making this conversion?" A second challenge is to establish a process for interpreting the meaning of these scores. Although it has been suggested that such knowledge comes with repeated use and familiarity with a tool (40,41), this wait-and-see approach is extremely inefficient, particularly if a tool is to be used in the practice setting. Why should busy clinicians spend time and effort measuring a phenomenon if no tangible benefits are forthcoming (ie, if they do not know what to do with the scores obtained)?
The lack of criteria for interpreting scores on single-item pain scales also creates problems when discussing the meaning and consequence of research findings for practice. The discussion of study findings in pain research is frequently limited to reporting whether there is a statistically significant difference in mean scores among treatment groups. However, it has long been recognized that a statistically significant finding may have little practical value (42,43). If research using these measurement tools is intended to effect a change in practice, several questions warrant consideration. These questions include the following:
• "What do specific scores or ranges of scores on a measurement tool represent?" For example, what scores on a tool signify unacceptable pain?
• "What magnitude of change on a scale warrants action?" and "Does this magnitude vary depending on the region of the scale being used?" For example, is a three-point change from 6 to 9 (on an 11-point numerical scale) more important than one from 1 to 4?
• "How can effective pain management be defined in terms of patients' scores on these tools?"
Four general approaches to interpreting the meaning of scores on measurement instruments have been discussed in the literature. Although various labels have been used, these approaches are frequently referred to as 'statistical', 'normative', 'comparative' and 'social' validation (44)(45)(46)(47). Using the statistical approach, meaning is attached to research findings based on a sample-derived, statistical calculation such as effect size, confidence interval or median score. Although some researchers may prefer to base their conclusions on the mathematics, the appropriateness of interpreting clinical meaning solely on the basis of a statistical calculation may be questioned. Alternatively, using the normative approach, meaning is attached based on reference values or scores observed in a normal or functional population. For example, the norms for blood pressure among various population groups (for example, adults, children, Canadians) are well established. The problems associated with this approach are the lack of normative data for many health-related phenomena and the difficulty of identifying appropriate referent values. A third approach to establishing clinical significance is the comparative or individual approach. Using this approach, meaning is attached to subjects' scores on a measurement tool by comparing them with their scores on a 'gold standard' or external, objective criterion. For example, when the pulse oximeter was first introduced for monitoring respiratory (oxygenation) status, patients' oximetry scores were compared with scores obtained using the more expensive and invasive arterial blood gas method. Unfortunately, no gold standard or norm exists to interpret the meaning of pain measurements, except perhaps the absence of pain.
When gold standards or population norms are not available, a social validation approach must be used. Using this approach, opinions are solicited from others who, by expertise, consensus or familiarity, are able to make a subjective evaluation or interpretation of the situation (48). A value judgment is made regarding what constitutes a meaningful score. A major challenge associated with the social validation approach is determining whose opinions or judgments to use. For example, when interpreting the meaning of pain scores, input might be solicited from patients, significant others, health care providers, members of the general population and/or researchers. Considerable variability in the definition of what constitutes a meaningful score is likely to be obtained, however, depending on whose perspective is used. Although the solicitation of multiple perspectives may enhance the sensitivity of measurement scores, deciding how to deal with conflicting points of view poses a major challenge.
Some researchers have suggested that norms or standards regarding what constitutes a meaningful score can never be established, and that such values must be re-established in every study or in each clinical situation (46,49). However, if this is true, how can the clinical knowledge base be advanced or standards for professional practices be established? Although these measurement issues will not be easily resolved, and may vary somewhat depending on the clinical situation and/or specific patient group, they must be addressed.
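The limits of the statistical approach described above can be illustrated with a brief worked sketch. The Python below computes a mean difference and a standardized effect size (Cohen's d, using a pooled standard deviation) for two groups of pain scores on an 11-point numerical rating scale. The scores, the group names and the function name are hypothetical and were chosen only for illustration; they are not drawn from any study cited in the present article.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using a pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical 0-to-10 pain scores (illustrative data only).
control = [6, 7, 5, 6, 8, 7, 6, 5]
treatment = [5, 6, 4, 5, 7, 6, 5, 4]

mean_difference = mean(control) - mean(treatment)
effect_size = cohens_d(control, treatment)

print(f"Mean difference: {mean_difference:.2f} points")
print(f"Cohen's d: {effect_size:.2f}")
```

With these made-up scores, the treatment group averages one point lower and the effect size is large by conventional benchmarks, yet neither number indicates whether a mean score of about 5 represents acceptable pain. That judgment requires criteria external to the calculation.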

GOAL SETTING AND EFFECTIVE PAIN MANAGEMENT
An unwritten assumption that is evident within the literature is that pain management would improve if pain scales were used routinely in clinical practice. Although the use of pain scales increases the visibility of pain, their full potential will not be realized until treatment goals are established that define what constitutes acceptable pain as measured by these scales. Consequently, the final complicating factor is the lack of specificity in treatment goals for the management of pain, and the lack of association between treatment goals and the scores on pain measurement tools.
Goals are the desired outcomes or end points of an action (50). A useful treatment goal is one that lacks ambiguity and vagueness, and clearly defines the desired end-in-view as well as the time frame for its attainment (51). Although the Agency for Health Care Policy and Research's (AHCPR) guidelines for the management of acute pain (operative or medical procedures and trauma) state that the prevention of pain is always preferable, they also acknowledge that this may not always be attainable (52). In these situations, the AHCPR endorses a goal of "adequate relief of pain". The following four treatment goals are identified.
• Reduce the incidence and severity of patients' pain.
• Educate patients about the importance of communicating unrelieved pain.
• Enhance patients' comfort and satisfaction.
• Help reduce complication rates and length of hospital stays (51).
However, as Good (53) identified, the clinical utility of the AHCPR guidelines is reduced because these treatment goals are not in a testable form due to their lack of specificity. To help address the limitation of the AHCPR guidelines, Good and Moore (54) conceptualized a middle range theory for the management of acute pain using the presence and severity of side effects as the comparative criterion. Good (53) summarized the goal of their theory as achieving a balance between analgesia and side effects by administering potent pain medications plus pharmacological and nonpharmacological adjuvants to meet the relief goal set by the patient. According to this theory, goals regarding acceptable pain are defined by the patient. This proposition is congruent with the conceptualization of pain as a subjective experience that can only be known by the person experiencing it. Despite the validity of this statement, it is also true that many factors may impair the ability of a person who is in pain to make an informed decision. The person who is in pain may lack sufficient knowledge about pain and the available treatment options. In addition, he or she may be unduly influenced by contextual factors. For example, people who enter a busy emergency department may devalue the importance of their pain or its relief because of activities happening around them. Due to the attention-demanding quality of pain, it is also questionable whether persons experiencing severe pain can absorb, process and filter the information necessary to make an informed decision. Consequently, assistance may be needed if these individuals are to make informed decisions about their pain and its management. However, the type of assistance, as well as when and how it should be offered, needs to be defined. A second limitation of Good and Moore's (54) theory is the lack of specificity of the evaluative criterion. Further work is needed to define explicitly what constitutes unacceptable relief or side effects. The utility of this theory would also be enhanced if the treatment goal were linked to scores on pain measurement tools.
Work has been done to express the goals of pain management in terms of outcomes such as quality of life, functional status and satisfaction with treatment (20,55). However, such work has primarily focussed on patients who experience chronic (malignant and nonmalignant) pain. For example, Serlin et al (56) attempted to interpret the meaning of cancer patients' ratings of pain severity by linking these scores with measures of the extent that pain interfered with their functional status. A nonlinear relationship was observed between patients' pain severity on a numerical rating scale (0 to 10) and its interference with functional status (eg, enjoyment of life, activity, mood, walking, sleep, work and relations with others). Based on their findings, the researchers concluded that the intervals between 4 and 5, and between 6 and 7 on the numerical rating scale were more significant than other intervals in terms of the impact of cancer pain on patients' functional status. In a recently published study, Farrar and colleagues (57) reported that, based on data generated by 2724 patients with chronic nonmalignant pain, a reduction of approximately 30% on a numerical pain rating scale equated to patients' reports of "much improved" or "very much improved" health status.
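The two findings above lend themselves to a small worked sketch. The Python below assumes that the reduction of approximately 30% reported by Farrar and colleagues (57) is applied as a simple cut-off, and that the interval boundaries reported by Serlin et al (56) divide the 0 to 10 scale into three bands. The function names, the neutral band labels and the use of a fixed cut-off are assumptions made for illustration, not criteria proposed by either study, and both studies concerned chronic rather than acute pain.

```python
def percent_reduction(baseline, follow_up):
    """Percentage by which a pain score fell from baseline to follow-up."""
    if baseline <= 0:
        raise ValueError("baseline must be greater than 0 to compute a percentage")
    return 100.0 * (baseline - follow_up) / baseline


def meets_reduction_threshold(baseline, follow_up, threshold_pct=30.0):
    """True if the reduction reaches the cut-off (30% is an assumed default)."""
    return percent_reduction(baseline, follow_up) >= threshold_pct


def score_band(score):
    """Place a 0-to-10 score in a band split at the 4/5 and 6/7 boundaries.

    The labels are placeholders; the source reports only the boundaries.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 4:
        return "lower"
    if score <= 6:
        return "middle"
    return "upper"


# Hypothetical (baseline, follow-up) pairs, for illustration only.
for baseline, follow_up in [(9, 6), (4, 1), (10, 8)]:
    change = percent_reduction(baseline, follow_up)
    flag = meets_reduction_threshold(baseline, follow_up)
    print(
        f"{baseline} -> {follow_up}: {change:.1f}% reduction, "
        f"meets cut-off: {flag}, "
        f"band {score_band(baseline)} -> {score_band(follow_up)}"
    )
```

Under these assumptions, a three-point drop from 9 to 6 is a reduction of about 33%, the same three-point drop from 4 to 1 is a reduction of 75%, and a two-point drop from 10 to 8 is only 20%. The same absolute change can therefore carry a different meaning depending on where on the scale it starts.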
Findings from studies using patient satisfaction with treatment as the outcome measure have been contradictory. Little relationship was observed between pain severity and patient satisfaction in Ward and Gordon's (55) study of 248 hospitalized patients. Conversely, Desbiens et al (58) reported that dissatisfaction with pain control was more likely among patients with higher pain severity, greater anxiety, depression and alteration of mental status, and lower reported income. In a study of 91 postoperative patients, Thomas et al (59) found that younger female patients with high preoperative pain, high anxiety, low pain expectations and high willingness to report pain were more likely to report dissatisfaction with pain relief.
Unfortunately, none of these studies provides clinicians with criteria for interpreting the meaning of patients' pain severity scores or deciding which nonpain-free states are acceptable in a specific clinical situation. Further work is needed to establish explicit links between patients' scores on a pain scale and treatment goals. The establishment of such links will require collaboration among researchers and clinicians involved in pain management, as well as persons who have recently experienced pain within a specific context. Although these measurement issues will not be easily resolved, and may vary somewhat depending on the clinical situation and/or specific patient group, they must be addressed. Hopefully, through explication, replication and refinement, a process can be established that will permit meaning to be attached to the magnitude of pain. Perhaps the first step in this process is to explicate and compare the factors that persons in pain, clinicians and researchers consider when interpreting the meaning of a pain score within a specific context.

CONCLUSIONS
Effective pain management will not be the norm of practice until there is a common understanding of the meaning of patients' scores on pain measurement instruments, and explicit links are established between these scores and treatment goals. Criteria are needed for interpreting what constitutes acceptable pain, as measured by the various pain scales, in situations in which pain cannot be prevented. The availability of such criteria would expedite clinical decision-making as well as increase professional accountability for the attainment of effective pain control. Until then, research will continue to uncover poor pain management practices.