Orofacial Pain during Mastication in People with Dementia: Reliability Testing of the Orofacial Pain Scale for Non-Verbal Individuals

Objectives. The aim of this study was to establish the reliability of the “chewing” subscale of the OPS-NVI, a novel tool designed to estimate presence and severity of orofacial pain in nonverbal patients. Methods. The OPS-NVI consists of 16 items for observed behavior, classified into four categories and a subjective estimate of pain. Two observers used the OPS-NVI for 237 video clips of people with dementia in Dutch nursing homes during their meal to observe their behavior and to estimate the intensity of orofacial pain. Six weeks later, the same observers rated the video clips a second time. Results. Bottom and ceiling effects for some items were found. This resulted in exclusion of these items from the statistical analyses. The categories which included the remaining items (n = 6) showed reliability varying between fair-to-good and excellent (interobserver reliability, ICC: 0.40–0.47; intraobserver reliability, ICC: 0.40–0.92). Conclusions. The “chewing” subscale of the OPS-NVI showed a fair-to-good to excellent interobserver and intraobserver reliability in this dementia population. This study contributes to the validation process of the OPS-NVI as a whole and stresses the need for further assessment of the reliability of the OPS-NVI with subjects that might already show signs of orofacial pain.


Introduction
Statistics Netherlands predicts that the percentage of elderly people (i.e., 60 years of age or older) will rise from 15% at present to 25% by 2040 [1]. This is not just a national Dutch phenomenon. Globally, the United Nations predict proportions of elderly rising to approximately 20%, thus doubling the percentage of people over 60, and even more so in more developed regions where life expectancy is higher (up to 40% regionally) [2].
One of the major challenges with an ageing population is dementia, for which a prevalence of up to 7% in people over 60 is reported [3]. Many of these individuals' functions deteriorate to such a level that self-care is no longer possible. Although in some cases the care for people with dementia is supported by their families, others eventually come to live in a nursing home. Many people's functions continue to deteriorate until verbal communication is no longer possible [4]. The progressive decline in communicative abilities may hamper pain assessment in people with dementia, especially when it comes to orofacial pain [5]. In 2011, Lobbezoo et al. [5] emphasized that the existing diagnostic tools for establishing the intensity of pain in nonverbal elderly people with dementia are not appropriate for the assessment of dental or orofacial pain. In the same article, it was noted that there is a lack of research dealing with the assessment of orofacial pain in nonverbal people with dementia. The same is true for the literature on management of orofacial pain in this group [5,6].
During the international and interdisciplinary process towards the development of the Pain Assessment in Impaired Cognition metatool [4], the importance of developing a specific orofacial pain assessment tool was noted. Unfortunately, even though the recently developed Orofacial Mobilization-Observation-Behavior-Intensity-Dementia Pain Scale was proposed as a tool to assess the intensity of orofacial pain in individuals with dementia, it proved to be unreliable [7]. This led to the development of the "Orofacial Pain Scale for Non-Verbal Individuals" (OPS-NVI) [4,8,9]. This instrument is meant to assess the presence of possibly pain-related nonverbal communication, such as facial expressions, body movements, and vocal expressions. The patient's behavior is monitored during four types of activities, namely, "resting," "drinking," "chewing," and "oral care," and the intensity of the possible orofacial pain is scored as well. For the OPS-NVI, a reliability and validity assessment has yet to be performed. The present study therefore focuses on testing the reliability of the "chewing" subscale of the OPS-NVI by the assessment of video recordings of older people with dementia during their meal.

Study Sample.
For this study, video clips were used. These clips were part of the data set recorded in relation to the STA-OP!-protocol [10]. The video clips were recorded at various nursing homes throughout the Netherlands and consisted of audiovisual material of residents of these homes recorded during mealtime. Participating nursing homes met the following criteria:
(i) Management was willing to give permission for at least one psychogeriatric unit to participate.
(ii) No major organizational changes or building activities were planned or performed in the study period.
For the residents to be preselected for enrollment, the inclusion criteria were (i) presence of moderate to severe cognitive impairment according to the Global Deterioration Scale (GDS), that is, a score of 5, 6, or 7 [11], and (ii) absence of chronic psychiatric diagnoses other than a dementia-associated diagnosis.
Both criteria were assessed by elderly care physicians who are part of the staff of Dutch nursing homes. Informed proxy consent for the videotaping and use of the videotapes in the STA-OP!-study and related studies was obtained from family and/or caregivers for every included resident. The Medical Ethics Review Committee of the VU University Medical Center Amsterdam approved the protocol (registration number 2009/119).
After preselection, in order to be enrolled into the study, an additional inclusion criterion was the presence of "clinically significant symptoms of pain" and/or "difficult behavior," defined as (i) Cohen-Mansfield Agitation Inventory (CMAI) [12] score ≥44; (ii) Neuropsychiatric Inventory-Nursing Home Version (NPI-NH) [13] score ≥4 on every respective item; or (iii) indication of clinically relevant pain according to the Minimum Data Set of the Resident Assessment Instrument pain scale (MDS-RAI) (MDS-RAI pain scale ≥2) [14].
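The additional enrollment rule above is a logical OR of three thresholds. The Python sketch below is illustrative only: the function and parameter names are hypothetical, and the NPI-NH criterion is passed in as an already evaluated flag because the item-level rule is not specified further here.

```python
def meets_enrollment_criterion(cmai_score: float,
                               npi_nh_criterion_met: bool,
                               mds_rai_pain_score: float) -> bool:
    """Return True if at least one of the three additional criteria holds.

    Hypothetical helper; thresholds are taken from the text above.
    """
    agitation = cmai_score >= 44              # (i) CMAI score >= 44
    neuropsychiatric = npi_nh_criterion_met   # (ii) NPI-NH score >= 4, evaluated upstream
    pain = mds_rai_pain_score >= 2            # (iii) MDS-RAI pain scale >= 2
    return agitation or neuropsychiatric or pain


# Example: a resident with CMAI 46 qualifies even without pain indicators.
print(meets_enrollment_criterion(46, False, 0))
```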
The degree of cognitive deterioration was measured according to the MDS-Cognitive Performance Scale (CPS) [15]. The CPS is a seven-category index, ranging from cognitively intact to very severely impaired. The index is categorized by combining the three severe categories as "severe" cognitive deterioration, the middle two categories as "moderate" deterioration, and the remaining two categories as "normal" cognitive performance or only mild deterioration. The CPS scale has shown excellent agreement with the Mini-Mental State Examination (MMSE) in the identification of cognitive impairment in research [16]. The CPS score's mean and standard deviation are shown in Table 1.
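The three-level grouping of the CPS described above can be sketched as follows. This assumes the seven CPS categories are coded 0 (cognitively intact) to 6 (very severely impaired); the function name is hypothetical.

```python
def cps_group(cps_score: int) -> str:
    """Collapse a seven-category CPS score into the three groups used here.

    Assumes coding 0 (intact) through 6 (very severely impaired).
    """
    if not 0 <= cps_score <= 6:
        raise ValueError("CPS score must be between 0 and 6")
    if cps_score >= 4:       # the three most severe categories
        return "severe"
    if cps_score >= 2:       # the middle two categories
        return "moderate"
    return "normal or mild"  # the remaining two categories


print([cps_group(s) for s in range(7)])
```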
Comorbidity was assessed with the MDS-RAI comorbidity list, which contains the following groups of diseases: endocrine diseases, visual impairments, cardiovascular diseases, psychiatric disorders, pulmonary diseases, diseases of musculoskeletal system, neurological diseases (without Alzheimer disease or other types of dementia), infection in the last 7 days, and other [14]. Information on comorbidity is included in Table 1.

Procedure.
The OPS-NVI consists of four subscales wherein different activities are assessed, namely, "resting" (I), "drinking" (II), "chewing" (III), and "oral care" (IV). Each subscale contains a total of 16 items of observed behavior that are classified into four categories, namely, "facial activities" (1), "body movements" (2), "vocalizations" (3), and "specific behavior" (4). All categories and items therein are identical for each subscale. For this study, only the "chewing" subscale was used. The items of observed behavior are shown in the following.
To complete the OPS-NVI for the purpose of this study, an adaptation of the standard instructions of the OPS-NVI was given to the observers:
(1) Observe the behavior of the client while chewing:
(a) Observe the activity for 3 minutes or for the length of the activity. Segments where no activity is shown can be skipped.
(2) For each item, tick off the appropriate box:
(a) Y = Yes, I saw this behavior.
(b) N = No, I did not see this behavior.
(c) N/A = Not Applicable; it was not possible to score this behavior because the client was not able to perform it (this does not cover behavior that was merely not visible; in that case, tick off "No").
(3) Rate the estimated pain intensity with a number between 0 and 10:
(a) 0 is no pain and 10 is pain as bad as it possibly could be.
(b) Rate what you think is the experienced pain intensity.
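One way to record a single completed form under these instructions is sketched below in Python. The class and field names are hypothetical, the item identifiers are placeholders, and counting "Y" responses as a category sum score is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

VALID_RESPONSES = {"Y", "N", "N/A"}


@dataclass
class ChewingObservation:
    """One observer's scores for one video clip (hypothetical structure)."""
    responses: Dict[str, str]   # item id -> "Y", "N", or "N/A"
    pain_intensity: float       # estimated pain, 0 (no pain) to 10 (worst possible)

    def __post_init__(self) -> None:
        for item, answer in self.responses.items():
            if answer not in VALID_RESPONSES:
                raise ValueError(f"invalid response {answer!r} for item {item}")
        if not 0 <= self.pain_intensity <= 10:
            raise ValueError("pain intensity must be between 0 and 10")

    def count_yes(self, item_ids: Iterable[str]) -> int:
        """Number of listed items scored 'Y' (assumed category sum score)."""
        return sum(1 for i in item_ids if self.responses.get(i) == "Y")


obs = ChewingObservation({"Q1": "Y", "Q2": "N", "Q3": "N/A"}, pain_intensity=3)
print(obs.count_yes(["Q1", "Q2", "Q3"]))
```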
For this study, a total of 321 video clips were collected. Of these, 84 were not used: in 83 cases, no or hardly any masticatory movement was detected, and one clip was removed from the data set because, in retrospect, the person had a possible alcohol-related dementia diagnosis, which did not meet the inclusion criteria as described in the STA-OP!-protocol. This yielded a total of 237 video clips to be observed, featuring a total of 153 subjects. Of these, 69 subjects featured in only one video clip, whereas 84 featured in two clips. The subjects that were filmed twice were recorded with a 3-month interval (12-13 weeks) between the two recordings [10]. There were 109 women and 44 men, with a mean age of 83.3 (SD: 7.1; range: 63.8–102.4), as shown in Table 1.
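The sample counts reported above are internally consistent, as the following short Python check illustrates (variable names are chosen here for readability only).

```python
# Clip counts as reported in the text.
collected = 321
excluded_no_chewing = 83
excluded_diagnosis = 1
clips = collected - excluded_no_chewing - excluded_diagnosis

# Subject counts as reported in the text.
subjects_one_clip = 69
subjects_two_clips = 84
subjects = subjects_one_clip + subjects_two_clips
clips_from_subjects = subjects_one_clip + 2 * subjects_two_clips

women, men = 109, 44

print(clips, subjects, clips_from_subjects, women + men)
```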
The video clips featured residents during their mealtime. The clips were recorded with audiovisual recording equipment (JVC brand, type Everio G Series nr. GZ-MG575, Yokohama, Japan). The camera was placed in such a way that the resident's face was shown, nearly all masticatory movements were clearly visible, and vocalizations were clearly heard over the course of the clip. If the resident moved during the recording, the camera position was adjusted accordingly. The duration of the clips varied between 3 and 5 minutes.

Reliability Assessment.
Two observers, both sixth-year dental students at the Academic Centre for Dentistry Amsterdam (ACTA), were trained by an experienced user of the OPS-NVI and were instructed to individually observe the behavior of the participants and estimate the pain intensity with the OPS-NVI for every clip (T0), followed by a period of 6 weeks of no observation. After this period, the observers were instructed to complete the OPS-NVI again for every clip (T1).

Statistical Analysis.
To establish the reliability of the "chewing" subscale of the OPS-NVI, the interobserver and intraobserver reliability were assessed by analyzing the test-retest reliability for individual items of the instrument. The sum scores of the items per category and the interobserver and intraobserver reliability of the estimated pain score were also analyzed. For all interobserver reliability analyses, the T0 measurements of both observers were used. In cases where the database showed a bottom or ceiling effect for an item, meaning that the item was scored in less than 5% or more than 95% of the cases, the item was excluded from the statistical analyses. Thus, items with a Yes or No count <12 were excluded.
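The exclusion rule can be expressed as a small check. The Python sketch below is illustrative, with a hypothetical function name, and assumes every case was scored either Yes or No for the item. With 237 clips, 5% corresponds to 11.85 cases, which is why a Yes or No count below 12 triggers exclusion.

```python
def has_bottom_or_ceiling_effect(yes_count: int, n_cases: int,
                                 threshold: float = 0.05) -> bool:
    """True if an item was scored Yes in < 5% or > 95% of the cases.

    Assumes each case is scored Yes or No, so no_count = n_cases - yes_count.
    """
    no_count = n_cases - yes_count
    cutoff = threshold * n_cases   # 11.85 for 237 clips
    return yes_count < cutoff or no_count < cutoff


for yes in (11, 12, 225, 226):
    print(yes, has_bottom_or_ceiling_effect(yes, 237))
```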
The interobserver and intraobserver reliability of the item scores were analyzed using Intraclass Correlation Coefficients (ICCs). The interobserver and intraobserver reliability of the sum scores of the included items per category and of the estimated pain scores were also estimated by ICC. ICCs < 0.4 were considered poor, ICCs between 0.4 and 0.75 fair-to-good, and ICCs > 0.75 excellent [17]. The confidence interval was calculated with a 95% confidence level. The percentage agreement for the item scores was also determined.
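To make these quantities concrete, the following Python sketch computes an ICC, applies the interpretation bands above, and computes a percentage agreement. The ICC model used in the SPSS analyses is not stated here; the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is an assumption chosen for illustration, and all function names are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n-subjects x k-raters table (assumed model).

    Undefined (division by zero) when the ratings show no variance at all.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                  # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                  # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc: float) -> str:
    """Interpretation bands as given in the text [17]."""
    if icc < 0.4:
        return "poor"
    if icc <= 0.75:
        return "fair-to-good"
    return "excellent"


def percentage_agreement(a: Sequence, b: Sequence) -> float:
    """Percentage of cases in which two sets of scores are identical."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up binary item scores (1 = Yes, 0 = No) from two observers.
obs1 = [1, 0, 1, 1, 0, 0]
obs2 = [1, 0, 0, 1, 0, 1]
icc = icc_2_1(list(zip(obs1, obs2)))
print(round(icc, 2), classify_icc(icc), percentage_agreement(obs1, obs2))
```

The example scores are invented for demonstration and do not come from the study data.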
Probability levels of < 0.05 were defined as statistically significant. All statistical analyses were performed using the SPSS software package version 20.0 (IBM, Armonk, NY, USA, 2011).

Results
As shown in Table 2, a total of ten items were excluded from the statistical analyses, because there was hardly any variability in observed behavior. As a result, the category "vocalizations" was not used in the further analyses. In most cases, excluded items were scored "No," with the exception of (Q4), which was excluded from the analyses because in all cases subjects opened their mouths as part of their chewing activities. Table 3 shows the intraobserver and interobserver reliability and percentage agreement per included item. The table clearly shows a discrepancy between the different observations: the intraobserver reliability of observer 1 ranges from fair-to-good to excellent, while the intraobserver reliability of observer 2 and the interobserver reliability range from poor to fair-to-good.
In Table 4, where intraobserver and interobserver reliability per category as well as pain intensity estimations are shown, a similar discrepancy between the two intraobserver reliabilities is noted. However, the reliability per category seems to be slightly higher than the reliability per item.

Discussion
The aim of this study was to assess interobserver and intraobserver reliability of the "chewing" subscale of the OPS-NVI, with patient and environment standardized through video recordings. When analyzing the video clips, the two observers reported clear bottom and ceiling effects, meaning that a considerable number of items were observed in less than 5% or more than 95% of the cases. This might be due to the fact that although there was preselection for "clinically significant symptoms of pain" and/or "difficult behavior," as defined by the STA-OP!-protocol inclusion criteria [10], there was no specific selection of cases with probable orofacial pain. Therefore, all items that were considered noncontributing to the assessment of orofacial pain in this population were excluded. While the category "facial activities" only lost a single item (namely, "opened mouth," which was excluded because this behavior is always present while eating), the category "vocalizations" was completely excluded, and of the categories "body movements" and "specific behavior" only one item each was retained. These results suggest that the "chewing" subscale of the OPS-NVI might be reduced to the remaining 6 items, which would facilitate its use in daily practice. The discrepancies in the intraobserver reliability between the two observers, as shown in Tables 3 and 4, could be explained as follows. The instructions to the observers were to first score the presence of different items of behavior, regardless of whether the observer thought it was related to orofacial pain. The first category, "facial activities" ((Q1) to (Q5)), for example, covers behavior that can also be present during masticatory movement without pain. This complicates the observations considerably.
Within this context, when observing this behavior, the different observers apparently showed a different sensitivity for the more subtle facial movements, which shows that the scoring of the OPS-NVI items is based on a subjective interpretation of observed behavior. To improve the reliability, a different set of instructions could be developed, for example, pictures of frowning and nonfrowning individuals that guide the decision whether or not to score this specific item.
For 84 people, two video fragments were available, because the clips were recorded as part of the STA-OP!-protocol and were therefore obtained at baseline and after 3 months [10]. It could be argued that this could have created observer bias within this study; that is, the pain score could have been based on recollection of previous film clips featuring the same person rather than on independent observations. This could have led to an overestimation of reliability. However, the time between both recordings was relatively long. Taking this into account, along with the fluctuating nature of most painful conditions, it was decided that even though a person featured twice in the database, both clips could be considered independent of each other.

Strengths and Limitations.
A strength of the present study is the large number of video clips (n = 237) included in the sample, which contributed greatly to the power of the study. Furthermore, not only did using video clips provide an efficient way to collect a lot of data in a short period of time, but it also offered the possibility of assessing the intraobserver reliability, which would otherwise have been impossible.
A limitation might be that observing video clips is not the same as real-life observation. Although most clips were at most 5 minutes long, this is still a limited period of time. This may have resulted in the observed bottom and ceiling effects. Live observation over a longer period, that is, during the course of the entire meal, may have yielded a more accurate estimate of the presence of orofacial pain during mastication. However, longer observations create the risk of making the OPS-NVI more impractical to use and more difficult to implement on a large scale. This study could also have benefited from additional observers, since clear discrepancies were found between the ICC scores of observer 1 and observer 2, the former ranging from fair-to-good to excellent and the latter from poor to fair-to-good. The lack of a control group with subjects matched for age but without cognitive decline is also a potential limitation: in this study, establishing the presence and intensity of orofacial pain in the included subjects is confounded not only by the use of a novel tool, but also by the fact that the subjects suffer from severe cognitive decline. By including a control group, the latter would no longer be a confounding factor. It is therefore suggested that future studies into the reliability and validity of the OPS-NVI include a control group.

Implications.
This study was performed to contribute to the reliability assessment of the "chewing" part of the OPS-NVI and also to its development as a whole. In the process of assessing the interobserver and intraobserver reliability, it was found that a total of ten items could be excluded from this subscale of the OPS-NVI, which makes it more concise and easier to use. Additional reliability assessments are required for the other subscales of the OPS-NVI (namely, resting, drinking, and oral care). Following this, the validity of the tool will also need to be assessed.

Conclusion. The Orofacial Pain Scale for Non-Verbal Individuals (OPS-NVI) was developed to improve the recognition of the presence and intensity of orofacial pain. In this study, the "chewing" subscale of the OPS-NVI was used to assess pain in older people with dementia during their meal. The categories within the "chewing" subscale of the OPS-NVI have a fair-to-good to excellent interobserver and intraobserver reliability. The outcomes stress the need for further assessment of the reliability of the OPS-NVI in subjects with more severe orofacial pain.