Clinimetrics of the 9- and 19-Item Wearing-Off Questionnaire: A Systematic Review

The treatment of Parkinson's disease (PD) with dopaminergic therapy improves functionality and quality of life. However, as the disease progresses, the wearing-off phenomenon develops, which necessitates complex posology adjustment or adjuvant therapy. This phenomenon may not be well recognized, especially if it is mild or involves nonmotor symptoms. Questionnaires were developed to improve the recognition of the wearing-off phenomenon. They consist of a checklist on which patients mark the symptoms they experience and whether those symptoms improve with medication. A recent review by the Movement Disorder Society suggested the 19-item (WOQ-19) and 9-item (WOQ-9) questionnaires as screening tools for the wearing-off phenomenon. However, no systematic review has assessed the questionnaires' clinimetric properties, such as sensitivity, specificity, test-retest reliability, and responsiveness. We conducted an extensive search for studies using these two tools and identified 3 studies using the WOQ-19 and 5 studies using the WOQ-9. Both questionnaires seem to have good sensitivity (0.81–1). The WOQ-19 has variable specificity (0.39–0.8), depending on the number of positive items required, while the WOQ-9 lacks specificity (0.1–0.69). Only one study, using the WOQ-19, reported test-retest reliability, and only two studies reported responsiveness. Thus, this report describes the first independent systematic review to quantitatively examine the clinimetric properties of these two questionnaires.


Introduction
The treatment of Parkinson's disease (PD) with dopaminergic therapy improves functionality and quality of life. However, as the disease progresses, motor and nonmotor fluctuations emerge [1]. The well-described wearing-off (WO) phenomenon is the shortening of the duration of levodopa's effect, which can be managed with dosage adjustment or adjuvant therapy, such as catechol-O-methyltransferase (COMT) inhibitors [2]. Clinical evaluation has been the gold standard for diagnosing this condition. However, the WO phenomenon may not be well recognized, particularly if it is mild or involves nonmotor symptoms. Several authors argue that better recognition of the WO phenomenon could change how it is managed and improve patients' functionality [3].
To improve the recognition of the WO phenomenon, a 32-item questionnaire (WOQ-32) was developed [3]. The questionnaire consists of a checklist of symptoms that patients must identify, noting whether each symptom improves with medication. For practical reasons, and using data from the same study, this questionnaire was shortened to a 19-item version (WOQ-19) with the same properties [4]. Later, a 9-item questionnaire (WOQ-9) containing the most valuable questions was developed [5] and successfully tested [6]. The WOQ-9 has been used in a number of clinical studies and translated into several languages, and its adaptations have shown differing clinimetric properties [7]. A recent review by the Movement Disorder Society classified both the WOQ-19 and WOQ-9 as recommended screening tools for the WO phenomenon [8]. However, that review did not quantitatively address their clinimetric properties in PD patients compared with clinical evaluation. Thus, we conducted a systematic review and analysis of the clinimetric properties of both the WOQ-19 and WOQ-9, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), test-retest stability, and responsiveness.

Methods
We followed the PRISMA statement. The inclusion criterion was studies using the WOQ-9 or WOQ-19 in PD patients to diagnose WO compared with the gold standard, clinical evaluation. Studies had to report sensitivity and specificity or include data from which we could calculate them. We also included studies using one of the questionnaires if they reported data on test-retest reliability or responsiveness. Formal validation was not required [9], but at least translation and face validation for the given language were required. Reviews, abstracts, and conference proceedings were excluded. Responsiveness was calculated using Cohen's effect size [10]. The search was conducted in MEDLINE, Embase, and Web of Science between 01/06/2017 and 22/12/2017. The search terms were ((Parkinson's OR Parkinson's disease) AND (wearing off OR wearing-off OR motor fluctuation) AND (questionnaire)). The bibliographies of the selected articles and of previously published reviews were also reviewed. Two reviewers (Artur Schumacher-Schuh and Carlos E. Mantese) independently screened articles by title to select them for abstract reading. Disagreements were discussed with another author (Carlos R. M. Rieder). Articles were then selected for full-text reading based on the abstract.
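The inclusion criterion allowed studies whose sensitivity and specificity could be calculated from reported data. A minimal sketch of that arithmetic follows. It is not necessarily the exact procedure applied to each study. The four accuracy measures come from the 2 × 2 table of questionnaire result against clinical evaluation. The `TwoByTwo` class and the `diagnostic_metrics` function are hypothetical names, and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Questionnaire result cross-tabulated against clinical evaluation."""
    tp: int  # questionnaire positive, clinician diagnoses WO
    fp: int  # questionnaire positive, clinician does not diagnose WO
    fn: int  # questionnaire negative, clinician diagnoses WO
    tn: int  # questionnaire negative, clinician does not diagnose WO


def diagnostic_metrics(t: TwoByTwo) -> dict[str, float]:
    """Accuracy measures relative to the clinical gold standard."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),  # detected among true WO cases
        "specificity": t.tn / (t.tn + t.fp),  # ruled out among non-WO cases
        "ppv": t.tp / (t.tp + t.fp),          # true WO among positive screens
        "npv": t.tn / (t.tn + t.fn),          # no WO among negative screens
    }


# Invented counts for illustration only (not from any included study).
metrics = diagnostic_metrics(TwoByTwo(tp=45, fp=10, fn=5, tn=40))
print({name: round(value, 2) for name, value in metrics.items()})
```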
The bibliography review did not identify any articles beyond those retrieved from the database searches.
For the WOQ-19, three studies were selected (Table 1). One used a 1-item cutoff [12], while the others [11,13] used a 2-item cutoff. Sensitivity ranged from 0.81 to 0.90, specificity from 0.39 to 0.80, PPV from 0.62 to 0.88, and NPV from 0.64 to 0.84. The wide range of specificity seems attributable to the one study that used the 1-item cutoff: this study did not show better sensitivity, but it did show worse specificity.
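A short sketch can illustrate the trade-off between the 1-item and 2-item cutoffs. It assumes that a patient screens positive when the number of positive items reaches the cutoff. The function name and the ten patients below are hypothetical and are not data from the included studies.

```python
def accuracy_at_cutoff(
    positive_items: list[int], clinical_wo: list[bool], cutoff: int
) -> tuple[float, float]:
    """Return (sensitivity, specificity) when a patient screens positive
    once the number of positive items reaches `cutoff`."""
    tp = fn = fp = tn = 0
    for count, has_wo in zip(positive_items, clinical_wo):
        screen_positive = count >= cutoff
        if has_wo and screen_positive:
            tp += 1
        elif has_wo:
            fn += 1
        elif screen_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients: positive-item counts and the clinician's verdict.
items = [0, 1, 1, 2, 3, 4, 0, 1, 2, 5]
wo = [False, False, True, True, True, True, False, False, False, True]
for k in (1, 2):
    sens, spec = accuracy_at_cutoff(items, wo, k)
    print(f"{k}-item cutoff: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these invented data, raising the cutoff from 1 to 2 items lowers sensitivity from 1.00 to 0.80 and raises specificity from 0.40 to 0.80. Raising a cutoff can never increase sensitivity or decrease specificity, which matches the pattern described above.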
Test-retest stability was assessed in one paper [18], with the two administrations two weeks apart. The intraclass correlation coefficient (ICC) for the number of positive items was 0.858. The questionnaire was applied to stable patients; however, the paper did not describe how clinical stability was established or which type of ICC was used.
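The type of ICC was not reported, so it helps to see how the two common single-measure forms differ. The sketch below assumes a two-way model, with subjects as rows and the two administrations as columns. Here "consistency" corresponds to ICC(3,1), and "agreement" corresponds to ICC(2,1) with absolute agreement, in the Shrout and Fleiss / McGraw and Wong notation. The function name and the data are hypothetical.

```python
from statistics import fmean


def icc_two_occasions(test: list[float], retest: list[float]) -> dict[str, float]:
    """Single-measure ICCs from a two-way ANOVA (subjects x occasions).

    'consistency' ignores a systematic shift between occasions;
    'agreement' (absolute agreement) penalizes such a shift."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = fmean(test + retest)
    col_means = (fmean(test), fmean(retest))
    ss_rows = k * sum((fmean(r) - grand) ** 2 for r in rows)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)     # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return {"consistency": consistency, "agreement": agreement}


# Invented data: every patient reports one more positive item at retest.
print(icc_two_occasions([2, 4, 6, 8], [3, 5, 7, 9]))
```

In this example, the consistency ICC is 1.0, while the agreement ICC drops to about 0.93. A reported value such as 0.858 therefore cannot be fully interpreted without knowing which form was used.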
For responsiveness, two studies were analyzed [19,20]. Both were clinical trials of COMT inhibitors. One [19] used the WOQ-9 and reported improvement in most items, expressed as the proportion of patients who improved; however, it did not provide the data needed to calculate an effect size. In the other trial [20], Cohen's effect size was 0.5.
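An effect size for responsiveness can be computed in several ways. This sketch assumes one common formulation: the mean change in the number of positive items divided by the standard deviation of baseline scores. It then labels the result with Cohen's conventional benchmarks (0.2 small, 0.5 moderate, 0.8 large). The cited trial may have used a different denominator, such as a pooled standard deviation. The function names and scores are hypothetical.

```python
from statistics import fmean, stdev


def responsiveness_effect_size(baseline: list[float], follow_up: list[float]) -> float:
    """Mean improvement divided by the SD of baseline scores (one common variant).

    Scores are numbers of positive questionnaire items, so improvement is
    computed as baseline minus follow-up (fewer positive items = better)."""
    improvements = [b - f for b, f in zip(baseline, follow_up)]
    return fmean(improvements) / stdev(baseline)


def cohen_label(effect_size: float) -> str:
    """Cohen's conventional benchmarks: 0.2 small, 0.5 moderate, 0.8 large."""
    magnitude = abs(effect_size)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical positive-item counts before and after add-on therapy.
es = responsiveness_effect_size([3, 5, 7], [2, 4, 6])
print(f"{es:.2f} {cohen_label(es)}")  # 0.50 moderate
```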

Discussion
[Figure: Flow diagram of study selection. Duplicates, N = 147; records excluded, N = 210; full-text articles excluded, N = 32 (questionnaires not used, 19; no clinimetrics, 9; repeated sample, 2; not validated, 1; review, 1).]
This report describes the first systematic review of the quantitative clinimetric properties of the WOQ-19 and WOQ-9. Systematic reviews are fundamental for summarizing important data for research and clinical practice. Additionally, this is the first independent review of the clinimetric properties of these questionnaires. The WOQ-19 seems to have good accuracy when a 2-item cutoff is used, which makes it an excellent tool in both research and clinical practice. However, most of the trials used the WOQ-9, which has excellent sensitivity but poor specificity.
Thus, the WOQ-9 could be used as a screening tool to identify at-risk individuals, but a clinical evaluation would be needed to confirm the diagnosis, as several trials have done [42,46]. Stacy [7] has argued that office visits may fail to recognize WO and that the position of clinical evaluation as the gold standard may need reevaluation [6]. This hypothesis seems difficult to prove. Moreover, most clinical trials of PD treatments use diaries or the wearing-off subitems of the UPDRS (Unified Parkinson's Disease Rating Scale) as wearing-off outcomes. Raciti et al. [49] showed that the UPDRS has a sensitivity of 0.87 and a specificity of 0.43 compared with clinical evaluation, which makes it similar to the WOQ-9 and considerably worse than the WOQ-19.
The variability in the WOQ-19 results may have several explanations. For the WOQ-19, one clinical trial used the 1-item cutoff and therefore lost specificity. In a ROC curve plotted by Martinez-Martin et al. [11], the questionnaire showed better accuracy with the 2-item cutoff. For the WOQ-9, the 1-item cutoff seems to show the same lack of specificity. Fukae et al. [17] showed that with the 2-item cutoff, the WOQ-9's specificity improves (from 0.39 to 0.72) while its sensitivity decreases slightly (from 0.94 to 0.87). Additionally, each study involved a different language, and the final result depended in part on the properties of each specific validation. Moreover, the gold standard could differ depending on the physician's expertise (i.e., whether they are movement disorder specialists or neurologists in training). Finally, although most clinical trials excluded patients who could not complete the questionnaires, differences in educational, cultural, and social backgrounds could explain part of the variability in the questionnaires' performance.
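The argument for using the WOQ-9 as a screening tool followed by clinical confirmation can be made concrete with Bayes' theorem. Bayes' theorem gives the predictive values from sensitivity, specificity, and the prevalence of WO in the screened population. The sketch uses the sensitivity and specificity pairs quoted above. The 50% prevalence is an arbitrary assumption, and the function name is hypothetical.

```python
def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> tuple[float, float]:
    """PPV and NPV from Bayes' theorem for an assumed prevalence of WO."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Sensitivity/specificity pairs quoted in the text; 50% prevalence is assumed.
for label, sens, spec in [
    ("WOQ-9, 1-item cutoff (Fukae et al.)", 0.94, 0.39),
    ("WOQ-9, 2-item cutoff (Fukae et al.)", 0.87, 0.72),
    ("UPDRS (Raciti et al.)", 0.87, 0.43),
]:
    ppv, npv = predictive_values(sens, spec, prevalence=0.5)
    print(f"{label}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

At this assumed prevalence, the 1-item WOQ-9 has an NPV of about 0.87 but a PPV of only about 0.61. A negative screen is therefore fairly reassuring, while a positive screen still needs clinical confirmation. Predictive values shift with prevalence, so these figures should not be compared directly with the PPV and NPV reported by individual studies.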
Of note, we did not identify information on questionnaire reliability or validity other than criterion validity. Furthermore, only one article on the WOQ-19 examined test-retest reliability, using the intraclass correlation of the number of positive items. It did not state the ICC type (consistency or agreement), and agreement is preferred [50]. Although clinical stability was not described, a two-week interval between administrations seems adequate in Parkinson's disease to avoid recall bias while preserving clinical stability. We did not find any paper reporting kappa agreement for individual questions; because the questionnaire has dichotomous responses, kappa would seem appropriate. We did not identify any reports of test-retest reliability for the WOQ-9. Responsiveness was obtained from two clinical trials for the WOQ-19 comparing add-on therapy with entacapone (a COMT inhibitor). This therapy is used to treat the WO phenomenon, and both trials showed an improvement in the number of positive items on the questionnaire. However, for one trial, we had no data to calculate the effect size [18]. The other [19] showed an effect size of 0.5, which indicates a moderate effect. The lack of data on reliability, and even on validation by means other than criterion validation, may reflect the fact that the original study used the WOQ-32. The WOQ-19 and WOQ-9 were developed later, and not even their developers tested them for these properties. Additionally, we did not include conference proceedings or abstracts, which may account for some missing data (although we did not identify such data in the databases we searched).
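The item-level analysis suggested above could use Cohen's kappa. Kappa compares the observed test-retest agreement on a yes/no item with the agreement expected by chance. The sketch below is a minimal version of that calculation. The function name and the ten patients' answers are hypothetical.

```python
def cohen_kappa(first: list[bool], second: list[bool]) -> float:
    """Cohen's kappa for one yes/no item answered on two occasions."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    yes_1 = sum(first) / n   # proportion answering "yes" at test
    yes_2 = sum(second) / n  # proportion answering "yes" at retest
    # Agreement expected by chance if the two occasions were independent.
    expected = yes_1 * yes_2 + (1 - yes_1) * (1 - yes_2)
    if expected == 1:
        raise ValueError("kappa is undefined when all answers fall in one category")
    return (observed - expected) / (1 - expected)


# Hypothetical answers of ten patients to a single item, two weeks apart.
test = [True, True, True, True, False, False, False, False, True, False]
retest = [True, True, True, False, False, False, False, True, True, False]
print(round(cohen_kappa(test, retest), 2))  # 0.6
```

Here the raw agreement is 80%, but half of that is expected by chance alone, so kappa is only 0.6.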
This information is important to clinicians and researchers because it may influence how they use the questionnaires. A questionnaire with poor test-retest performance is not reliable, because its results can change without any change in clinical status.
We excluded several important articles, such as Stacy et al.'s [3] description of the WOQ-32 questionnaire and its transformation into the WOQ-19 [4] and WOQ-9 [5]. These articles report data from a different questionnaire, which was later transformed into the WOQ-19 and WOQ-9. Several articles were post hoc analyses of primary data that we had already included. One study had no validation of any type and did not meet our inclusion criteria. Most of the excluded trials had no comparator; therefore, we could not assess their clinimetric properties.

Conclusions
We conducted the first systematic review of the WOQ-19 and WOQ-9, important tools for screening and diagnosing WO. The lack of certain data suggests caution when using the WOQ-9. However, the WOQ-19 exhibits reliability and was validated for use as a diagnostic tool. We also suggest that authors report complete clinimetric properties when publishing papers that validate these questionnaires.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.