Levodopa Challenge Test Predicts STN-DBS Outcomes in Various Parkinson's Disease Motor Subtypes: A More Accurate Judgment

Background. The relationship between the levodopa challenge test (LDCT) and postoperative subthalamic nucleus-deep brain stimulation (STN-DBS) benefits is controversial in patients with Parkinson's disease (PD). We aim to evaluate the value of total levodopa response (TLR) and symptom levodopa response (SLR) in predicting postoperative improvement in different PD motor subtypes.
Methods. Patients were split into a training set (147 patients) and a validation set (304 patients). For the training set, we retrospectively collected data from 147 patients who were evaluated with the Unified Parkinson's Disease Rating Scale- (UPDRS-) III and the Parkinson's Disease Questionnaire- (PDQ-) 39. Patients were classified into tremor-dominant (TD), akinetic-rigid-dominant (AR), and mixed (MX) groups. Clinically important difference (CID) was employed to dichotomize DBS effects. For patients in each subtype group from the training set, we used correlation and receiver operating characteristic (ROC) curve analyses to explore the strength of the relation between LDCT response and postoperative improvement. Areas under the curve (AUCs) were calculated and compared through the DeLong test. Results developed from the training set were applied to the validation set to predict postoperative improvement in different PD motor subtypes.
Results. In the validation cohort, TLR significantly correlated with postoperative motor (p < 0.001) and quality of life (QOL) (p < 0.001) improvement in the MX group. The AUC for TLR and UPDRS-III (TU) was 0.800. The AUC for TLR and PDQ-39 (TP) was 0.770. The associated criterion for both TU and TP was around 50%. In the AR group, a strong correlation was found only between SLR and PDQ-39 (SP) (p < 0.001), and the AUC of SP was significantly larger than that of TP (p = 0.034). The associated criterion for SP was around 37%. No significant correlation was found in the TD group.
Conclusions. We provide a more accurate judgment for LDCT. TLR strongly correlated with postoperative UPDRS-III and PDQ-39 improvement in MX patients. A TLR > 50% may indicate a higher possibility of clinically meaningful benefits from STN-DBS compared with medication only. SLR can predict QOL improvement well in AR patients. Similarly, an SLR > 37% may indicate a higher possibility of clinically significant benefits from STN-DBS. LDCT provides limited information for TD patients.


Introduction
Parkinson's disease (PD) is a neurodegenerative disease with two main therapies: levodopa and deep brain stimulation (DBS). Typically, an acute levodopa challenge test (LDCT) is conducted before DBS surgery to screen potential beneficiaries. Levodopa response (LR) assessed by the Unified Parkinson's Disease Rating Scale- (UPDRS-) III has been regarded as the best outcome predictor for postoperative response to DBS [1]. However, the relation between the preoperative LR and postoperative DBS benefits has long been disputed [2]. Some authors found that the preoperative LR is a key predictor for outcomes of bilateral STN-DBS for advanced PD [3,4], while others indicated that the significant correlations are only the result of statistical methods and primary assumptions [5]. There are reports that patients who do not have a 30%-or-greater LR do obtain satisfactory improvements after DBS surgery [6,7]. The mismatch between levodopa and DBS responses is more commonly observed in single-symptom-dominated (SSD) patients, such as tremor-dominated or rigidity-dominated patients [8]. For those patients, the effect of levodopa on the total UPDRS score can be less informative than that on particular symptoms [9]. The LR toward particular symptoms calculated from UPDRS subitems, which we term "symptom levodopa response (SLR)" to distinguish it from total levodopa response (TLR), might better predict STN-DBS efficacy in certain groups of patients. We employed both receiver operating characteristic (ROC) curve analysis and correlation analysis to explore the predictive value of LDCT in different PD motor subtypes. In addition, since single-center outcomes may not generalize well to a large population, we further validated our results in an external validation set to enhance the credibility of the findings.

Patients.
We reviewed the electronic medical records of all PD patients who received bilateral STN-DBS between June 1, 2015, and June 1, 2019, in the First Affiliated Hospital of Nanchang University. Patients with complete baseline and 3-month follow-up data were included. The diagnosis of PD was in accordance with the United Kingdom PD Society Brain Bank Diagnostic Criteria [10]. Sex, age, age at onset, disease duration, duration of motor fluctuations, and medication were recorded by inquiring the case history. Hoehn-Yahr stage, UPDRS, PDQ-39, Hamilton depression rating scale (HAMD), and Hamilton anxiety rating scale (HAMA) values were assessed for all included patients under the guidance of movement disorder specialists. The ethics committee of the First Affiliated Hospital of Nanchang University approved the study protocol, and all patients or their families provided written informed consent.
2.2. Subtype Classification. Two methods are commonly used to classify PD patients. Jankovic et al. [11] divided patients into tremor-dominant (TD), postural instability and gait difficulty, and intermediate motor subtypes, and Lewis [12] divided patients into TD, akinetic-rigid-dominant (AR), and mixed (MX) motor subtypes. We adopted Lewis' method rather than Jankovic's because UPDRS-II is involved in Jankovic's classification process, which would complicate the calculation of SLR since UPDRS-II was not included in the LDCT. To divide the patients into subtype groups, we calculated a tremor score (TS) and an akinetic-rigid score (ARS) for each patient in line with the methods previously reported. The TS was defined as the mean value of the sum of UPDRS items 20
2.3. Patient Management and Follow-Up. After contraindications were excluded and informed consent was signed, all patients received LDCT and STN-DBS. Levodopa, compound levodopa, and other anti-Parkinson's drugs were stopped 12 hours before the test, and dopaminergic receptor agonists were stopped 72 hours before the test. Patients were administered 1.5 times the levodopa equivalent dose of their usual first morning dose; the test drug was standard compound levodopa. Electrode implantation proceeded as follows. A stereotactic head frame was installed before CT scanning, and the CT imaging was fused with MRI to locate the STN. The surgical path was determined by a surgical planning workstation. Craniotomy was performed under local anesthesia, and the DBS devices (Medtronic 3387/3389 or PINS 1101) were implanted after target refinement by microelectrode recording and intraoperative test stimulation. Implantable pulse generators were then placed in the subclavicular position under general anesthesia. The DBS devices were programmed one month after the surgery. Postoperative programming differed little between patients.
The stimulation effect was measured 3 months after surgery by the UPDRS-III and PDQ-39. The improvement of motor symptoms was calculated in both the on-medication/on-stimulation state and the off-medication/on-stimulation state. Regarding improvements of quality of life (QOL), we did not distinguish between on-medication and off-medication because the PDQ-39 reflects the QOL in the past month.
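The percentage levodopa response obtained from the challenge test can be illustrated with a short sketch. It assumes the conventional definition (off-medication score minus on-medication score, divided by the off-medication score), which the text does not spell out; the function names and the example scores are hypothetical. TLR is computed from total UPDRS-III scores, and SLR from the summed subitems of the dominant symptom.

```python
def levodopa_response(off_score, on_score):
    """Percent improvement from the off- to the on-medication state (assumed definition)."""
    if off_score <= 0:
        raise ValueError("off-medication score must be positive")
    return (off_score - on_score) / off_score * 100.0


def symptom_levodopa_response(off_items, on_items):
    """SLR: the same percentage, restricted to the subitems of one symptom."""
    return levodopa_response(sum(off_items), sum(on_items))


# Hypothetical patient: total UPDRS-III of 40 (off) and 22 (on).
tlr = levodopa_response(40, 22)
# Hypothetical subitem scores for the dominant symptom (off, then on).
slr = symptom_levodopa_response([3, 3, 2, 2], [1, 1, 1, 1])
print(f"TLR = {tlr:.1f}%, SLR = {slr:.1f}%")
```

In this invented case the symptom-specific response is larger than the total response, which is the kind of mismatch SLR is meant to expose in single-symptom-dominated patients.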

Dichotomize STN-DBS Effects.
To better explore the predictive value of LDCT on patients' postoperative states, we divided patients into marked-improved and fair-improved groups. For motor improvement, we employed the minimal clinically important difference (MICID) based on UPDRS-III to determine whether a patient obtained clinically meaningful improvement after surgery. MICID was established at approximately 6 points for detecting minimal, but clinically pertinent, improvement in UPDRS-III [13]. Since the on-medication/on-stimulation state best represents the patient's postoperative state, a difference of at least six points between baseline UPDRS-III on-medication and postoperative UPDRS-III on-medication/on-stimulation indicated that the patient improved markedly. Since PDQ-39 scores reflect the overall QOL in both on- and off-medication states, the calculated score difference would overestimate the real improvement between the preoperative on-state and the postoperative on-state. Thus, we employed a stricter criterion for detecting clinically substantial changes. The moderate clinically important difference (MOCID) was established as approximately 3.5 points, around twice the MICID for PDQ-39 [14]. Patients who reached a difference of at least four points in PDQ-39 after surgery were regarded as marked-improved.
Neural Plasticity
2.5. External Validation Set. Results developed from the training set were validated in an external validation set for further evaluation. A total of 304 PD patients who received the levodopa challenge test before STN-DBS in the Changhai Hospital Affiliated to Navy Medical University were included as the validation set. We reviewed the baseline and 3-month follow-up data of the PD patients who underwent STN-DBS surgery and employed the aforementioned method to divide these patients into TD, AR, and MX motor subtypes. Patients were divided into marked-improved and fair-improved groups in the same way as in the training set.
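The dichotomization applied to both the training and validation sets can be sketched as follows. This is a minimal illustration, assuming that lower UPDRS-III and PDQ-39 scores indicate a better state and that improvement is the baseline score minus the postoperative score; the constant names, function names, and dictionary keys are hypothetical.

```python
UPDRS3_CUTOFF = 6.0  # points; MICID for UPDRS-III as stated in the text
PDQ39_CUTOFF = 4.0   # points; the at-least-four-point rule derived from MOCID


def is_marked_improved(baseline, postoperative, cutoff):
    """True if the score dropped by at least `cutoff` points after surgery."""
    return (baseline - postoperative) >= cutoff


def dichotomize(patient):
    """Label one patient for motor and QOL outcome (keys are hypothetical)."""
    return {
        "motor_marked": is_marked_improved(
            patient["updrs3_baseline_on_med"],
            patient["updrs3_postop_on_med_on_stim"],
            UPDRS3_CUTOFF,
        ),
        "qol_marked": is_marked_improved(
            patient["pdq39_baseline"],
            patient["pdq39_postop"],
            PDQ39_CUTOFF,
        ),
    }


# Invented example: a 7-point motor gain but only a 3-point PDQ-39 gain.
example = {
    "updrs3_baseline_on_med": 25,
    "updrs3_postop_on_med_on_stim": 18,
    "pdq39_baseline": 40,
    "pdq39_postop": 37,
}
print(dichotomize(example))
```

Patients for whom a flag is False fall into the fair-improved group for that outcome.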
2.6. Statistical Analysis. Continuous data were presented as the mean ± SD. Comparison among the three groups was conducted by one-way ANOVA. Pairwise comparisons were conducted by the Bonferroni test. ROC curves are normally employed in diagnostic tests, but their application is not limited to diagnostic analysis. Authors have used them to explore the correlation between patient satisfaction and scale scores [15,16]. ROC curves can better demonstrate the strength of correlation and visualize outcomes. In our study, we employed both ROC curve analysis and correlation analysis in the training set and the validation set. The areas under the curve (AUCs) show how well the classifier can distinguish marked-improved patients from fair-improved ones.
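The AUC has a direct probabilistic reading: it is the chance that a randomly chosen marked-improved patient has a higher levodopa response than a randomly chosen fair-improved one, with ties counted as one half. The sketch below computes it by pairwise comparison. It is an illustration in plain Python, not the MedCalc procedure used in the study, and the sample values are invented.

```python
def auc(marked, fair):
    """Probability that a marked-improved patient outranks a fair-improved one."""
    if not marked or not fair:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for m in marked:
        for f in fair:
            if m > f:
                wins += 1.0   # concordant pair
            elif m == f:
                wins += 0.5   # tie counts as half
    return wins / (len(marked) * len(fair))


# Invented levodopa responses (%) for the two outcome groups.
marked_lr = [62.0, 55.0, 48.0, 30.0]
fair_lr = [45.0, 33.0, 28.0, 20.0]
print(round(auc(marked_lr, fair_lr), 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.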
In addition, the DeLong test made it possible to compare the strength of correlation. Pearson's correlation was employed to calculate the correlation coefficient. A scatter plot with a fit line and 95% CI was shown. The result of the ROC curve analysis was presented as AUC (95% CI). An AUC > 0.75 indicates that the classifier provides clinically meaningful discriminative ability [17]. The Youden indexes and associated LDCT criteria were reported only for ROC curves with an AUC > 0.75. DeLong tests were performed to compare different AUCs. A two-tailed p value < 0.05 was considered statistically significant for the comparisons. All statistical procedures were performed using MedCalc version 15.2 (MedCalc, Ostend, Belgium) and SPSS version 24 (IBM, Chicago, IL).
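The Youden index (sensitivity + specificity - 1) and its associated criterion can be illustrated with a short sketch. It assumes that a patient is predicted marked-improved when the levodopa response is at or above the candidate cutoff; the exact convention used by MedCalc may differ, and the data are invented for illustration.

```python
def youden_criterion(scores, labels):
    """Return (best Youden index, cutoff) over all observed score values.

    labels: 1 for marked-improved, 0 for fair-improved.
    A score >= cutoff is treated as a positive prediction (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_j, best_cutoff = -1.0, None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        sensitivity = tp / n_pos
        specificity = 1.0 - fp / n_neg
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the lowest cutoff when several tie
            best_j, best_cutoff = j, cutoff
    return best_j, best_cutoff


# Invented levodopa responses (%) and outcome labels.
lr = [10, 20, 30, 45, 40, 50, 60, 70]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
j, criterion = youden_criterion(lr, outcome)
print(f"Youden index = {j:.2f}, associated criterion = {criterion}%")
```

The criterion returned is the cutoff that maximizes the Youden index on the data at hand, so it should be checked in an independent cohort before being used for screening.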

Baseline Characteristics and Patient Improvement.
Data from the preoperative assessments and postoperative follow-ups in the training set are shown in Table 1. Of the preoperative indices, age at onset, duration of motor fluctuation, Hoehn-Yahr stage, UPDRS-III off scores, LR for akinetic-rigid score, LEDD, PDQ-39, and HAMA scores were significantly different between the three groups. For postoperative indices, the UPDRS-III and PDQ-39 scores assessed 3 months after the surgery were significantly different between the three groups. Overall, patients in the AR group have longer disease duration and worse baseline conditions. Fifty-eight patients reached MICID in comparing the baseline UPDRS-III on-medications and the postoperative UPDRS-III on-medications and on-stimulation. Seventy-six patients reached MOCID in comparing the baseline PDQ-39 on-medications and the postoperative PDQ-39 on-medications and on-stimulation. Related data from the preoperative assessments and postoperative follow-ups in the validation set are shown in Table 2.

The Relationship of LDCT and STN-DBS Benefits in the Training Set Patients. The ROC curves and scatter plots between preoperative LR and postoperative UPDRS-III and PDQ-39 improvement are shown in Figure 1. The AUC of LDCT in differentiating significant and insignificant motor beneficiaries is 0.769 according to UPDRS-III improvement. The AUC of LDCT in differentiating significant and insignificant QOL beneficiaries is 0.757 according to PDQ-39 improvement. The Youden indexes and their associated criteria are shown in the figure. Postoperative score changes of both UPDRS-III (p = 0.015) and PDQ-39 (p < 0.001) significantly correlate with preoperative levodopa response.

The Relationship of LDCT and STN-DBS Benefits in Different PD Subtypes. For the sake of clarity in the reporting, we used acronyms to represent the various classification and correlation combinations. TU represented the combination of TLR and UPDRS-III, and SP represented the combination of SLR and PDQ-39. Similarly, TP represented the combination of TLR and PDQ-39, and SU represented the combination of SLR and UPDRS-III.
3.3.1. Classification Performance of SLR and TLR. The ROC curves of different classification combinations are shown in Figure 2. No statistical difference was found between SLR and TLR in the TD group, while the AUC of SP is significantly larger than that of TP in the AR group (p = 0.029). In the MX group, the AUCs of both TU (0.816) and TP (0.802) are above 0.8. The Youden indexes and associated LDCT criteria were reported if the AUC was above 0.75. For SP in the AR group, the Youden index is 0.75, and the associated criterion is 32%. For TU and TP in the MX group, the Youden indexes are 0.54 and 0.57, and the associated criteria are 50% and 53%, respectively.

Correlation Performance of SLR and TLR. The scatter plots of different correlation combinations are shown in Figure 2. SLR positively correlated with both UPDRS-III (p = 0.035) and PDQ-39 (p < 0.001) improvement in the AR group. TLR positively correlated with both UPDRS-III (p < 0.001) and PDQ-39 (p < 0.001) improvement in the MX group. We found no significant correlation in the TD group.

Discussion
This study examined the value of LDCT in predicting STN-DBS benefits in different PD motor subtypes and evaluated the findings in an external validation set. We found that on-state improvement is predictable by TLR, especially in MX patients. SLR strongly correlated with postoperative QOL improvement in AR patients. LDCT showed no significant predictive value in TD patients. We employed both Pearson's correlation and the ROC curve to explore the relationship between LDCT and STN-DBS benefits. Pearson's correlation measures the consistency of two continuous variables, but it can only detect linear relations and is highly vulnerable to outliers [18]. ROC curve analysis, which examines how well a continuous variable predicts a binary variable, can alleviate the influence of outliers and can also show the strength of the relation between two variables. Dichotomizing the outcome variable according to the research objective gives the associated ROC curve a different clinical significance. The construction of the ROC curve is based on the classifier's sensitivity and specificity, which are both incidence measures, that is, the percentage or ratio of patients who exceed a cutoff compared with those who do not reach it. Consequently, establishment of the cutoff is very important. In our study, clinically important difference (CID) was used to differentiate marked-improved patients from fair-improved patients. CID has been widely employed in large clinical trials to reflect clinically meaningful change [19]. Only around 40% and 50% of patients reached CID in UPDRS-III and PDQ-39 in our study, respectively. This is quite different from the ratio of 70% obtained by Katz et al. [20]. A possible reason is that we calculated on-state improvement while they calculated off-state improvement.
The comparison between baseline UPDRS-III on-medications and postoperative UPDRS-III on-medications and on-stimulation better shows patients' overall improvement over medical treatment alone. This comparison is more clinically relevant to patients since this reflects the state that the patient is most likely to be in. Off-state comparison would be more helpful when only the stimulation effect is of interest.
The LDCT is commonly regarded as an important reference for predicting DBS effects. It aids the diagnosis of PD, and typically, DBS response is more robust for the levodopa-responsive symptoms [21]. In the literature, several publications argued that preoperative LR does not predict long-term STN-DBS outcomes, but there are also reports claiming the contrary [2,4]. In our study, in both the training set and the validation set, the ROC curve showed that the predictive ability of LDCT on STN-DBS effects was not very solid, since the AUCs were just over 0.75 despite the significant correlation observed. However, the predictive ability increased considerably in the MX group, both in the AUC and in the correlation coefficient r. The AUC of TLR in the MX group was the highest among the three subtypes. A possible reason is that the UPDRS-III score is more evenly distributed among all symptoms in the MX group. A uniform distribution could make the percent improvement of UPDRS-III in LDCT reflect levodopa responsiveness more comprehensively.
In contrast, for SSD patients, the UPDRS-III score is mainly contributed by several subitems related to a certain symptom, while other less severe symptoms carry the same weight when calculating percentage improvement. As a result, the calculated LR value may not match the real responsiveness to levodopa. Besides, MX patients have moderate baseline UPDRS scores, between those of TD and AR patients. An absence of outliers can also increase the accuracy of prediction. For MX patients, we further found that an LR of around 50% can predict STN-DBS effects. Approximately, patients with an LR > 50% are highly likely to gain clinically meaningful benefits from STN-DBS compared with medication only. This is different from the widely accepted 30% LR. Thirty-percent LR is the clinically minimal motor improvement of UPDRS-III after taking a dopaminergic drug, and it is aimed at assisting the diagnosis of PD through levodopa responsiveness [22]. However, many patients with LRs exceeding 30% do not get significant benefits from STN-DBS because this value was not originally established to predict DBS effects. In our study, 50% LR could distinguish marked-improved patients from fair-improved patients well only in the MX group. Interestingly, 50% is very close to the average postoperative UPDRS-III improvement reported by a large meta-analysis [23]. The strong correlation between LDCT and postoperative benefits in the MX group could possibly explain this. As previously mentioned, for SSD patients, only one or two major symptoms are the source of most of their problems and are the concerns they most urgently want to address. To better emphasize the improvement of these dominant symptoms, we introduced the concept of SLR. However, we did not find any significant correlations between SLR or TLR and postoperative motor or QOL improvement in TD patients.
The major reason could be that the responsiveness of severe tremor to levodopa therapy is not in accordance with that to DBS therapy. In contrast, in the AR group, SLR showed a strong correlation with PDQ-39 change, and the differentiating ability of SLR was significantly higher than that of TLR in judging QOL improvement. We suspect the following reasons why QOL improvements are more predictable by SLR in the AR group. First, patient expectation and satisfaction may play an important role here. Some patients may not receive significant benefit in total motor function, but addressing the problems of concern can greatly enhance their satisfaction and QOL score [7,24]. SLR can accurately reflect improvements in major symptoms without being affected by the less-concerning items in UPDRS-III and thus can be more sensitive in judging a patient's possible QOL change [25].

Figure 3: The receiver operating characteristic curves and scatter plots between TLR or SLR and postoperative STN-DBS benefits among the two PD motor subtypes in the validation cohort. Significant correlations or comparisons are highlighted in bold and marked by *.

Second, AR symptoms, including gait disorder and postural instability, have long been reported as significant influencing factors upon QOL [26]. Gómez-Esteban et al. further indicated that rigidity had more impact on QOL than tremor [27]. Alleviating a bothersome and dominant problem can increase a patient's quality of life. Besides, unlike tremor, which can be resistant to levodopa therapy, rigidity's responses toward levodopa and DBS are more consistent [28]. Third, the AR patients in our study had the worst baseline conditions and PDQ-39 scores, and it is reported that patients with impaired preoperative QOL are more likely to have better postoperative QOL improvement [29]. For AR patients, we also found that patients with an SLR > 37% are more likely to gain clinically meaningful QOL benefits after STN-DBS. This provides a reference for predicting postoperative QOL improvement even before STN-DBS surgery.
Our study has several limitations. First, the data were retrospectively collected, and a relatively small sample size (only 24 patients in the TD group in the training set) could reduce statistical power, since it is generally harder to detect significant differences with a small sample. Second, the follow-up period was short (3 months), leaving some long-term adverse events unrecognized, including depression and progressive cognitive decline, which may also markedly affect QOL. Third, the data in our research come from only two clinical centers, and multicenter clinical research is necessary. Although the external validation conducted in our study enhances the credibility of the findings, future studies should employ a prospective design, increase the sample size, and prolong the follow-up period to further strengthen the evidence.

Conclusion
We provide a more accurate judgment for LDCT. In a short follow-up period of three months, LDCT provides different information for the three PD motor subtypes receiving STN-DBS surgery. For MX patients, TLR is strongly correlated with postoperative motor and QOL improvement. A TLR > 50% may indicate a higher possibility of clinically meaningful benefits from STN-DBS. For AR patients, SLR can predict postoperative QOL change well. An SLR > 37% may indicate a higher possibility of clinically meaningful benefits from STN-DBS. Neither TLR nor SLR provides valid predictive information for TD patients. These findings should be considered when screening PD-DBS candidates.

Data Availability
The relevant valuable data are described in the manuscript.

Additional Points
Highlights. TLR and SLR can predict postoperative improvement in different PD motor subtypes. TLR correlated with postoperative motor and quality of life improvement in the MX group. A TLR > 50% may indicate a higher possibility of clinically meaningful benefits from STN-DBS in the MX group. SLR correlated with postoperative motor and quality of life improvement in AR patients. An SLR > 37% may indicate a higher possibility of clinically meaningful benefits from STN-DBS in AR patients.

Ethical Approval
We confirm that we have read the journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.

Disclosure
The funders were not involved either in the design of the study, collection, analysis and interpretation of the data, and writing of the report or in the decision to submit the article for publication. Every author had full access to all data of the study, and the corresponding author had final responsibility for the decision to submit the article for publication.

Conflicts of Interest
The authors report no conflict of interest concerning the materials or methods used in this study or the findings specified in this paper.