Continuous and Unconstrained Tremor Monitoring in Parkinson's Disease Using Supervised Machine Learning and Wearable Sensors

Background Accurately assessing the severity and frequency of fluctuating motor symptoms is important at all stages of Parkinson's disease management. Contrarily to time-consuming clinical testing or patient self-reporting with uncertain reliability, recordings with wearable sensors show promise as a tool for continuously and objectively assessing PD symptoms. While wearables-based clinical assessments during standardised and scripted tasks have been successfully implemented, assessments during unconstrained activity remain a challenge. Methods We developed and implemented a supervised machine learning algorithm, trained and tested on tremor scores. We evaluated the algorithm on a 67-hour database comprising sensor data and clinical tremor scores for 24 Parkinson patients at four extremities for periods of about 3 hours. A random 25% subset of the labelled samples was used as test data, the remainder as training data. Based on features extracted from the sensor data, a Support Vector Machine was trained to predict tremor severity. Due to the inherent imbalance in tremor scores, we applied dataset rebalancing techniques. Results Our classifier demonstrated robust performance in detecting tremor events with a sensitivity of 0.90 on the test-portion of the resampled dataset. The overall classification accuracy was high at 0.88. Conclusion We implemented an accurate classifier for tremor monitoring in free-living environments that can be trained even with modestly sized and imbalanced datasets. This advancement offers significant clinical value in continuously monitoring Parkinson's disease symptoms beyond the hospital setting, paving the way for personalized management of PD, timely therapeutic adjustments, and improved patient quality of life.


Introduction
Neurological disorders are the globally leading source of disability, and among them, Parkinson's disease (PD) is the fastest-growing in prevalence, disability, and death [1].Developing new therapies or improving existing ones may help address this growing challenge for healthcare systems and ultimately improve the lives of those afected by this condition.PD is a progressive, neurodegenerative disorder that presents with a broad spectrum of motor (mainly tremor, bradykinesia, and rigidity) and nonmotor symptoms.Afecting around 1% of the population over the age of 60, it is the second most common neurodegenerative pathology (after Alzheimer's Disease) [2] and causes signifcant decreases in the quality of life for those afected [3].
Accurately assessing the severity and frequency of PD symptoms is of distinct importance at almost all stages of disease management: diagnosis, monitoring disease progression, informing therapeutic course of action, and evaluating the efectiveness of treatment [4].Te gold-standard tool for PD assessment is the clinically validated International Parkinson's and Movement Disorder Society Unifed Parkinson's Disease Rating Scale (MDS-UPDRS) [5].Although reliable, UPDRS assessments are time-consuming to perform, require patients to be in-clinic, and only capture the patient's symptomatology during a brief period of time.PD symptoms heavily fuctuate, based on factors such as medication ON/ OFF-state or behavioural context [6], and occasional assessments at any single point in time may fail to accurately refect the overall disease state [7].Alternatively, clinicians rely on patient self-reports in the form of "patient diaries."Te validity of these diaries, however, can be compromised by patients' lack of compliance, by patients not properly recognising symptoms within the clinical taxonomy understood by clinicians (e.g., dyskinesias), or by recall problems [8].Te quality of the diaries can also be afected by the decrease in cognitive capabilities that is highly prevalent at later stages of the disease [9].Tis leads to a discrepancy between selfreported disease state and clinical UPDRS assessments, especially related to motor symptoms, that can be difcult to reconcile [10].
Wearable technology, through its potential to capture and characterise movement during both scripted tasks and in free-living environments, may be able to overcome some of the shortcomings of current PD assessment techniques-including the difculty of recognising relevant symptoms if they present outside of the clinic, or the poor temporal resolution of MDS-UPDRS tests.Tis approach might provide clinicians with valuable information about symptom state outside what is observable during sporadic in-clinic encounters and patient self-reporting of symptoms (Figure 1), which can be particularly insightful for the common scenario of bouts of severe symptoms occurring at times during which a clinician is not present.
A number of groups have shown the feasibility of using wearable sensor data in combination with algorithms to accurately predict clinical scores during standardised tasks [11][12][13][14][15][16][17][18][19][20].Tis has led to a body of literature devoted to a detailed characterisation of tremor in controlled experimental environments [18,21,22].Nevertheless, symptom monitoring in free-living patients remains a challenge, with some methods having failed to deliver optimal results [23,24].In [25], the authors had no clinician-provided tremor labels to validate their system.In [13,24], the only feedback signal available for algorithm training came from the aforementioned patient self-reporting.In [26], no ground-truth labels were available for algorithm training and validation outside of the clinical environment.To date, and to the best knowledge of the authors, no fully validated system to monitor clinical PD features in free-living environments exists [27][28][29].
In order to efectively tackle the transition from controlled environments to unconstrained, free-living patients, a signifcant challenge-from a signal processing perspectivestems from the fact that tremor (and other pathology-related movement patterns) cannot be easily isolated from sensor signals, especially when volitional movement is present at the same time.Machine learning techniques, due to their ability to isolate patterns from broadband, high-entropy signals, ofer a potential solution to this problem.In this context, the use of unsupervised machine learning techniques that mine clinically relevant patterns from large amounts of wearable sensor data has been advocated [4].Previous studies have employed several machine and deep learning approaches for wearable sensor-based approaches for monitoring Parkinsonian tremor, most often involving convolutional neuronal networks with raw data input, fast Fourier transform, or specifc features (time and frequency domain) [18].Others used approaches such as long short-term memory, dynamic neuronal network, multilayer perceptron, and gradient tree boost algorithm [18].
In this work, we bypass the need for self-supervised schemes by making direct use of clinician-provided labels gathered during continuous monitoring.We implement a cross-patient tremor-monitoring system that can be trained on a modestly sized and imbalanced dataset and use this method to investigate the problem of monitoring tremor fuctuations in free-living PD patients.To accomplish this, we rely on a 67-hour database of inertial measurement unit data manually labelled by clinicians using a custom-made mobile application, who were present during hours-long recording sessions during which patients went about their daily activities in a free-living environment.We derive features from the wearable sensor data based on accelerometry and gyroscopy and use these to train a machine learning algorithm that predicts tremor severity at each sensor location.

Methods
2.1.Dataset.Data collection was conducted with 24 patients at the Department of Neurology, University Hospital Zurich, with ethical approval granted by the cantonal ethics commission Zurich.Informed consent was obtained from all participants, who were randomly selected from those seeking inpatient refnement of their PD management, irrespective of age or gender, thus safeguarding an unbiased sample.Notably, patients generally represented more advanced stages of PD due to the nature of their hospital treatment.All subjects continued their prescribed PD medication regimen throughout the study to maintain their typical response to treatment during data capture (see Table 1).We did not have any dropouts.
Te recording device consisted of inertial measurement units (IMUs) securely attached to the patients' wrists and ankles.Te IMUs (ZurichMOVE) incorporated triaxial accelerometers and gyroscopes, recording movement at a sampling rate of 50 Hz.Sessions were designed to last approximately three hours (mean ± std.deviation: 2 h 48 min ± 1 h 5 min), capturing a broad range of voluntary movements and daily activities.Parkinson's Disease Clinicians, present throughout the recording sessions, performed tremor assessments at three-minute intervals (duration mean ± std.deviation: 2.99 min ± 0.96 min), providing a rich temporal resolution of tremor data.Tis enabled the capture of tremor manifestations in a controlled yet representative environment, where patients engaged in various self-selected activities such as reading, writing, using electronic devices, and performing light physical tasks within the patient room, which mirrored a spectrum of daily life scenarios (see Figure 2).
Tremor severity for each limb was documented using a three-point scale: 0 indicating no tremor, 1 for mild tremor, and 2 for strong tremor.Tis bespoke categorisation was devised to streamline the complex clinical tremor evaluation typifed by the UPDRS III, facilitating frequent, repeatable measurements conducive to our data analysis approach.A "no-data" entry was also available to mark periods where observation was not possible, such as personal privacy times.
Te clinicians used a specialized application on a tablet that prompted them to record tremor scores every three minutes.Te 30 seconds leading up to each score entry were extracted and segmented for detailed analysis, yielding a dataset comprising 4,850 tremor score-labelled samples across the three defned categories.

Data Preprocessing.
Parkinsonian tremor usually presents while the patient is at rest with frequencies in the 4-7 Hz band [30][31][32].In order to isolate the tremor signal from the voluntary movement signal, both the accelerometer and gyroscope signals were high-pass-fltered (10th-order Butterworth flter, cutof frequency of 3.5 Hz).In the interest of precision, the wearable sensors were confgured to capture a voluntary-movement signal delineated by a specifc frequency band.Tis was achieved by employing a 10th-order Butterworth bandpass flter with cutof frequencies set between 0.5 and 3 Hz.While this methodological choice is Important bouts of severe symptoms, if they do not happen during clinical visits, might be missed.Tis leads to a gap between clinical assessment and patient self-report, which is the currently available system used to monitor the disease severity between appointments.Tis self-report, oftentimes in the form of diary entries, can however lack in precision, objectivity, and compliance on the side of the patient.Continuous monitoring would allow for automated and objective observation of disease state and progression with only moderate efort on the side of both clinicians and patients.
primed for enhancing tremor signal fdelity, it may also encompass movement signatures potentially related to bradykinesia.Tis recognition, albeit indirect, suggests a promising avenue for further methodological advancements to explicitly characterise and classify bradykinesia within our computational framework.On the tremor signal, a low-pass flter (10th-order zero-phase Butterworth flter, cutof frequency of 7.5 Hz) was also applied to eliminate higher frequency nontremor components, as well as highfrequency artefacts (Figure 3).Since only the oscillatory nature but not the directionality of movement is interesting for analysing tremor, only the magnitude of the fltered accelerometer vector was used.
To better capture the transient nature of tremor, a wavelet-based time-frequency analysis was performed.Employing the continuous wavelet transform, the frequency components representing tremor (between 3.5 and 7.5 Hz) were extracted into a time-dependant signal: with frequency f, time t, and wavelet coefcients X(f, t), and the resulting time-series X tre (t) was computed as X tre (t) �  7.5 Hz 3.5 Hz X(f, t)df.In summary, the preprocessing step consisted of extracting a total of 9 relevant time-series (Supp Table 1) from the 6 IMU channels (3 gyroscope channels + 3 accelerometry channels).All preprocessing steps and feature computations were performed in MATLAB (Te MathWorks Inc., Natick, Massachusetts).4 for non-wavelet-based time-series and Supp Table 5 for wavelet-based time-series).Roughly, these features can be categorised into the following three classes: (1) Time-domain statistics: General descriptive statistics that aimed at assessing the magnitude and variability of movement within a given timeframe.Tese include means, standard deviations, interquartile ranges, and variation coefcients.(2) Frequency-domain features: Features that capture the spectral characteristics of the signal (if tremor is present, signal power will concentrate in the 4-6 Hz frequency band).Te power spectral density was estimated using Welch's method (8 segments, 50% overlap) [33,34].(3) Nonlinear features: Tis set of features aims to quantify the regularity of a signal, which can be defned as the ability of past and future samples to predict the current one.Regularity measures, which go beyond describing signals in the time and frequency domains, have been proposed as better tools to capture the lack of ability to regulate movement that is characteristic in Parkinsonian tremor [32].Generally, nonlinear approaches have been previously applied successfully to tremor detection and quantifcation [35,36].Tese features include approximate entropy and spectral entropy, as well as various autocorrelation-derived statistics.
Once the features for all time-series are computed, the efectiveness of each one as a predictor of tremor severity has to be evaluated.Each extremity was treated independently (i.e., right-hand tremor score only predicted with right-hand IMU-data).Although the overall dimensions of the feature space are 4850 samples x 90 features, not all of those 90 features are good predictors of clinical scores.To discard the ones that are not, a mutual information (MI) score is computed, which provides a good measure for the strength of the relationship between two variables [37].
Once the features were rated, only the 30 best performers were used in the subsequent steps of the analysis, and the remaining 60 features were discarded.

Classifer and Sampling
Strategies.We used a support vector machine (SVM) with radial basis functions (RBFs) for the classifer architecture given its ability to approximate nonlinear boundaries and train on a modestly sized dataset as well as its widespread use in the tremor-monitoring literature.
Given the highly imbalanced tremor scores in our dataset (Figure 4), we had to use a resampling strategy to balance out the three classes.A balanced dataset was achieved by frst applying random undersampling to the majority classes Score 0 and Score 1 until 300 samples were reached.To the minority class Score 2, we applied oversampling by creating new samples based on neighbouring points using the synthetic minority oversampling technique (SMOTE) [38] until 300 samples were reached (Figure 5).
Te resampled dataset was subdued to a random 75/25 train-test split.All computations in feature space (evaluating features, building and training the classifer, and implementing sampling strategies) were performed in Python (Python Software Foundation, https://www.python.org/) making use of the specialized software packages scikit-learn [39] and imbalanced-learn [40].Figure 3: Preprocessed IMU data capturing tremor dynamics.Te composite accelerometer signal, derived from the vector sum of threeaxis data, is shown at the top, indicating fuctuating tremor magnitudes.Initially, the tremor intensity sharply peaks, diminishes briefy, rises again, and tapers of.Below, the triaxial gyroscope data reveal that rotational motion, particularly along the X-axis, is not consistently synchronized with these variations in tremor magnitude.Te initial strong rotational activity is not mirrored during subsequent peaks of translational tremor, illustrating the distinct behaviours of these tremor components.

Selected Features.
Using a best-in-class approach based on the MI-scores for the initial set of 90 features (for best and worst performances see Supp Tables 2 and 3), the 30 best performers are selected (full list and MI-Scores in Supp Table 6).Tese 30 features are subsequently used to train and test the classifer.
A two-dimensional principal component analysis of the initial dataset is computed (Figure 6(a)).Te same feature transformation is applied to provide a visualisation of the resampled dataset (Figure 6(b)).Analogous to the rest of the analysis, these embeddings are constructed after the feature selection step (only making use of the 30 best features).Since neither training nor testing occurs on a dataset comprised of solely "real data" but both splits contain "synthetic instances" incorporated during the resampling process (about 28% of samples are synthetic), some further testing is performed to ensure that the classifer performs equally well when tested exclusively against "real data."Due to the undersampling step, a large number of samples (4137 instances of Score 0 and 66 instances of Score 1) remain neither used for training nor for testing the classifer (although they were used in the feature selection step).Testing our classifer on these samples yields a classifcation accuracy of 0.94.Te associated confusion matrices (absolute counts and normalised counts) are displayed in Figure 8. Te performance on the discarded portion of the dataset (notwithstanding the fact that no instances of Score 2 are available for testing in this dataset)  is comparable with the performance on the resampled dataset, with the same misclassifcation patterns being recognised in both-such as a mild tendency of Score 1 to be misclassifed as Score 0.

Discussion
Accurately assessing the severity and frequency of fuctuating PD symptoms is of signifcant importance during all stages of PD management.In contrast to time-consuming clinical testing or sporadic patient-reporting, continuous wearables-based assessment might allow for continuous, objective, and automated tracking of PD symptoms.

Towards Parkinsonian Symptom Assessment during Unconstrained Activity.
Miniaturised wearable sensors, in combination with modern signal processing techniques, show great promise as a tool for the continuous assessment of PD symptoms.While wearables-based clinical assessments during standardised and scripted tasks have been successfully implemented, assessments during unconstrained activity in free-living environments remain a challenge.In order to efectively transition from controlled environments to unconstrained, free-living patients, machine learning techniques that mine clinically relevant patterns from wearable sensor data need to be developed and validated.Parkinson's Disease 7 Here, we developed and implemented a machine learning algorithm to assess tremor severity in free-living PD patients.We evaluated this system on a 67-hour database comprising wearable sensor data and clinical tremor scores from 24 PD patients, who wore an IMU on each extremity for periods of approximately 3 hours.Based on features extracted from the sensor data, a support vector machine (SVM) was trained to predict the clinician-logged tremor severity for each extremity.Te classifer's ability to detect tremor was high (sensitivity of 0.90).Overall classifcation accuracy was 0.88 on the test-portion of the resampled dataset.To further test the robustness of the classifer, samples that had been previously discarded during the resampling strategy were classifed.Here, overall accuracy was 0.94 and the results were comparable in terms of misclassifcation rates, suggesting that the classifer did not overft the resampled dataset.In summary, we were able to implement a feature-based classifer for tremor monitoring that, with the utilisation of resampling techniques, can be trained even when only modestly sized and imbalanced datasets are available.Tis approach is especially valuable when large amounts of clinical data are not readily available or when gathering these data is costly.

Improving Parkinsonian Symptom Assessment in Free-
Living Environments.Our tremor-classifcation system's lowest performance (accuracy � 0.74) was encountered when classifying Score 1, as it tends to be misclassifed as an instance of class Score 0. A two-class approach (low tremor vs. high tremor severity) might still provide a more comprehensive picture of the severity of the Parkinsonian disease state between clinical visits as compared to currently available techniques, as well as provide more information about the timing and characteristics of tremor manifestations.Our successful use of resampling strategies supports the use of this family of techniques for processing wearable sensor data when the dataset is not large enough, or when the accompanying clinical scores are not balanced.However, validation of this tremor-classifcation system on previously unseen wearable sensor data is still lacking.While the current study provides valuable insights within its limitations, the potential for a more in-depth understanding of limb-specifc tremors awaits future investigations with larger and more balanced datasets.In light of the chosen random 75/25 train-test split serving our specifc resampling technique, we acknowledge the potential impact of this decision on result robustness.Future studies should consider exploring alternative validation methods, such as k-fold crossvalidation, particularly in more balanced (a priori) datasets, to enhance the reliability and generalizability of predictive models.Despite tremor being a cardinal motor symptom of PD, other motor symptoms (especially bradykinesia) should be investigated and tested as well to assess Parkinsonian symptoms with wearable sensors in free-living patients.
Such a continuous monitoring with wearable sensors might allow to observe the course of the disease closely with only moderate efort.Wearables-based PD assessments may also be useful in other respects, for example, as more objective and accurate tools for evaluating new therapies-this could also streamline drug approval processes by improving both the quality and quantity of data gathered during a study, potentially reducing the required sample sizes [41].While our tremor classifcation system focused on a frequency range of 4 to 7 Hz, this was chosen to mirror the frequencies most commonly observed in Parkinsonian resting tremor.However, we acknowledge the broader spectrum of tremor frequencies reported across varying studies and the individual variability among patients with Parkinson's disease.Future research could expand upon our fndings by incorporating a wider frequency analysis, which may reveal more about the nuanced nature of tremor expressions in diferent subtypes of the disease.Such a tailored approach could lead to more personalized diagnostic and monitoring tools.
As we continue to refne the assessment of Parkinsonian symptoms within free-living environments, our fndings suggest that personalized tremor frequency profles could hold signifcant promise for advancing precision medicine in Parkinson's care.Tis aligns with the need for a more comprehensive understanding of tremor subtypes-resting, postural, and kinetic-and their specifc clinical manifestations.Our current system sets a foundation for this advancement, with the potential to include subtype-specifc tremor characteristics in subsequent iterations.Tis evolution in our system's capabilities could ofer critical insights for developing individualized treatment strategies and enhancing the quality of life for those afected by PD.
Te recognition of tremor subtypes-resting, postural, and kinetic tremors-is fundamental in the clinical assessment of Parkinson's disease (PD).Each subtype has distinctive characteristics and clinical implications that are critical for accurate diagnosis and treatment planning.Our tremor-classifcation system, while not designed to distinguish between these subtypes in its current iteration, lays the groundwork for future systems that could integrate subtypespecifc features.Tis could further refne the monitoring and assessment of PD symptoms in free-living environments.

Expanding Applications beyond Parkinsonian Symptom
Assessment.Te utility of machine learning-based motor symptom classifcation systems transcends beyond parkinsonian symptom assessment in free-living patients.Tese computational models could be custom-tailored for patients with a diverse range of movement disorders, including essential tremor and gait disturbances, ofering a versatile tool for symptom monitoring and personalized treatment strategies.
Moreover, postclassifcation tremor-related signals hold promise for enhancing therapeutic interventions.Tese signals could serve as integral components in the design of closed-loop deep brain stimulation systems, potentially refning the optimization of stimulation parameters.By dynamically responding to patients' fuctuating symptom severity, such systems could ofer a more targeted and effcient therapeutic response.
Furthermore, the integration of these classifed signals with neurofeedback methods ofers a promising avenue for mitigating motor symptoms.Such techniques harness the power of real-time brain-computer interfaces, enabling patients to consciously control and reduce the manifestation of their symptoms, as demonstrated in [33].Tese innovative applications underscore the transformative potential of machine learning in revolutionising symptom management for patients with movement disorders.

Conclusion
We implemented a tremor-classifcation system that can be trained on a modestly sized and highly unbalanced dataset.Tis machine learning implementation displayed a high accuracy (tremor detection sensitivity of 0.9 and overall classifcation accuracy of 0.88) for addressing the crucial problem of monitoring tremor fuctuations in free-living PD patients.

Figure 1 :
Figure1: Sporadic clinical encounters do not provide an accurate portrayal of the disease course.Important bouts of severe symptoms, if they do not happen during clinical visits, might be missed.Tis leads to a gap between clinical assessment and patient self-report, which is the currently available system used to monitor the disease severity between appointments.Tis self-report, oftentimes in the form of diary entries, can however lack in precision, objectivity, and compliance on the side of the patient.Continuous monitoring would allow for automated and objective observation of disease state and progression with only moderate efort on the side of both clinicians and patients.

Figure 2 : 4 Parkinson's Disease 2 . 2 . 1 .
Figure 2: Depiction of various activities performed by patients during tremor assessment sessions.Clinicians conducted evaluations at three-minute intervals in an unconstrained environment simulating daily living scenarios, including reading, device usage, and light physical tasks.
Performance.After a random 75/25 train-testsplit of the resampled dataset, the train portion contains 675 samples.After training, the remaining 225 samples in the test portion are used to evaluate the classifcation performance.Te overall classifcation accuracy is 0.88 (198 out of 225 samples correctly classifed).Te associated confusion matrix (both absolute and normalised) is displayed in Figure 7. Sensitivity for samples containing tremor is 0.90.Accuracy scores are lowest for the intermediate class Score 1. Tere is a misclassifcation rate of 0.25 between the adjacent classes Score 0 and Score 1 and no misclassifcations between nonadjacent classes Score 0 and Score 2.

Figure 6 :
Figure 6: 2D principal component analysis of (a) the initial dataset and (b) the fnal dataset after applying the resampling strategy.While in (a) the tremor scores are highly imbalanced, the proportions of the three classes have been equalised in (b).Synthetic data are represented with a "+" sign.PCA embedding is based on the 30 best-performing features in the initial dataset.

Figure 7 :
Figure 7: (a) Confusion matrix and (b) normalised confusion matrix for the test part of the in-sample dataset.Consistent with the principal component visualisations, there is a degree of overlap between the classes that results in some instances of adjacent classes being misclassifed.Te overall classifcation accuracy is 0.88.

Figure 8 :
Figure 8: (a) Confusion matrix and (b) normalised confusion matrix for the out-of-sample predictions on the instances discarded during the undersampling technique.With an overall classifcation accuracy of 0.94, out-of-sample performance is comparable to in-sample performance.

Table 1 :
Demographic and clinical characteristics of the PD patients including age, sex, disease subtype, duration, stage (Hoehn-Yahr), motor examination scores (MDS UPDRS III ON/OFF), ON tremor score items 3.15-3.18,and daily levodopa equivalent dose (LED).