Title: Classification of Parkinson’s disease and essential tremor based on gait and balance characteristics from wearable motion sensors: A data-driven approach

: Parkinson’s disease (PD) and essential tremor (ET) are movement disorders that can have similar clinical characteristics including tremor and gait difficulty. These disorders can be misdiagnosed leading to delay in appropriate treatment. The aim of the study was to determine whether gait and balance variables obtained with wearable sensors can be utilized to differentiate between PD and ET using machine learning techniques. Additionally, we compared classification performances of several machine learning models. A balance and gait data set collected from 567 people with PD or ET was investigated. Performance of several machine learning techniques including neural networks (NN), support vector machine (SVM), k-nearest neighbor (kNN), decision tree (DT), random forest (RF), and gradient boosting (GB), were compared using F1-scores. Machine learning models classified PD and ET based on balance and gait characteristics better than chance or logistic regression. The highest F1-score was 0.61 of NN, followed by 0.59 of GB, 0.56 of RF, 0.55 of SVM, 0.53 of DT, and 0.49 of kNN. The results demonstrated the utility of machine learning models to classify different movement disorders. Further study will provide a more accurate clinical tool to help clinical decision-making.


Introduction
Parkinson's disease (PD) and essential tremor (ET) are common movement disorders characterized by the presence of tremor [1]. Although ET has traditionally been considered a mono-symptomatic disorder presenting with tremor, increasing evidence suggests that ET is a complex disorder with involvement of other motor and non-motor symptoms [2]. Both PD and ET can share clinical features including motor symptoms such as bradykinesia (slow movement), gait impairment and dystonia (involuntary muscle contraction), and non-motor symptoms such as cognitive impairments, sleep disturbances, depression, and anxiety [3,4].
Diagnosis of these disorders can be challenging for clinicians due to overlapping symptoms, and these disorders are frequently confused and misdiagnosed. A past study reported that about a third of patients with PD or dystonia were misdiagnosed with ET [5]. Since misdiagnosis can prevent or delay appropriate medical care and worsen patients' quality of life, accurate differentiation between PD and ET is important to provide optimal care. Clinical observation of gait and balance impairments can play a major role in classifying different conditions and monitoring the progression of PD and ET. Subtle changes in gait have even been found to occur before a clinical diagnosis of PD [6,7], Alzheimer's disease [8], or multiple sclerosis [9], suggesting gait as a potential biomarker for neurological disorders. Balance and gait impairments are more prominent and clinically observable in PD than in ET. However, there is growing evidence suggesting gait abnormalities in patients with ET [10]. Previous studies showed gait and balance abnormalities such as decreased cadence [11,12], decreased gait speed [12], increased double support [11,12], abnormalities in tandem gait [13][14][15], and postural instability in ET [11,16]. These abnormalities in ET are also commonly found in PD, which contribute to misdiagnosis of the two movement disorders [5].
Advances in technology enable an objective assessment of gait and balance through numerous devices such as body-worn inertial motion unit (IMU) sensors, 3-dimentional motion capturing systems, force plates, gait walkways, and smartphones. Many movement disorder clinics and research laboratories have started to implement these technological devices in their practices [17], particularly IMU sensors to evaluate balance and gait in PD [18,19] . Subsequently, a vast amount of complex and non-linear data from technological devices are available for clinicians and researchers that require advanced statistical analyses. Machine learning is widely employed to analyze large data sets produced from movement disorder clinics and research laboratories [20]. Among various machine learning techniques, neural network (NN) models are broadly utilized due to their superior performance compared to traditional analytic methods such as logistic regression (LR) [21]. Previously, NNs have been employed in gait and balance studies to process signals from wearable devices in PD [22][23][24][25].
In addition, a study used a NN to successfully discern PD from ET using surface electromyography data [26]. Other machine learning algorithms such as support vector machine (SVM) and k-nearest neighbor (kNN) were used to differentiate between PD and ET based on IMU sensors, but they mainly investigated upper body tremors [27][28][29][30]. To our knowledge, no study has utilized machine learning techniques to differentiate between PD and ET based on data collected from gait and balance characteristics from wearable IMU sensors.
Therefore, the primary aim of this data-driven study was to distinguish between PD and ET using balance and gait characteristics obtained from IMU sensors using machine learning techniques. The secondary aim of the study was to compare and evaluate different machine learning performances in distinguishing between PD and ET using balance and gait data obtained from IMU sensors.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 22, 2020

Participants
The instrumented gait and balance tests were administered at the Parkinson's Disease and Movement Disorder Clinic of the University of Kansas Medical Center. Between January 3, 2017 and December 11, 2018, a total of 1,468 people was tested. We excluded people if they were diagnosed with both PD and ET (n = 29) and/or if they had a history of deep brain stimulation surgery (n = 468). For those who visited the clinic more than once during the study period (n = 628), we only included the data from their first visits. Additionally, we excluded people with no data recorded due to technical error of the measuring device (n = 65), leaving a total of 567 people with PD or ET in the study.

Protocol and materials
Participants wore six IMU sensors (Opal, APDM, Inc., Portland, OR, USA). Two wrist sensors were bilaterally mounted on the dorsal side of the wrist and two foot sensors were bilaterally mounted to the instep (dorsal side of metatarsus) of each foot. The sternum sensor was mounted on the sternum of the chest and the lumber sensor was mounted to the posterior side at the L5 region. All six sensors were firmly tightened to the designated locations using straps during testing.
The instrumented stand and walk (iSAW) test was administered. During the iSAW test, participants were instructed to stand still for 30 seconds, walk straight for 7 meters at a comfortable speed after hearing a beep, turn 180° around at the end of 7-meter marker, then walk back to the start point. The iSAW test is a reliable and valid balance and gait measure for clinical use [31][32][33].
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Pre-processing data set
Among 130 gait and balance parameters computed by the Mobility Lab software, a total of 48 parameters with clinical relevance were included in the study based on a recent review [34] and the clinical expertise of the authors (Table 1). In the data set, 524 participants had PD (age = 66.73 ± 9.17, disease duration = 8.20 ± 5.11 years), while 43 had ET (age = 66.98 ± 9.84, disease duration = 13.83 ± 13.79 years) as their primary diagnoses. The ratio between PD participants and ET participants was highly imbalanced, 92.5% (PD) vs 7.5% (ET), in the data set. To mitigate the effect derived from the imbalanced data, we utilized a Synthetic Minority Over-sampling Technique (SMOTE) [35], an oversampling approach to create synthetic minority class examples. For missing values, we used a univariate feature imputation algorithm to predict missing values in datasets before training the classification model. We initially attempted to reduce the dimension of the data using principal component analysis (PCA); however, we found that there was a slight performance drop when applying PCA. One potential reason behind this phenomenon was that we manually selected useful features before PCA. Hence, we did not utilize the dimension reduction technique.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 22, 2020. .

Gait -lumbar
Coronal range of motion (degrees) Sagittal range of motion (degrees) Transverse range of motion (degrees)

Gait -trunk
Coronal range of motion (degrees) Sagittal range of motion (degrees) Transverse range of motion (degrees) The classification models included NN, SVM, kNN, decision tree (DT), random forest (RF), gradient boosting classifier (GB), and LR. To find out the optimal values of the hyper-parameters of the classification models, we used a stratified 3-fold cross validation with grid search strategy. Table 2 shows the hyper-parameter search spaces of each classification model.  balanced'} Note: gamma parameter in SVM was applied when the kernel is radial basis function 'rbf'; class_weight was applied when the oversampling approach (SMOTE) was not used.

Performance evaluation
The classification models were evaluated with accuracy (a ratio of correct prediction to total observations), recall (a ratio of correct prediction of positive cases to all observations in actual cases), precision (a ratio of correct prediction of positive cases to all positive cases), and F1 score (a harmonic mean of precision and recall). Of note, all performances were micro-averaged. The accuracy, and precision and recall for the F1-score were calculated as follows (TP = true positive, TN = true negative, FP = false positive, FN = false negative): . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 22, 2020. .

Results
The results of NN, SVM, kNN, DT, RF, GB, and LR with and without the oversampling approach are shown in Table 3  CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 22, 2020. . RF = random forest, GB = gradient boosting, SMOTE = synthetic minority over-sampling technique.

Discussion
This data-driven study aimed to differentiate between two movement disorders, PD and ET based on gait and balance characteristics collected from IMU sensors using various machine learning models. Additionally, the classification performance was compared across different machine learning models.
Recent technological advances enable clinics and research laboratories to employ wearable devices in their balance and gait assessments. This allows precise measurement of balance and abnormalities and accurate monitoring of physical activities of daily living.
However, the data produced by technological devices are often overwhelming and under-utilized due to the size and complexity of the data [36]. Our data set provides useful clinical information for gait and balance such as gait speed, cadence, and postural sway collected by wearable devices. Our current results suggest that machine learning models can increase the utility of these complex data collected by technological devices. Our machine learning models outperformed (F1-scores ranging between 0.49 and 0.61) the dummy model (F1-score = 0.48) to classify the two movement disorders.
In this study, with SMOTE, the NN outperformed other models in classifying two different movement disorders solely based on a clinically available gait and balance data set.
The F1-score of NN was 0.61, showing the highest performance among 8 models in the analysis. The robustness of NN performance typically shows in large and complex data.
Previous studies in PD have demonstrated NN as the superior machine learning technique using data collected from wearable IMU sensors in levodopa-induced dyskinesia assessment and detection [22,23], gait abnormality classification [24], and discrimination between . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 22, 2020. . https://doi.org/10.1101/2020.04.17.20065441 doi: medRxiv preprint people with PD who underwent subthalamic stimulation and healthy controls [25]. Although the GB showed a similar performance based on the F1-score (0.59), the accuracy of GB (85%) was lower than that of the NN (89%). Accuracies of other comparison models including SVM, kNN, DT, RF, and LR ranged between 65% and 87% and their F1-scores were lower than the NN. However, particularly for this data-driven study, a higher performance in accuracy is not likely to reflect superior performance of the model, because the data set was heavily imbalanced with 92.5% of people with PD and 7.5% of people with ET. This implies that a dummy model will be 92.5% accurate in classifying PD if the model categorized each case as PD. The F1-score is a more adequate measure especially for an imbalanced data set in machine learning, because a greater F1-score reflects both low false positives and negatives [37].
The current study has several limitations. First, in our study, the data set was imbalanced towards an overrepresentation of PD. To overcome this limitation, we implemented the SMOTE that generates synthetic minority class samples, which is a widely used oversampling method [38]. Our findings demonstrated the SMOTE increased the classification performance, based on F1-scores, in the majority of models in the study (Table 3). This result may indicate the SMOTE was effective to minimize the influence of imbalanced class distribution in the current data set. However, further classification studies are required to use a data set that better reflects the actual disease representation in the population. Second, the design of the current study was a cross-sectional design including patients with clinically diagnosed PD or ET. Future research needs to include a longitudinal analysis using data over time to inform the accuracy of NNs using gait and balance characteristics from wearable devices to assist in the differential diagnosis or disease progression of PD and ET. Third, unlike past studies that utilized raw signal data captured by IMU sensors [22][23][24][25], the current . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 22, 2020. . study utilized pre-processed data (i.e. gait speed, sway area, cadence, etc.) from raw signal data as input variables. We opted to use the pre-processed data since these are readily available, adding to the clinical relevance of our current findings. However, we acknowledge that using raw data adds more information to the model since the pre-processing procedure might result in a significant loss of raw signal features directly from IMU sensors. In general, the performance of NN can be more precise and accurate when the model is fed more data.
Thus, further examination using raw data collected from wearable IMU sensors can offer new insights that extend our current findings.

Conclusions
Wearable sensors for balance and gait assessments can be implemented in movement disorders clinics to produce a vast amount of potentially informative data for assisting in diagnosis and monitoring disease progression. The current study showed that NN with SMOTE outperformed machine learning models and traditional logistic regression in classifying PD and ET based on the pre-processed balance and gait IMU data set. With further validation, a data-driven approach using machine learning techniques may provide a more efficient diagnostic and prognostic tool that can aid clinicians' decision-making process.
Ethical Statement: The study ethics was approved by the University of Kansas Medical Center Institutional Review Board (#12351 -Movement Disorder Research Registry and #STUDY00143000 -Gait assessment in Movement Disorders). The data set used in this retrospective data base study was collected as part of routine patient care after patient's consent.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Conflicts of Interest:
The authors declare no conflict of interest.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 22, 2020. . https://doi.org/10.1101/2020.04. 17.20065441 doi: medRxiv preprint