Classification complexity in myoelectric pattern recognition

Nilsson, Niclas; Håkansson, Bo; Ortiz-Catalan, Max

doi:10.1186/s12984-017-0283-5

Research
Open access
Published: 10 July 2017


Journal of NeuroEngineering and Rehabilitation volume 14, Article number: 68 (2017)


Abstract

Background

Limb prosthetics, exoskeletons, and neurorehabilitation devices can be intuitively controlled using myoelectric pattern recognition (MPR) to decode the subject’s intended movement. In conventional MPR, descriptive electromyography (EMG) features representing the intended movement are fed into a classification algorithm. The separability of the different movements in the feature space significantly affects the classification complexity. Classification complexity estimating algorithms (CCEAs) were studied in this work in order to improve feature selection, predict MPR performance, and inform on faulty data acquisition.

Methods

CCEAs such as nearest neighbor separability (NNS), purity, repeatability index (RI), and separability index (SI) were evaluated based on their correlation with classification accuracy, as well as on their suitability for producing highly performing EMG feature sets. SI was evaluated using Mahalanobis distance, Bhattacharyya distance, Hellinger distance, Kullback–Leibler divergence, and a modified version of Mahalanobis distance. Classification accuracy was computed with three classifiers commonly used in MPR: linear discriminant analysis (LDA), multi-layer perceptron (MLP), and support vector machine (SVM). The algorithms and analytic graphical user interfaces produced in this work are freely available in BioPatRec.

Results

NNS and SI were found to be highly correlated with classification accuracy (correlations up to 0.98 for both algorithms) and capable of yielding highly descriptive feature sets. Additionally, the experiments revealed how the level of correlation between the inputs of the classifiers influences classification accuracy, and emphasized the classifiers’ sensitivity to such redundancy.

Conclusions

This study deepens the understanding of the classification complexity in prediction of motor volition based on myoelectric information. It also provides researchers with tools to analyze myoelectric recordings in order to improve classification performance.

Background

Decoding of motor volition via myoelectric pattern recognition (MPR) has many clinical applications, such as prosthetic control [1], phantom limb pain treatment [2], and rehabilitation after stroke [3]. Research on MPR has focused on classifiers [4], pre-processing algorithms [5], and electromyography (EMG) acquisition [6], among other factors that influence the classification outcome. Reaz et al. studied different attributes of EMG signals, such as signal-to-noise ratio, that decrease the complexity of MPR [7]. However, few studies have addressed the complexity of the classification task itself. Information on complexity prior to classification can reveal specific conflicting classes and flawed data acquisition. An understanding of classification complexity can also be used to select optimal features and to evaluate trade-offs between the number of classes and their separability.

Most MPR algorithms use EMG features extracted from overlapping time windows as the classifier input. Therefore, the resulting classification accuracy depends on the features used to describe the EMG signals. The performance of a variety of such features and feature selection algorithms has been studied previously [8, 9]. Two feature selection algorithms, namely minimum redundancy and maximum relevance [10] and Markov random fields [11], were applied to an electrode array by Liu et al. [12], who used Kullback–Leibler divergence and feature scatter to rate the relevance and redundancy of features. The features were then ranked and selected into sets according to these ratings. Similarly, Bunderson et al. defined three data quality indices, namely repeatability index (RI), mean semi-principal axis, and separability index (SI), to evaluate changes in data quality over repeated EMG recordings [13]. Classification complexity estimation was not investigated in the aforementioned studies, but they introduced algorithms intended to quantify attributes relevant to the complexity of pattern recognition tasks.

Classification complexity has been studied outside the field of MPR. Singh suggested two nonparametric multiresolution complexity measures: nearest neighbor separability (NNS) and purity [14]. These complexity measures were compared with common statistical similarity measures, such as Kullback–Leibler divergence, Bhattacharyya distance, and Mahalanobis distance, and were found to yield a higher correlation with classification accuracy. These classification complexity estimating algorithms (CCEAs), along with Hellinger distance, were investigated in the present study with a focus on their relevance for MPR.

In the present study, CCEAs were evaluated based on their correlation with offline classification accuracy and real-time classification performance. These evaluations also revealed attributes of the CCEAs, the classification algorithms, and the descriptiveness of the features. One such attribute, channel correlation dependency, was investigated further. The CCEAs found to yield high correlation with classification accuracy (NNS and SI) were then used for feature selection and benchmarked against feature sets found in the literature.

The results of these experiments provided evidence of the suitability of CCEAs to predict MPR performance. The algorithms used in this work were implemented and made freely available in BioPatRec, an open-source platform for the development and benchmarking of algorithms used in advanced myoelectric control [15, 16].

Methods

Data sets

Two data sets were used in this study, both recorded from healthy subjects. The first set contained individual movements (IM data): 20 subjects, four EMG channels, 14-bit analog-to-digital conversion (ADC), and 11 classes (hand open/close, wrist flexion/extension, pronation/supination, side grip, fine grip, agree or thumb up, pointer or index extension, and rest or no movement) [15]. The second set contained individual and simultaneous movements (SM data): 17 subjects, eight EMG channels, 16-bit ADC, and 27 classes (hand open/close, wrist flexion/extension, pronation/supination, and all their possible combinations) [17]. Disposable Ag/AgCl electrodes (Ø = 1 cm) in a bipolar configuration (2 cm inter-electrode distance) were used in both sets. The bipoles were evenly spaced around the most proximal third of the forearm, with the first channel placed along the extensor carpi ulnaris. Subjects were seated comfortably with the elbow flexed at 90 degrees and the forearm supported, leaving only the hand free to move. The data sets, along with details on demographics and acquisition hardware, are available online as part of BioPatRec [16]. Table 1 summarizes these data sets.

Table 1 Summary of data sets


Signal acquisition, pre-processing and feature extraction

BioPatRec recording routines guided the subjects to perform each movement three times with resting periods in between. The instructed contraction time, as well as the resting time, was 3 s. The initial and final 15% of each contraction were discarded, as these portions normally correspond to the subject’s delayed response and anticipatory relaxation, while the remaining central 70% still preserves portions of the dynamic contraction [15].

Time windows of 200 ms were extracted from the concatenated contraction data using a 50 ms time increment. Features were then extracted from each time window and distributed into sets used for training (40%), validation (20%), and testing (40%) of the classifiers. The testing sets were never seen by the classifiers during training or validation. A 10-fold cross-validation was performed by randomizing the assignment of feature vectors to the three sets before training and testing.
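The trimming, windowing, and splitting steps above can be sketched in Python with NumPy. This is a minimal illustration, not BioPatRec’s MATLAB implementation: the sampling rate of 1000 Hz, the function names, and the synthetic random signals are assumptions made only for the example.

```python
import numpy as np


def trim_contraction(x, fraction=0.15):
    """Drop the initial and final `fraction` of one contraction (samples x channels)."""
    n = x.shape[0]
    cut = int(round(fraction * n))
    return x[cut:n - cut]


def sliding_windows(x, fs, win_ms=200, inc_ms=50):
    """Cut a (samples x channels) array into overlapping windows.

    Returns an array of shape (n_windows, window_samples, channels)."""
    win = int(fs * win_ms / 1000)
    inc = int(fs * inc_ms / 1000)
    starts = range(0, x.shape[0] - win + 1, inc)
    return np.stack([x[s:s + win] for s in starts])


def split_indices(n, rng, fractions=(0.4, 0.2, 0.4)):
    """Randomly assign n feature vectors to training, validation and testing."""
    idx = rng.permutation(n)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Hypothetical example: three 3 s contractions, 4 channels, fs = 1000 Hz (assumed).
fs = 1000
rng = np.random.default_rng(0)
reps = [rng.standard_normal((3 * fs, 4)) for _ in range(3)]
data = np.concatenate([trim_contraction(r) for r in reps])  # 3 x 2100 samples
windows = sliding_windows(data, fs)
train, val, test = split_indices(len(windows), rng)
```

Drawing a new permutation for each of the ten rounds would mirror the randomized cross-validation described above.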

The following EMG signal features were used as implemented in BioPatRec [15, 16, 18]. In the time domain: mean absolute value (tmabs), standard deviation (tstd), variance (tvar), waveform length (twl), root mean square (trms), zero crossings (tzc), slope sign changes (tslpch), power (tpwr), difference absolute mean value (tdam), maximum fractal length (tmfl), Higuchi fractal dimension (tfdh), fractal dimension (tfd), cardinality (tcard), and rough entropy (tren). In the frequency domain: waveform length (fwl), mean (fmn), and median (fmd). Feature vectors were constructed by concatenating the selected features extracted from all channels, as commonly done in MPR and implemented in BioPatRec (for a detailed explanation, see [15]).
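To make the construction of a feature vector concrete, the sketch below computes a handful of the listed time-domain features per channel and concatenates them. The definitions are textbook forms without the amplitude thresholds that implementations often apply to zero crossings and slope sign changes, and the standard deviation uses NumPy’s default normalization; BioPatRec’s exact definitions and the ordering of the concatenated vector may differ, and the helper names are hypothetical.

```python
import numpy as np

# Each function takes one window of shape (samples, channels) and returns
# one value per channel. Textbook definitions without thresholds (assumption).


def tmabs(w):
    """Mean absolute value."""
    return np.mean(np.abs(w), axis=0)


def tstd(w):
    """Standard deviation (population form, ddof=0)."""
    return np.std(w, axis=0)


def trms(w):
    """Root mean square."""
    return np.sqrt(np.mean(w ** 2, axis=0))


def twl(w):
    """Waveform length: summed absolute sample-to-sample differences."""
    return np.sum(np.abs(np.diff(w, axis=0)), axis=0)


def tzc(w):
    """Zero crossings: sign changes between consecutive samples."""
    s = np.sign(w)
    return np.sum(s[:-1] * s[1:] < 0, axis=0)


def tslpch(w):
    """Slope sign changes: sign changes of consecutive first differences."""
    d = np.diff(w, axis=0)
    return np.sum(d[:-1] * d[1:] < 0, axis=0)


FEATURES = {"tmabs": tmabs, "tstd": tstd, "trms": trms,
            "twl": twl, "tzc": tzc, "tslpch": tslpch}


def feature_vector(window, names):
    """Concatenate the selected features, each computed on every channel."""
    return np.concatenate([np.asarray(FEATURES[n](window), dtype=float) for n in names])
```

For example, `feature_vector(w, ["tmabs", "twl", "tslpch", "tzc"])` yields the four features of the Hudgins set for every channel of window `w`, that is, four values per channel.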

Classification complexity estimating algorithms

The classification complexity estimating algorithms (CCEAs) were designed to return classification complexity estimates (CCEs) for each movement separately (individual results) and averaged over all movements (average results). Individual results help in choosing the movements to include in a given MPR problem by identifying conflicting classes. Average results consider the complete feature space, including all movements, and can therefore be used to evaluate and compare the feature sets used to build that feature space. The CCEAs used are outlined below.

Separability index

Separability index (SI) was implemented as introduced by Bunderson et al.; that is, the average of the distances between all movements and their most conflicting neighbor [13]. Figure 1a illustrates the distance and conflict between two classes in an exemplary two-dimensional feature space.

The aforementioned distance was defined by Bunderson et al. to be half the Mahalanobis distance, resulting in the following equation:

$$ SI=\sum_{i=1}^K\left(\underset{j=1,\dots, i-1, i+1,\dots, K}{\mathit{\min}}\frac{1}{2}\sqrt{{\left({\mu}_i-{\mu}_j\right)}^T{S}_i^{-1}\left({\mu}_i-{\mu}_j\right)}\right) $$

where K is the number of classes or movements, and $\mu_x$ and $S_x$ are the mean vector and covariance matrix of class x, respectively.
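A direct NumPy transcription of the equation is shown below, assuming `X` holds one feature vector per row and `y` the movement label of each row. The function returns each class’s term of the sum (the individual results) together with their mean (the average result); multiplying the mean by K recovers the sum as written. The names are illustrative and not taken from BioPatRec.

```python
import numpy as np


def half_mahalanobis(mu_i, mu_j, S_i):
    """Half the Mahalanobis distance from class i to class j, using only S_i."""
    d = mu_i - mu_j
    return 0.5 * float(np.sqrt(d @ np.linalg.solve(S_i, d)))


def separability_index(X, y):
    """Return each class's term of the SI sum (individual) and their mean (average).

    X: (n_samples x n_features) feature vectors; y: class label of each row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mus = {c: X[y == c].mean(axis=0) for c in classes}
    covs = {c: np.atleast_2d(np.cov(X[y == c], rowvar=False)) for c in classes}
    # For each class, distance to its most conflicting (closest) other class.
    individual = {
        c: min(half_mahalanobis(mus[c], mus[o], covs[c]) for o in classes if o != c)
        for c in classes
    }
    return individual, float(np.mean(list(individual.values())))
```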

This definition only considers the covariance of the target movement ($S_i$), and not that of the comparing movement ($S_j$). We considered this particular formulation a potential limitation and therefore introduced additional distance definitions. These distance definitions were used under the assumption of normality, as Mahalanobis distance was defined under the same assumption [19]. The introduced distance definitions are described in Table 2.

Table 2 Distance definitions for SI

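Table 2 itself is not reproduced here. As an illustration only, the standard closed forms of the Bhattacharyya distance, the Hellinger distance (via the Bhattacharyya coefficient), and the Kullback–Leibler divergence between two multivariate normal distributions are sketched below; any scaling applied in Table 2, and the definition of the modified Mahalanobis distance, are not assumed. In the SI equation, such a function would take the place of the half Mahalanobis term.

```python
import numpy as np


def bhattacharyya_distance(mu1, S1, mu2, S2):
    """Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)."""
    S = (S1 + S2) / 2
    d = mu1 - mu2
    term_mean = d @ np.linalg.solve(S, d) / 8
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    # 0.5 * ln(det S / sqrt(det S1 * det S2)), computed with log-determinants.
    term_cov = 0.5 * (logdet_S - 0.5 * (logdet_1 + logdet_2))
    return float(term_mean + term_cov)


def hellinger_distance(mu1, S1, mu2, S2):
    """Hellinger distance from the Bhattacharyya coefficient BC = exp(-D_B)."""
    bc = np.exp(-bhattacharyya_distance(mu1, S1, mu2, S2))
    return float(np.sqrt(max(0.0, 1.0 - bc)))


def kl_divergence(mu1, S1, mu2, S2):
    """KL(N1 || N2); note that it is not symmetric in its two arguments."""
    k = len(mu1)
    d = mu2 - mu1
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    trace = np.trace(np.linalg.solve(S2, S1))
    return float(0.5 * (trace + d @ np.linalg.solve(S2, d) - k + logdet_2 - logdet_1))
```

Because the Hellinger distance is a monotonic function of the Bhattacharyya distance here, rank-based comparisons with either measure behave alike.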

Nearest neighbor separability

Nearest neighbor separability (NNS) was inspired by the algorithm with the same name defined by Singh [14]. It is based on the dominance of nearest neighbors, in feature space, belonging to the same class (movement) as a target data point. The contributions of the nearest neighbors are weighted by their proximity to the target point and the result is normalized to be a value between 0 and 1. Let

$$ b\left(p_t,p_i\right)=\begin{cases}1, & \text{if } p_t,p_i\in C\\ 0, & \text{if } p_t\in C,\ p_i\notin C\end{cases} $$

where $p_t$ is the target point, $p_i$ is the i-th nearest neighbor of $p_t$, and C is a class. The aforementioned dominance is then defined as:

$$ {d}_t={\left(\sum_{i=1}^k\frac{1}{i}\right)}^{-1}\sum_{i=1}^k\frac{b\left({p}_t,{p}_i\right)}{i} $$

A target point and its six nearest neighbors are illustrated in Fig. 1b.

The end result is the average dominance:

$$ NNS=\frac{1}{N}\sum_{i=1}^N{d}_i $$

where N is the total number of samples.

Unless stated otherwise, the parameter k is set to 120, which is the maximum number of nearest neighbors from the same class for the data sets of this study.
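The NNS definition translates almost line by line into code. The sketch below assumes Euclidean distance in the feature space (the metric is not stated above), excludes each point from its own neighbor list, and breaks distance ties by sample order. It uses a brute-force distance matrix, which is adequate for a few thousand feature vectors.

```python
import numpy as np


def nns(X, y, k=120):
    """Nearest neighbor separability: mean weighted same-class dominance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    # Pairwise squared Euclidean distances (assumed metric).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is not its own neighbor
    weights = 1.0 / np.arange(1, k + 1)  # weight 1/i for the i-th neighbor
    dominance = np.empty(n)
    for t in range(n):
        nearest = np.argsort(sq[t], kind="stable")[:k]
        b = (y[nearest] == y[t]).astype(float)  # b(p_t, p_i) in the text
        dominance[t] = np.sum(b * weights) / np.sum(weights)  # d_t
    return float(dominance.mean())  # NNS = (1/N) * sum of d_t
```

Since the weights 1/i decay with rank, the closest neighbors dominate d_t, so a larger k mainly adds information about the wider neighborhood of each point.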

Purity

Purity was computed by dividing the feature hyperspace into smaller hypercuboids called cells [14]. Each cell was rated individually, and a high dominance of one class within a cell meant high purity for that cell. The final purity of a data set was the average over all cells and all cell resolutions.
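The description above leaves the per-cell rating and the resolution schedule open, so the following is a simplified stand-in rather than Singh’s exact purity measure: each occupied cell is scored by the fraction of its points that belong to the majority class, cells are averaged, and the result is averaged over several grid resolutions. Doubling the number of cells per dimension at each resolution is an assumption.

```python
from collections import Counter, defaultdict

import numpy as np


def purity_sketch(X, y, resolutions=(1, 2, 3, 4)):
    """Simplified purity: mean majority-class fraction over occupied cells,
    averaged over several grid resolutions (NOT Singh's exact formula)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    per_resolution = []
    for r in resolutions:
        bins = 2 ** r  # assumption: cells per dimension double at each resolution
        idx = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
        cells = defaultdict(list)
        for key, label in zip(map(tuple, idx), y):
            cells[key].append(label)
        cell_purity = [max(Counter(v).values()) / len(v) for v in cells.values()]
        per_resolution.append(np.mean(cell_purity))
    return float(np.mean(per_resolution))
```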

Repeatability index

The repeatability index (RI) uses Mahalanobis distance to measure how much individual classes vary between different occurrences [13]. Here, the occurrences evaluated were the three repetitions of each movement during the recording session. The end result is the average Mahalanobis distance between the first repetition and the subsequent ones, over all movements.
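A minimal sketch of RI follows, assuming each movement’s feature vectors are stored per repetition. Which covariance matrix enters the Mahalanobis distance is not stated above; this sketch uses the covariance of the first repetition, which is an assumption.

```python
import numpy as np


def repeatability_index(reps):
    """Average Mahalanobis distance between the first repetition and later ones.

    reps: dict mapping each movement to a list of (n_windows x n_features)
    arrays, one array per repetition, first repetition first.
    """
    per_movement = []
    for arrays in reps.values():
        first = np.asarray(arrays[0], dtype=float)
        mu_first = first.mean(axis=0)
        # Assumption: the first repetition's covariance defines the metric.
        S_first = np.atleast_2d(np.cov(first, rowvar=False))
        dists = []
        for later in arrays[1:]:
            d = np.asarray(later, dtype=float).mean(axis=0) - mu_first
            dists.append(np.sqrt(d @ np.linalg.solve(S_first, d)))
        per_movement.append(np.mean(dists))
    return float(np.mean(per_movement))
```

Lower values indicate more repeatable movements, so RI runs in the opposite direction to NNS and SI.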

Classifiers and topologies

Three common classifiers for MPR were used in this study: linear discriminant analysis (LDA), multi-layer perceptron (MLP), and support vector machine (SVM). A quadratic kernel function was used for SVM. The classifiers were used as implemented in BioPatRec [15] (code available online [16]), where LDA and SVM were implemented using MATLAB’s Statistics Toolbox.

MLP and SVM are inherently capable of simultaneous classification when provided with feature vectors of mixed (simultaneous) outputs, hereafter referred to as the “MIX” output configuration; that is, there is one output for every individual movement, and combinations of movements turn on the corresponding mix of outputs. LDA’s output is computed by majority voting, which means it cannot produce simultaneous classification by creating a mixed output. However, classifiers like LDA can still be used for simultaneous classification using the label power set strategy, where the classifier is constructed with the same number of outputs as the total number of classes. This configuration is referred to here as “all movements as individual” (AMI). Ortiz-Catalan et al. showed that AMI could also favor classifiers capable of mixed outputs [17]; therefore, MLP and SVM were evaluated in both MIX and AMI configurations for simultaneous predictions. In addition, LDA was also used in a one-vs-one (OVO) topology, as this has been shown to improve classification accuracy for individual movements [17, 20].
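The difference between the MIX and AMI configurations is easiest to see in the target encodings. The sketch below enumerates the 27 SM classes as every combination of at most one movement per degree of freedom (including rest), and builds a MIX target (six outputs, several of which may be on) and an AMI target (one-hot over all 27 classes, i.e., the label power set). The movement strings and function names are illustrative.

```python
import itertools

# Degrees of freedom in the SM data, each with two opposite movements.
DOFS = [("hand open", "hand close"),
        ("wrist flexion", "wrist extension"),
        ("pronation", "supination")]
OUTPUTS = [m for pair in DOFS for m in pair]  # six individual movements


def all_classes():
    """All 27 classes: at most one movement per DoF; () is rest."""
    options = [(None,) + pair for pair in DOFS]
    return [tuple(m for m in combo if m is not None)
            for combo in itertools.product(*options)]


def mix_target(movements):
    """MIX: one output per individual movement; a combination turns on several."""
    return [1 if m in movements else 0 for m in OUTPUTS]


def ami_target(movements, classes):
    """AMI (label power set): one-hot vector over all classes."""
    return [1 if set(c) == set(movements) else 0 for c in classes]
```

With MIX, a classifier must learn six outputs that can be active together; with AMI, each combination is simply one more class, which is why LDA can handle it.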

Evaluation and comparison

In order to evaluate the correlation between classification complexity estimates (CCEs) and classification accuracy, each feature was used individually to classify all movements from each subject in both data sets, which provided a wide range of classification accuracies and their related CCEs. Correlations were then calculated considering either the classification of each movement individually (individual results) or the average over all movements (average results).

The CCEAs were further used to select one set each of two, three, and four features. CCEs were calculated for all possible combinations of features, and the three sets (one for each number of features) predicting the highest accuracy were selected. The selected sets are hereafter referred to as the best sets and were obtained using the IM data set.
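The exhaustive search over feature combinations can be sketched as below. `score` is a hypothetical callable that builds the feature space from a candidate set and returns a CCE for which larger values predict higher accuracy (as for NNS and SI); the feature names are the 17 listed above. With 17 features there are 136, 680, and 2380 candidate sets of two, three, and four features, respectively.

```python
from itertools import combinations

# The 17 features listed above (time domain, then frequency domain).
FEATURE_NAMES = ["tmabs", "tstd", "tvar", "twl", "trms", "tzc", "tslpch",
                 "tpwr", "tdam", "tmfl", "tfdh", "tfd", "tcard", "tren",
                 "fwl", "fmn", "fmd"]


def best_sets(score, names=FEATURE_NAMES, sizes=(2, 3, 4)):
    """For each set size, return the combination with the highest score.

    score: hypothetical callable mapping a tuple of feature names to a CCE,
    where larger values predict higher classification accuracy.
    """
    return {n: max(combinations(names, n), key=score) for n in sizes}
```

For a measure where lower values predict better accuracy, the score would simply be negated before the search.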

Ortiz-Catalan et al. used a genetic algorithm to find optimal feature sets of two, three, and four features based on classification performance [8]. Their proposed sets of two and three features were used as benchmarking sets in this study, along with the commonly used four-feature set proposed by Hudgins et al. [21]. These sets are referred to in this study as reference sets:

Ref 2F: tstd, trms [8]
Ref 3F: tstd, fwl, fmd [8]
Ref 4F: tmabs, twl, tslpch, tzc [21]

The best and reference sets with equal numbers of features were compared based on the resulting classification accuracy, as given by the three different classifiers. Classification accuracy corresponds to offline computations unless otherwise stated. Real-time testing was done using the Motion Tests as implemented in BioPatRec [15, 22]. The CCEAs’ ability to predict real-time performance was evaluated by their correlation with the completion time obtained from the Motion Tests, defined as the time from the first prediction not equal to rest until 20 correct predictions are achieved. As in the offline computations, one prediction was the classification of one 200 ms time window, and a new prediction was produced every 50 ms (time increment). The subject was instructed to hold the requested movement until 20 correct predictions were achieved. If fewer than 20 correct predictions had been made after 5 s, the completion time was set to 5 s. The real-time results were obtained from the IM data set and its related Motion Tests [22].
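The completion-time rule can be expressed as a small function over the stream of predictions of one trial, produced every 50 ms. Two details are assumptions not stated above: elapsed time is counted as the number of predictions from the first non-rest prediction up to and including the 20th correct one, times 50 ms, and the 5 s limit is applied to that elapsed time.

```python
def completion_time(predictions, target, rest="rest", n_correct=20,
                    inc_ms=50, timeout_s=5.0):
    """Completion time (s) of one Motion Test trial.

    predictions: labels produced every `inc_ms` milliseconds during the trial.
    """
    start = next((i for i, p in enumerate(predictions) if p != rest), None)
    if start is None:
        return timeout_s  # the subject never left rest
    correct = 0
    for i in range(start, len(predictions)):
        if predictions[i] == target:
            correct += 1
            if correct == n_correct:
                # Assumption: count every prediction from start through i.
                elapsed = (i - start + 1) * inc_ms / 1000
                return min(elapsed, timeout_s)
    return timeout_s  # fewer than n_correct correct predictions
```

Under these assumptions, a flawless trial takes 20 × 50 ms = 1 s, which is the lower bound of the measure.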

The Wilcoxon signed-rank test (p ≤ 0.05) was used to assess statistically significant differences. Correlations were calculated using Spearman’s rho, since the dependencies between accuracy and CCE were not clearly linear.
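Spearman’s rho is Pearson’s correlation computed on ranks, which is why it captures monotonic but nonlinear dependencies between accuracy and CCE. A NumPy sketch with average ranks for ties follows; in practice, `scipy.stats.spearmanr` computes the same coefficient, and `scipy.stats.wilcoxon` implements the signed-rank test.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```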

Results

Separability index (SI)

The correlations found between classification accuracy and SI using different distance definitions are summarized in Table 3, where the highest value for every classifier is highlighted. Figures 2 and 3 show plots of average results for the IM and SM data sets, respectively, with the most correlating distance definition highlighted for each classifier. Table 3 and Figs. 2 and 3 indicate that the most adequate distance definition varies with the classifier.

Table 3 Correlations for the different distance definitions


Mahalanobis distance

Mahalanobis distance was the distance definition most closely correlated with LDA in an OVO topology for individual results using SM data. The corresponding classification accuracy is plotted against SI in Fig. 4a.

Kullback–Leibler divergence

Kullback–Leibler divergence was not found to yield higher correlation than any other distance definition for any of the classifiers; however, it was found to correlate most closely with the average results of MLP using both topologies. This correlation is visualized in Figs. 2 and 3. Owing to its low correlation with classification accuracy, Kullback–Leibler divergence was not used in the remaining experiments.

Bhattacharyya distance

Bhattacharyya distance was the most correlating distance definition for MLP in both AMI and MIX configurations. Plots of classification accuracy for these two configurations against SI based on Bhattacharyya distance are shown in insets B and C of Fig. 4. Individual results are presented, using IM data for the AMI configuration and SM data for the MIX configuration.

Hellinger distance

Bhattacharyya distance and Hellinger distance are closely related, as both are based on the Bhattacharyya coefficient. Table 3 confirms their resemblance, as the correlations for the two distance definitions are very similar in all cases. Accordingly, Hellinger and Bhattacharyya distances are the distance definitions that most closely correlate with MLP MIX and AMI for individual results, and with MLP AMI for average results. MLP AMI classification accuracy is plotted against Hellinger-distance-based SI in Fig. 4e, which shows individual results using IM data.

Modified Mahalanobis

Modified Mahalanobis was the distance definition most closely correlated with the average results of LDA and SVM classification accuracy for all topologies and configurations. The same holds for individual results, except for LDA in an OVO topology. Insets E and F of Fig. 4 show LDA AMI and SVM MIX classification accuracy plotted against SI based on modified Mahalanobis. Because of its overall higher correlation with classification accuracy, modified Mahalanobis was the version of Mahalanobis distance used in the remaining results.

Nearest neighbor separability (NNS)

A summary of correlations with all classifiers for both data sets is presented in Table 4.

Table 4 Correlations between classification accuracy and nearest neighbor separability


Table 4 also shows the influence of the parameter k. Figures 5 and 6 show plots of average result for the IM and SM data, respectively.

NNS is most correlated with LDA in an OVO topology, which is equivalent to the results obtained by SI based on Bhattacharyya distance for the same classifier. The individual results for LDA using OVO are plotted for both data sets in Fig. 7.

Purity and repeatability index

Purity and repeatability index resulted in low correlation with classification accuracy for all classifiers. The correlations for IM data are given in Table 5. Figure 8 shows individual results of MLP for the two algorithms on the same data set. Because of their low correlation, purity was excluded from all following experiments, and RI from the feature set experiments.

Table 5 Correlation for purity and repeatability index regarding classification accuracy


Feature sets

In this section, the best sets are compared with each other and with the reference sets. In Fig. 9, the best sets corresponding to the different distance definitions of SI are compared. The modified Mahalanobis sets yield significantly higher classification accuracy than the sets from the other distance definitions in eight out of 12 cases, and higher average accuracy in all cases except MLP with sets of three features. In that case, the Bhattacharyya and Hellinger distance sets yield higher average classification accuracy.

The influence of the parameter k of the NNS algorithm is shown in Fig. 10 by comparing the best sets for k = 120 and k = 20. The higher value of k leads to higher average classification accuracy in all cases; however, the difference is statistically significant only for SVM with three features.

The sets with the highest average classification accuracy in Figs. 9 and 10 (modified Mahalanobis and k = 120, respectively) were selected for comparison with the reference sets in Fig. 11. The NNS sets lead to significantly higher classification accuracy than the reference sets in all but one case, while the modified Mahalanobis sets are significantly higher in nine out of 12 cases. The average classification accuracy of the NNS sets is higher than that of the modified Mahalanobis sets for all classifiers except LDA in an OVO topology, where modified Mahalanobis is consistently higher.

Real time

Figure 12 summarizes the correlations between the motion test result completion time and CCEs corresponding to RI, NNS, and SI based on modified Mahalanobis and Bhattacharyya distance. Statistically significant correlations (p < 0.001) are highlighted by a darker frame.

Feature attribute

As the correlations used to evaluate the CCEAs were derived using one feature at a time, attributes of the individual features were also revealed. Examples of such attributes are average classification accuracy and classification accuracy variance. These two attributes are illustrated in Figs. 13 and 14 for IM and SM data, respectively. Figure 13 shows the five features that resulted in the highest and lowest average classification accuracy for each classifier separately.

One attribute that was observed to highly influence the CCEAs’ correlation with classification accuracy was channel correlation; that is, correlation between feature sequences extracted from the channels separately using only the feature considered. To illustrate this attribute, average determinants of the channel correlation matrices over all subjects for the different features were extracted from SM data and shown in the bar diagram in Fig. 15.
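The channel-correlation attribute can be computed per feature as sketched below, assuming the values of a single feature are arranged with one column per channel. The determinant of the channel correlation matrix is 1 when channels are uncorrelated and approaches 0 as they become linearly dependent; averaging it over subjects gives the kind of summary shown in Fig. 15. The data here are synthetic.

```python
import numpy as np


def channel_correlation_determinant(F):
    """Determinant of the channel-by-channel correlation matrix of one feature.

    F: (n_windows x n_channels) values of a single feature, one column per channel.
    """
    R = np.corrcoef(F, rowvar=False)
    return float(np.linalg.det(R))


rng = np.random.default_rng(0)
independent = rng.standard_normal((500, 4))
shared = rng.standard_normal((500, 1))
redundant = shared + 0.05 * rng.standard_normal((500, 4))  # channels nearly identical
det_independent = channel_correlation_determinant(independent)
det_redundant = channel_correlation_determinant(redundant)
```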

The features marked in red have low average correlation matrix determinants, which means high correlation between channels, while blue marks features with low channel correlation. Figure 16 shows how the two groups of features (red and blue in Fig. 15) cluster differently in plots of classification accuracy against CCE.

For the blue group, the dependency between classification accuracy and CCE is similar for the three classifiers, whereas for the red group it clearly varies between them.

Discussion