First, a dataset of simulated sEMG signals was built to assess the applicability of the LSTM-based approach to muscle activity detection and to compare its performance against two reference muscle activity detectors, TKEO [29, 30] and Stat [27]. Second, the performance of the three detectors was further compared on the simulated sEMG signals, specifically highlighting the effect of the SNR of the sEMG signals. Finally, LSTM-MAD was applied to real sEMG signals, acquired from lower limb muscles during gait, to demonstrate its advantages in real contexts as well. Figure 1 represents the block diagram of the procedure followed in this study. Each block is described in the following paragraphs.
Simulated data
The sEMG signals acquired during cyclic movements, such as walking, cycling, and running, can be modeled as the superimposition of two different contributions: (i) the electrical activity (\(s\)) generated by each muscle during contraction and (ii) the background noise (\(n\)), mainly generated by the neighboring muscles, the characteristics of the electrode–skin interface, and the electronics of the acquisition system. Under the hypothesis of cyclic contractions, the sEMG signal can be defined as a cyclostationary process [39] and, therefore, described through the superimposition of two different stationary processes [28]:
i. The muscle activity (\(s\)), modeled as a Gaussian process with zero mean and variance \({\sigma }_{s}^{2}\), as described in (1):
$$s\left(t\right)\in N\left(0,{\sigma }_{s}^{2}\right)$$
(1)
where \({\sigma }_{s}\) was set equal to \({10}^{({SNR}/20)}\cdot 1\,\upmu V\);
ii. The background noise (\(n\)), modeled as a zero-mean Gaussian process with variance \({\sigma }_{n}^{2}\), as described in (2):
$$n\left(t\right)\in N(0,{\sigma }_{n}^{2})$$
(2)
where \({\sigma }_{n}\) was set equal to \(1\,\upmu V\).
Each realization of the muscle activity process \(s\left(t\right)\) was simulated assuming a time period of 1 s (i.e., the gait cycle duration) and a sampling frequency of 1 kHz [28]. Physiological muscle activity was modeled by time-windowing the Gaussian process \(s(t)\) through a single truncated Gaussian function centered at 50% of the gait cycle [28]. Different standard deviations (\(\sigma\)) and time supports (\(2\alpha \sigma\)) of the truncated Gaussian function were considered to simulate sEMG signals similar to those observed in leg, thigh, and trunk muscles during gait. More specifically, three different values of the standard deviation (\(\sigma\) = 50, 100, and 150 ms) and four different values of the time support \(2\alpha \sigma\) (with \(\alpha\) = 1, 1.5, 2, and 2.4) were tested [27]. Then, the background noise process \(n\left(t\right)\) was added. Nine different values of Signal-to-Noise Ratio (SNR) were simulated (SNR = 3, 6, 10, 13, 16, 20, 23, 26, and 30 dB) [27]. For each triplet of \(\sigma\), \(\alpha\), and SNR values, 100 different realizations were simulated; therefore, a dataset composed of 10,800 different realizations (3 standard deviations × 4 time supports × 9 SNRs × 100 realizations) was built.
The simulated sEMG signals were then band-pass filtered through a 4th order Butterworth digital filter with a lower cut-off frequency of 10 Hz and a higher cut-off frequency of 450 Hz [40]. Figure 2 shows an example of a simulated sEMG signal with the superimposition of the truncated Gaussian function used to model the physiological muscle activity.
The time-instants of each simulated muscle activity (\(s\)) were defined by a binary mask (\({y}_{Sim}\)) set equal to 1 at the time-instants in which the truncated Gaussian function assumed values higher than 0, and equal to 0 otherwise.
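For illustration, a minimal MATLAB sketch of one simulated realization under the assumptions above is reported below (an assumed helper script, not the authors' original implementation; variable names such as `fs`, `SNR_dB`, `sigma_w`, and `alpha` are illustrative):

```matlab
fs      = 1000;              % sampling frequency [Hz]
T       = 1;                 % gait-cycle duration [s]
t       = (0:1/fs:T-1/fs)';  % time axis
SNR_dB  = 16;                % one of the nine simulated SNR values
sigma_w = 0.100;             % std of the truncated Gaussian window [s]
alpha   = 1.5;               % half-support in units of sigma_w

sigma_n = 1e-6;                    % background-noise std (1 uV)
sigma_s = 10^(SNR_dB/20) * 1e-6;   % muscle-activity std

% Truncated Gaussian window centered at 50% of the gait cycle
w = exp(-0.5*((t - 0.5*T)/sigma_w).^2);
w(abs(t - 0.5*T) > alpha*sigma_w) = 0;

% Superimposition of the windowed muscle activity and the background noise
s = sigma_s * randn(size(t)) .* w;
n = sigma_n * randn(size(t));
x = s + n;

% Band-pass filtering, 10-450 Hz (2nd-order design -> 4th-order band-pass)
[b, a]  = butter(2, [10 450]/(fs/2), 'bandpass');
x_filt  = filtfilt(b, a, x);

% Ground-truth binary mask: 1 where the truncated Gaussian is above 0
y_Sim = double(w > 0);
```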
Real data
Gait data acquired from 20 subjects were retrospectively analyzed to test the performance of the three different approaches when applied to real sEMG signals. Subjects performing walking tasks were randomly selected from our database to include both healthy individuals and patients affected by neurological or orthopedic pathologies [8, 41, 42]. This non-homogeneous group of subjects was specifically chosen to verify that the algorithm works under different conditions. In particular, eight out of the 20 subjects were healthy adults (age: 38.0 ± 13.1 years, height: 164.9 ± 5.4 cm, weight: 65.4 ± 21.2 kg) [8], six were patients after unilateral Total Hip Arthroplasty (THA) (age: 73.8 ± 8.4 years, height: 175.5 ± 7.6 cm, weight: 86.8 ± 16.3 kg) [41], and the remaining six were patients affected by idiopathic Normal Pressure Hydrocephalus (NPH) (age: 75.7 ± 6.3 years, height: 170.5 ± 6.3 cm, weight: 72.5 ± 10.4 kg) [42].
Gait data were recorded through a multichannel acquisition system (STEP32, Medical Technology, Italy) [8, 43, 44]. The sEMG signals were acquired through active probes (configuration: single differential, size: 19 mm × 17 mm × 7 mm, Ag-disk diameter: 4 mm, interelectrode distance: 12 mm, gain: variable in the range from 60 to 86 dB) placed over the following 4 muscles of the lower limb: rectus femoris (RF), lateral hamstring (LH), lateral gastrocnemius (LGS), and tibialis anterior (TA). Active probes were positioned according to the guidelines suggested by Winter [45]. Details on the sEMG sensor placement are described in a previous work by Agostini et al. [44]. The dominant lower limb (i.e., the leg used to kick a ball or to start walking) was analyzed for healthy subjects, while the clinically most affected limb was selected for pathological patients. Before applying the tested detectors, the acquired sEMG signals were band-pass filtered through a 4th order Butterworth digital filter with a lower cut-off frequency of 10 Hz and a higher cut-off frequency of 450 Hz [40].
For each subject, 5 gait cycles were randomly selected from the whole walking task to build the real dataset. Therefore, a dataset composed of 400 different sEMG signals (20 subjects × 5 gait cycles × 4 muscles) was obtained. The time-instants relative to each real muscle activation were visually segmented by three experienced operators through a custom MATLAB® GUI (graphical user interface). More specifically, a binary mask (\({y}_{Real}\)) was set equal to 1 at the time-instants in which the majority of the expert operators (at least two out of three) detected muscle activity, and to 0 otherwise.
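As an illustration, this majority-vote rule can be expressed as follows (a hedged sketch; `y_op1`, `y_op2`, and `y_op3` are assumed names for the binary masks segmented by the three expert operators for one sEMG signal):

```matlab
% Ground-truth mask for the real dataset: 1 where at least two out of the
% three operators detected muscle activity
y_Real = double((y_op1 + y_op2 + y_op3) >= 2);
```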
Standard approach: Teager–Kaiser Energy Operator (TKEO)
One of the most commonly applied standard approaches for muscle activity detection is the Teager–Kaiser Energy Operator (TKEO), which has been demonstrated to increase the accuracy of the simple linear envelope approach [29, 30].
More specifically, a single-threshold was applied to the sEMG signals after the computation of the TKEO (ψ), defined as in (3):
$${\psi }_{x(n)}= {x(n)}^{2}-x\left(n+1\right)x(n-1)$$
(3)
where \(x\) represents the sEMG time-series and \(n\) the sample number. The single threshold was defined as described in (4):
$$Th= {\mu }_{n}\pm j\times {\sigma }_{n}$$
(4)
where \({\mu }_{n}\), \({\sigma }_{n}\), and \(j\) represent the mean of the background noise, the standard deviation of the background noise, and a multiplicative constant, respectively. In this study, the constant \(j\) was set equal to 7, as suggested in [29, 46]. Since the mean (\({\mu }_{n}\)) and the standard deviation (\({\sigma }_{n}\)) of the background noise are required as inputs of this approach, the time-instants corresponding to noise were automatically selected as those simulated as background noise (simulated sEMG signals) or segmented as background noise by the operators (real sEMG signals).
The output of this detector was a binary mask (\({\widehat{y}}_{TKEO}\)), defined as follows (a minimal code sketch is provided after the list):
- \({\widehat{y}}_{TKEO}\) = 1, if \({\psi }_{x(n)}\) ≥ \(Th\);
- \({\widehat{y}}_{TKEO}\) = 0, if \({\psi }_{x(n)}\) < \(Th\).
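A minimal sketch of this detection rule is reported below (an assumed re-implementation, not the authors' code; `x` is a filtered sEMG epoch, `noise_idx` marks the samples known to contain only background noise, and the noise statistics are assumed to be computed on the TKEO of those baseline samples):

```matlab
% Teager-Kaiser Energy Operator, Eq. (3); the first and last samples are left at 0
psi = zeros(size(x));
psi(2:end-1) = x(2:end-1).^2 - x(3:end).*x(1:end-2);

j_const = 7;                       % multiplicative constant [29, 46]
mu_n    = mean(psi(noise_idx));    % mean of the TKEO of the background noise
sd_n    = std(psi(noise_idx));     % std of the TKEO of the background noise
Th      = mu_n + j_const*sd_n;     % single threshold (upper bound of Eq. (4))

y_hat_TKEO = double(psi >= Th);    % binary output mask
```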
Statistical approach: double threshold statistical detector (Stat)
The statistical approach used in this study is the double-threshold statistical detector proposed in [27]. The principal computation steps are the following:
i. An auxiliary sequence \({z}_{i}\) is computed from the sEMG signals as the sum of the squared values of two successive samples (5):
$${z}_{i}= {x}_{i}^{2}+{x}_{i+1}^{2}$$
(5)
where \({x}_{i}\) and \({x}_{i+1}\) represent two consecutive samples of the sEMG time series;
ii. A first (amplitude) threshold ζ is applied on a sliding detection window defined by m consecutive samples of the auxiliary sequence \({z}_{i}\);
iii. Muscle activation is detected if at least \({r}_{0}\) (temporal threshold) out of m consecutive samples of the sliding detection window are equal to or above the first threshold ζ.
In this study, the length of the observation window (m) was set equal to 5, while the temporal threshold \({r}_{0}\) was set equal to 1 [27].
The output of this detector was a binary mask (\({\widehat{y}}_{Stat}\)), defined as follows (a minimal code sketch is provided after the list):
- \({\widehat{y}}_{Stat}\) = 1, if \({z}_{i}\) ≥ ζ for at least \({r}_{0}\) out of m samples;
- \({\widehat{y}}_{Stat}\) = 0, otherwise.
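A minimal sketch of this detection rule is reported below (an assumed re-implementation of [27]; the amplitude threshold `zeta` is taken as an already-computed input, and the mapping from window-level detections to a sample-level mask is one plausible choice):

```matlab
m  = 5;                                % length of the sliding detection window
r0 = 1;                                % temporal threshold
z  = x(1:end-1).^2 + x(2:end).^2;      % auxiliary sequence, Eq. (5)

above      = z >= zeta;                % first (amplitude) threshold
y_hat_Stat = zeros(size(x));
for k = 1:numel(z) - m + 1
    % activation detected if at least r0 of the m window samples reach zeta
    if sum(above(k:k+m-1)) >= r0
        y_hat_Stat(k:k+m-1) = 1;
    end
end
```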
Deep learning approach: LSTM-MAD
An LSTM recurrent neural network model generally has the following architecture (a minimal code sketch is provided after the list):
i. An input sequence layer;
ii. One or more LSTM layers, used to learn the time-dependencies within the sequential data;
iii. A fully connected layer, used to convert the output size of the previous layers into the number of classes to be recognized;
iv. A softmax layer, used to compute the probability of belonging to each class;
v. A classification output layer, used to compute the cost function.
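A minimal sketch of this layer stack, built with the Deep Learning Toolbox and assuming a single LSTM layer with 200 hidden units (one of the configurations explored in the grid search described below), is the following:

```matlab
numHiddenUnits = 200;          % one of the tested values of n_units
layers = [
    sequenceInputLayer(1)                                  % (i) input sequence, 1 unit
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence')    % (ii) LSTM layer(s)
    fullyConnectedLayer(2)                                 % (iii) 2 output classes
    softmaxLayer                                           % (iv) class probabilities
    classificationLayer];                                  % (v) cost function
```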
In this study, several LSTM recurrent neural network models were tested to assess the applicability of the proposed approach to muscle activity detection, considering the sEMG signals directly, without any feature extraction step. To define the best LSTM model for muscle activity detection, each of the two datasets of simulated and real sEMG signals was divided into 3 different sets: training set (70%), validation set (15%), and test set (15%), respectively. The training set was used to train the LSTM models, while the validation set (or development set) was used to evaluate the network performance and to avoid overfitting of the training data. More specifically, the validation set was used to stop training automatically when the validation accuracy stopped increasing [35]. Finally, the test set was used for the final evaluation and the comparison of LSTM-MAD against the other two detectors.
Using the Deep Learning Toolbox of MATLAB® release R2020b (The MathWorks Inc., Natick, MA, USA), 720 different LSTM models were tested. All the LSTM models had a sequence input layer consisting of 1 unit (i.e., the dimension of a single simulated or real sEMG signal) and a fully connected output layer consisting of 2 units (i.e., the number of classes to be recognized). Different numbers of LSTM hidden layers (\(n\)), numbers of hidden units for each LSTM hidden layer (\({n}_{units}\)), learning rate (\(\alpha\)) values, and drop period (\(\delta\)) values were tested to achieve the LSTM architecture with the highest performance. More specifically, two different numbers of LSTM hidden layers (\(n\) = 1 and 2), nine different numbers of hidden units for each LSTM hidden layer (\({n}_{units}\) = 100, 125, 150, 175, 200, 225, 250, 275, and 300), five different learning rates (\(\alpha\) = 0.01, 0.015, 0.02, 0.025, and 0.03), and eight different drop period values (\(\delta\) = 10, 15, 20, 25, 30, 35, 40, and 45) were tested [35]. The adaptive moment estimation (ADAM) optimization algorithm was adopted to train all the tested LSTM models [47]. The performance of each LSTM model was assessed on the simulated (or real) test set by computing the overall classification accuracy, defined as the number of correctly classified sEMG samples normalized to the total number of sEMG samples within the test set.
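As an illustration, one point of this hyperparameter grid can be trained as sketched below (a hedged example: the data containers `XTrain`, `YTrain`, `XVal`, and `YVal`, the validation-patience value, and the learn-rate schedule details are assumptions; `layers` is the stack from the previous sketch):

```matlab
alpha_lr   = 0.02;    % initial learning rate (one of the tested values)
dropPeriod = 25;      % learn-rate drop period (one of the tested values)

options = trainingOptions('adam', ...                % ADAM optimizer [47]
    'InitialLearnRate',    alpha_lr, ...
    'LearnRateSchedule',   'piecewise', ...
    'LearnRateDropPeriod', dropPeriod, ...
    'ValidationData',      {XVal, YVal}, ...         % validation (development) set
    'ValidationPatience',  5, ...                    % early stopping (assumed value)
    'Shuffle',             'every-epoch', ...
    'Plots',               'none');

% XTrain: cell array of 1-by-T sEMG sequences; YTrain: categorical sequences (class 0/1)
net = trainNetwork(XTrain, YTrain, layers, options);
```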
The training process was performed on a workstation with a 3.2 GHz six-core CPU, 32 GB of RAM memory, and a 64-bit Windows operating system.
The \({y}_{Sim}\), extracted from the truncated Gaussian functions, and the \({y}_{Real}\), manually defined by the expert operators through the MATLAB® GUI, were used as target (or ground truth) to train and test each LSTM model for the simulated and real datasets, respectively.
The output of the LSTM approach was a binary mask (\({\widehat{y}}_{LSTM-MAD}\)), defined as follows:
- \({\widehat{y}}_{LSTM-MAD}\) = 1, if the sEMG time-instant was classified as muscle activity (class 1);
- \({\widehat{y}}_{LSTM-MAD}\) = 0, if the sEMG time-instant was classified as background noise (class 0).
Post-processing
A post-processing step was applied to the output of each detector (i.e., standard, statistical, and deep learning approach) to reject the erroneous transitions due to the stochastic nature of the sEMG signal. Since it is generally accepted that a muscle activation shorter than 30 ms does not affect the kinetics and the kinematics of gait [48], all the muscle activations lasting less than 30 ms were discarded [27]. Figure 3 illustrates this concept. In particular, Fig. 3A shows a sample realization of a simulated sEMG signal modulated by a truncated Gaussian function (SNR = 16 dB, σ = 100 ms, and α = 1.5). Figure 3B represents the output of the standard approach (\({\widehat{y}}_{TKEO}\)) without any post-processing step, while Fig. 3C shows the effect of the post-processing on the detector’s output.
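A minimal sketch of this post-processing step, applied to a generic detector output `y_hat` sampled at `fs` = 1 kHz, is the following (an assumed helper, not the authors' code):

```matlab
minLen  = round(0.030*fs);               % 30 ms expressed in samples
d       = diff([0; y_hat(:); 0]);        % transitions of the binary mask
onsets  = find(d == 1);                  % activation onsets
offsets = find(d == -1) - 1;             % activation offsets
for k = 1:numel(onsets)
    if offsets(k) - onsets(k) + 1 < minLen
        y_hat(onsets(k):offsets(k)) = 0; % reject activations shorter than 30 ms
    end
end
```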
Performance evaluation
The muscle activations detected by the three different approaches (\({\widehat{y}}_{TKEO}\), \({\widehat{y}}_{Stat}\), and \({\widehat{y}}_{LSTM-MAD}\)) were quantitatively compared against the ground truth in terms of precision, recall, F1-score, Jaccard similarity index, and onset/offset bias. More specifically, the indexes were defined as follows:
$$precision= \frac{TP}{TP+FP}$$
(6)
$$recall= \frac{TP}{TP+FN}$$
(7)
$$F1-score= \frac{2 \times (recall \times precision)}{(recall+precision)}$$
(8)
$$Jaccard= \frac{\left|{\widehat{y}}_{i}\cap y\right|}{\left|{\widehat{y}}_{i}\cup y\right|}$$
(9)
$$Bias= \frac{1}{N}{\sum }_{i=1}^{N}\left({\widehat{t}}_{i}-t\right)$$
(10)
where \(TP\) represents the True Positives (i.e., the number of sEMG time-instants correctly classified by the detector as muscle activity), \(FN\) the False Negatives (i.e., the number of sEMG time-instants incorrectly classified by the detector as background noise), and \(FP\) the False Positives (i.e., the number of sEMG time-instants incorrectly classified by the detector as muscle activity). \({\widehat{y}}_{i}\) represents the binary mask computed by the \(i\)-th detector (\(i\) = 1: standard approach with TKEO, \(i\) = 2: statistical approach with the double-threshold statistical detector, \(i\) = 3: novel approach with LSTM-MAD), and \(y\) represents the ground truth of the simulated (or real) test set. Finally, \(N\) is the number of onset/offset estimates, \({\widehat{t}}_{i}\) is the \(i\)-th estimated onset/offset time-instant obtained through the detector under evaluation, and \(t\) represents the corresponding true onset/offset timing.
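As an illustration, the sample-wise metrics in Eqs. (6)–(9) and the bias in Eq. (10) can be computed as sketched below (a hedged example; `y_hat`, `y`, `t_hat`, and `t_true` are assumed variable names for the detector output, the ground truth, and the estimated and true onset/offset time-instants, respectively):

```matlab
TP = sum(y_hat == 1 & y == 1);   % time-instants correctly detected as activity
FP = sum(y_hat == 1 & y == 0);   % time-instants wrongly detected as activity
FN = sum(y_hat == 0 & y == 1);   % time-instants wrongly detected as noise

precision = TP / (TP + FP);                              % Eq. (6)
recall    = TP / (TP + FN);                              % Eq. (7)
F1        = 2*(recall*precision) / (recall + precision); % Eq. (8)
jaccard   = sum(y_hat & y) / sum(y_hat | y);             % Eq. (9)

% Onset/offset bias, Eq. (10): t_hat and t_true are paired vectors of the N
% estimated and true onset/offset time-instants
bias = mean(t_hat - t_true);
```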
Effect of SNR on muscle activity detection
The effect of the sEMG signals' SNR on the performance of the three tested muscle activity detectors was assessed on the simulated test set by computing the above-mentioned performance parameters separately for each of the nine SNR values (SNR = 3, 6, 10, 13, 16, 20, 23, 26, and 30 dB).
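As an illustration, this per-SNR evaluation can be sketched as follows (a hedged example; `snr_test` and `f1_test` are assumed vectors holding the SNR label and the F1-score of each simulated test realization for one detector):

```matlab
snr_values = [3 6 10 13 16 20 23 26 30];
meanF1 = zeros(size(snr_values));
for k = 1:numel(snr_values)
    % average the performance index over all realizations sharing the same SNR
    meanF1(k) = mean(f1_test(snr_test == snr_values(k)));
end
```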
Statistical analysis
The hypothesis of normality of the distribution of the computed performance parameters was tested through the Lilliefors test (“lillietest” MATLAB® function), setting the significance level (α) at 0.05. If the normality hypothesis was satisfied, one-way analysis of variance (ANOVA) for repeated measures (α = 0.05) was performed to assess significant differences in the performance of the three tested approaches and to test the effect of SNR on the detectors' performance; otherwise, Friedman's test (α = 0.05) was applied. Then, post-hoc analysis with Tukey's adjustment for multiple comparisons was performed. The effect size of the statistically significant differences was calculated through the Hedges' g statistic [49], including the correction for small sample sizes. The statistical analysis was performed using the Statistics and Machine Learning Toolbox of MATLAB® release R2020b.
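As an illustration, the non-parametric branch of this analysis and the effect-size computation can be sketched as follows (a hedged example assuming a matrix `scores` with one column of performance values per detector; the repeated-measures ANOVA branch and Tukey's adjustment are omitted for brevity, and the Hedges' g function is a manual implementation of the small-sample-corrected formula):

```matlab
% Normality check (Lilliefors test) on each detector's distribution
isNormal = all(arrayfun(@(c) lillietest(scores(:,c)) == 0, 1:3));

if ~isNormal
    [p, ~, stats] = friedman(scores, 1, 'off');   % Friedman's test
    c = multcompare(stats, 'Display', 'off');     % post-hoc pairwise comparisons
end

% Hedges' g with small-sample correction [49] for two groups a and b
g = @(a, b) (mean(a) - mean(b)) / ...
    sqrt(((numel(a)-1)*var(a) + (numel(b)-1)*var(b)) / (numel(a)+numel(b)-2)) ...
    * (1 - 3/(4*(numel(a)+numel(b)) - 9));
```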