First, a dataset of simulated sEMG signals was built to assess the applicability of the LSTM-based approach to muscle activity detection and to compare its performance against two reference muscle activity detectors, TKEO [29, 30] and Stat [27]. Second, the performance of the three detectors was further compared on the simulated sEMG signals, specifically highlighting the effect of the SNR of the sEMG signals. Finally, LSTM-MAD was applied to real sEMG signals, acquired from lower limb muscles during gait, to demonstrate its advantages in real contexts as well. Figure 1 represents the block diagram of the procedure followed in this study. Each block is described in the following paragraphs.
Simulated data
The sEMG signals acquired during cyclic movements, such as walking, cycling, and running, can be modeled as the superimposition of two different contributions: (i) the electrical activity (\(s\)) generated by each muscle during contraction and (ii) the background noise (\(n\)), mainly generated by the neighboring muscles, the characteristics of the electrode–skin interface, and the electronics of the acquisition system. Under the hypothesis of cyclic contractions, the sEMG signal can be defined as a cyclostationary process [39] and, therefore, described through the superimposition of two different stationary processes [28]:
i. The muscle activity (\(s\)), modeled as a Gaussian process with zero mean and variance \({\sigma }_{s}^{2}\), as described in (1):
$$s\left(t\right)\in N\left(0,{\sigma }_{s}^{2}\right)$$
(1)
where \({\sigma }_{s}\) was set equal to \({10}^{({SNR}/20)}\cdot 1\,\upmu V\);
ii. The background noise (\(n\)), modeled as a zero-mean Gaussian process with variance \({\sigma }_{n}^{2}\), as described in (2):
$$n\left(t\right)\in N(0,{\sigma }_{n}^{2})$$
(2)
where \({\sigma }_{n}\) was set equal to \(1\,\upmu V\).
Each realization of the muscle activity process \(s\left(t\right)\) was simulated assuming a time period of 1 s (i.e., the gait cycle duration) and a sampling frequency of 1 kHz [28]. Physiological muscle activity was modeled by time-windowing the Gaussian process \(s(t)\) through a single truncated Gaussian function centered at 50% of the gait cycle [28]. Different standard deviations (\(\sigma\)) and time supports (\(2\alpha \sigma\)) of the truncated Gaussian function were considered to simulate sEMG signals similar to those observed in leg, thigh, and trunk muscles during gait. More specifically, three different values of the standard deviation (\(\sigma\) = 50, 100, and 150 ms) and four different values of the time support \(2\alpha \sigma\) (with \(\alpha\) = 1, 1.5, 2, and 2.4) were tested [27]. Then, the background noise process \(n\left(t\right)\) was added. Nine different values of Signal-to-Noise Ratio (SNR) were simulated (SNR = 3, 6, 10, 13, 16, 20, 23, 26, and 30 dB) [27]. For each triplet of \(\sigma\), \(\alpha\), and SNR values, 100 different realizations were simulated; therefore, a dataset composed of 10,800 different realizations (3 standard deviations × 4 time supports × 9 SNRs × 100 realizations) was built.
The simulated sEMG signals were then band-pass filtered through a 4th order Butterworth digital filter with a lower cut-off frequency of 10 Hz and a higher cut-off frequency of 450 Hz [40]. Figure 2 shows an example of a simulated sEMG signal with the superimposition of the truncated Gaussian function used to model the physiological muscle activity.
The time-instants of each simulated muscle activity (\(s\)) were defined by a binary mask (\({y}_{Sim}\)) set equal to 1 at the time-instants in which the truncated Gaussian function assumed values higher than 0, and equal to 0 otherwise.
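For illustration, a minimal MATLAB sketch of one simulated realization under the assumptions above is reported below (an assumed helper script, not the authors' original implementation; variable names such as `fs`, `SNR_dB`, `sigma_w`, and `alpha` are illustrative):

```matlab
fs      = 1000;              % sampling frequency [Hz]
T       = 1;                 % gait-cycle duration [s]
t       = (0:1/fs:T-1/fs)';  % time axis
SNR_dB  = 16;                % one of the nine simulated SNR values
sigma_w = 0.100;             % std of the truncated Gaussian window [s]
alpha   = 1.5;               % half-support in units of sigma_w

sigma_n = 1e-6;                    % background-noise std (1 uV)
sigma_s = 10^(SNR_dB/20) * 1e-6;   % muscle-activity std

% Truncated Gaussian window centered at 50% of the gait cycle
w = exp(-0.5*((t - 0.5*T)/sigma_w).^2);
w(abs(t - 0.5*T) > alpha*sigma_w) = 0;

% Superimposition of the windowed muscle activity and the background noise
s = sigma_s * randn(size(t)) .* w;
n = sigma_n * randn(size(t));
x = s + n;

% Band-pass filtering, 10-450 Hz (2nd-order design -> 4th-order band-pass)
[b, a]  = butter(2, [10 450]/(fs/2), 'bandpass');
x_filt  = filtfilt(b, a, x);

% Ground-truth binary mask: 1 where the truncated Gaussian is above 0
y_Sim = double(w > 0);
```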
Real data
Gait data acquired from 20 subjects were retrospectively analyzed to test the performance of the three different approaches when applied to real sEMG signals. Subjects performing walking tasks were randomly selected from our database to include both healthy individuals and patients affected by neurological or orthopedic pathologies [8, 41, 42]. This non-homogeneous group of subjects was specifically chosen to verify that the algorithm works under different conditions. In particular, eight out of the 20 subjects were healthy adults (age: 38.0 ± 13.1 years, height: 164.9 ± 5.4 cm, weight: 65.4 ± 21.2 kg) [8], six were patients after unilateral Total Hip Arthroplasty (THA) (age: 73.8 ± 8.4 years, height: 175.5 ± 7.6 cm, weight: 86.8 ± 16.3 kg) [41], and the remaining six were patients affected by idiopathic Normal Pressure Hydrocephalus (NPH) (age: 75.7 ± 6.3 years, height: 170.5 ± 6.3 cm, weight: 72.5 ± 10.4 kg) [42].
Gait data were recorded through a multichannel acquisition system (STEP32, Medical Technology, Italy) [8, 43, 44]. The sEMG signals were acquired through active probes (configuration: single differential, size: 19 mm × 17 mm × 7 mm, Ag-disk diameter: 4 mm, interelectrode distance: 12 mm, gain: variable in the range from 60 to 86 dB) placed over the following 4 muscles of the lower limb: rectus femoris (RF), lateral hamstring (LH), lateral gastrocnemius (LGS), and tibialis anterior (TA). Active probes were positioned according to the guidelines suggested by Winter [45]. Details on the sEMG sensor placement are described in a previous work by Agostini et al. [44]. The dominant lower limb (i.e., the leg used to kick a ball or to start walking) was analyzed for healthy subjects, while the clinically most affected limb was selected for pathological patients. Before applying the tested detectors, the acquired sEMG signals were band-pass filtered through a 4th order Butterworth digital filter with a lower cut-off frequency of 10 Hz and a higher cut-off frequency of 450 Hz [40].
For each subject, 5 gait cycles were randomly selected from the whole walking task to build the real dataset. Therefore, a dataset composed of 400 different sEMG signals (20 subjects × 5 gait cycles × 4 muscles) was obtained. The time-instants relative to each real muscle activation were visually segmented by three experienced operators through a custom MATLAB® GUI (graphical user interface). More specifically, a binary mask (\({y}_{Real}\)) was set equal to 1 at the time-instants in which the majority of the expert operators (at least two out of three) detected muscle activity, and to 0 otherwise.
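As an illustration, this majority-vote rule can be expressed as follows (a hedged sketch; `y_op1`, `y_op2`, and `y_op3` are assumed names for the binary masks segmented by the three expert operators for one sEMG signal):

```matlab
% Ground-truth mask for the real dataset: 1 where at least two out of the
% three operators detected muscle activity
y_Real = double((y_op1 + y_op2 + y_op3) >= 2);
```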
Standard approach: Teager–Kaiser Energy Operator (TKEO)
One of the most commonly applied standard approaches for muscle activity detection is the Teager–Kaiser Energy Operator (TKEO), which has been demonstrated to increase the accuracy of the simple linear envelope approach [29, 30].
More specifically, a single-threshold was applied to the sEMG signals after the computation of the TKEO (ψ), defined as in (3):
$${\psi }_{x(n)}= {x(n)}^{2}-x\left(n+1\right)x(n-1)$$
(3)
where \(x\) represents the sEMG time-series and \(n\) the sample number. The single threshold was defined as described in (4):
$$Th= {\mu }_{n}\pm j\times {\sigma }_{n}$$
(4)
where \({\mu }_{n}\), \({\sigma }_{n}\), and \(j\) represent the mean of the background noise, the standard deviation of the background noise, and a multiplicative constant, respectively. In this study, the constant \(j\) was set equal to 7, as suggested in [29, 46]. Since the mean (\({\mu }_{n}\)) and the standard deviation (\({\sigma }_{n}\)) of the background noise are required as inputs of this approach, the time-instants corresponding to noise were automatically selected as those simulated as background noise (simulated sEMG signals) or segmented as background noise by the operators (real sEMG signals).
The output of this detector was a binary mask (\({\widehat{y}}_{TKEO}\)), defined as follows (a minimal code sketch is provided after the list):
- \({\widehat{y}}_{TKEO}\) = 1, if \({\psi }_{x(n)}\) ≥ \(Th\);
- \({\widehat{y}}_{TKEO}\) = 0, if \({\psi }_{x(n)}\) < \(Th\).
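A minimal sketch of this detection rule is reported below (an assumed re-implementation, not the authors' code; `x` is a filtered sEMG epoch, `noise_idx` marks the samples known to contain only background noise, and the noise statistics are assumed to be computed on the TKEO of those baseline samples):

```matlab
% Teager-Kaiser Energy Operator, Eq. (3); the first and last samples are left at 0
psi = zeros(size(x));
psi(2:end-1) = x(2:end-1).^2 - x(3:end).*x(1:end-2);

j_const = 7;                       % multiplicative constant [29, 46]
mu_n    = mean(psi(noise_idx));    % mean of the TKEO of the background noise
sd_n    = std(psi(noise_idx));     % std of the TKEO of the background noise
Th      = mu_n + j_const*sd_n;     % single threshold (upper bound of Eq. (4))

y_hat_TKEO = double(psi >= Th);    % binary output mask
```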
Statistical approach: double threshold statistical detector (Stat)
The statistical approach used in this study is the double-threshold statistical detector proposed in [27]. The principal computation steps are the following:
i. An auxiliary sequence \({z}_{i}\) is computed from the sEMG signals as the sum of the squared values of two successive samples (5):
$${z}_{i}= {x}_{i}^{2}+{x}_{i+1}^{2}$$
(5)
where \({x}_{i}\) and \({x}_{i+1}\) represent two consecutive samples of the sEMG time series;
ii. A first (amplitude) threshold ζ is applied on a sliding detection window defined by m consecutive samples of the auxiliary sequence \({z}_{i}\);
iii. Muscle activation is detected if at least \({r}_{0}\) (temporal threshold) out of m consecutive samples of the sliding detection window are equal to or above the first threshold ζ.
In this study, the length of the observation window (m) was set equal to 5, while the temporal threshold \({r}_{0}\) was set equal to 1 [27].
The output of this detector was a binary mask (\({\widehat{y}}_{Stat}\)), defined as follows (a minimal code sketch is provided after the list):
- \({\widehat{y}}_{Stat}\) = 1, if \({z}_{i}\) ≥ ζ for at least \({r}_{0}\) out of m samples;
- \({\widehat{y}}_{Stat}\) = 0, otherwise.
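A minimal sketch of this detection rule is reported below (an assumed re-implementation of [27]; the amplitude threshold `zeta` is taken as an already-computed input, and the mapping from window-level detections to a sample-level mask is one plausible choice):

```matlab
m  = 5;                                % length of the sliding detection window
r0 = 1;                                % temporal threshold
z  = x(1:end-1).^2 + x(2:end).^2;      % auxiliary sequence, Eq. (5)

above      = z >= zeta;                % first (amplitude) threshold
y_hat_Stat = zeros(size(x));
for k = 1:numel(z) - m + 1
    % activation detected if at least r0 of the m window samples reach zeta
    if sum(above(k:k+m-1)) >= r0
        y_hat_Stat(k:k+m-1) = 1;
    end
end
```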
Deep learning approach: LSTM-MAD
An LSTM recurrent neural network model generally has the following architecture (a minimal code sketch is provided after the list):
i. An input sequence layer;
ii. One or more LSTM layers, used to learn the time-dependencies within the sequential data;
iii. A fully connected layer, used to convert the output size of the previous layers into the number of classes to be recognized;
iv. A softmax layer, used to compute the probability of belonging to each class;
v. A classification output layer, used to compute the cost function.
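A minimal sketch of this layer stack, built with the Deep Learning Toolbox and assuming a single LSTM layer with 200 hidden units (one of the configurations explored in the grid search described below), is the following:

```matlab
numHiddenUnits = 200;          % one of the tested values of n_units
layers = [
    sequenceInputLayer(1)                                  % (i) input sequence, 1 unit
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence')    % (ii) LSTM layer(s)
    fullyConnectedLayer(2)                                 % (iii) 2 output classes
    softmaxLayer                                           % (iv) class probabilities
    classificationLayer];                                  % (v) cost function
```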
In this study, several LSTM recurrent neural network models were tested to assess the applicability of the proposed approach to muscle activity detection, considering the sEMG signals directly, without any feature extraction step. To define the best LSTM model for muscle activity detection, each of the two datasets of simulated and real sEMG signals was divided into 3 different sets: training set (70%), validation set (15%), and test set (15%), respectively. The training set was used to train the LSTM models, while the validation set (or development set) was used to evaluate the network performance and to avoid overfitting of the training data. More specifically, the validation set was used to stop training automatically when the validation accuracy stopped increasing [35]. Finally, the test set was used for the final evaluation and the comparison of LSTM-MAD against the other two detectors.
Using the Deep Learning Toolbox of MATLAB® release R2020b (The MathWorks Inc., Natick, MA, USA), 720 different LSTM models were tested. All the LSTM models had a sequence input layer consisting of 1 unit (i.e., the dimension of a single simulated or real sEMG signal) and a fully connected output layer consisting of 2 units (i.e., the number of classes to be recognized). Different numbers of LSTM hidden layers (\(n\)), numbers of hidden units for each LSTM hidden layer (\({n}_{units}\)), learning rate (\(\alpha\)) values, and drop period (\(\delta\)) values were tested to achieve the LSTM architecture with the highest performance. More specifically, two different numbers of LSTM hidden layers (\(n\) = 1 and 2), nine different numbers of hidden units for each LSTM hidden layer (\({n}_{units}\) = 100, 125, 150, 175, 200, 225, 250, 275, and 300), five different learning rates (\(\alpha\) = 0.01, 0.015, 0.02, 0.025, and 0.03), and eight different drop period values (\(\delta\) = 10, 15, 20, 25, 30, 35, 40, and 45) were tested [35]. The adaptive moment estimation (ADAM) optimization algorithm was adopted to train all the tested LSTM models [47]. The performance of each LSTM model was assessed on the simulated (or real) test set by computing the overall classification accuracy, defined as the number of correctly classified sEMG samples normalized to the total number of sEMG samples within the test set.
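As an illustration, one point of this hyperparameter grid can be trained as sketched below (a hedged example: the data containers `XTrain`, `YTrain`, `XVal`, and `YVal`, the validation-patience value, and the learn-rate schedule details are assumptions; `layers` is the stack from the previous sketch):

```matlab
alpha_lr   = 0.02;    % initial learning rate (one of the tested values)
dropPeriod = 25;      % learn-rate drop period (one of the tested values)

options = trainingOptions('adam', ...                % ADAM optimizer [47]
    'InitialLearnRate',    alpha_lr, ...
    'LearnRateSchedule',   'piecewise', ...
    'LearnRateDropPeriod', dropPeriod, ...
    'ValidationData',      {XVal, YVal}, ...         % validation (development) set
    'ValidationPatience',  5, ...                    % early stopping (assumed value)
    'Shuffle',             'every-epoch', ...
    'Plots',               'none');

% XTrain: cell array of 1-by-T sEMG sequences; YTrain: categorical sequences (class 0/1)
net = trainNetwork(XTrain, YTrain, layers, options);
```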
The training process was performed on a workstation with a 3.2 GHz six-core CPU, 32 GB of RAM memory, and a 64-bit Windows operating system.
The \({y}_{Sim}\), extracted from the truncated Gaussian functions, and the \({y}_{Real}\), manually defined by the expert operators through the MATLAB® GUI, were used as target (or ground truth) to train and test each LSTM model for the simulated and real datasets, respectively.
The output of the LSTM approach was a binary mask (\({\widehat{y}}_{LSTM-MAD}\)), defined as follows:
- \({\widehat{y}}_{LSTM-MAD}\) = 1, if the sEMG time-instant was classified as muscle activity (class 1);
- \({\widehat{y}}_{LSTM-MAD}\) = 0, if the sEMG time-instant was classified as background noise (class 0).
Post-processing
A post-processing step was applied to the output of each detector (i.e., standard, statistical, and deep learning approach) to reject the erroneous transitions due to the stochastic nature of the sEMG signal. Since it is generally accepted that a muscle activation shorter than 30 ms does not affect the kinetics and the kinematics of gait [48], all the muscle activations lasting less than 30 ms were discarded [27]. Figure 3 illustrates this concept. In particular, Fig. 3A shows a sample realization of a simulated sEMG signal modulated by a truncated Gaussian function (SNR = 16 dB, σ = 100 ms, and α = 1.5). Figure 3B represents the output of the standard approach (\({\widehat{y}}_{TKEO}\)) without any post-processing step, while Fig. 3C shows the effect of the post-processing on the detector’s output.
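A minimal sketch of this post-processing step, applied to a generic detector output `y_hat` sampled at `fs` = 1 kHz, is the following (an assumed helper, not the authors' code):

```matlab
minLen  = round(0.030*fs);               % 30 ms expressed in samples
d       = diff([0; y_hat(:); 0]);        % transitions of the binary mask
onsets  = find(d == 1);                  % activation onsets
offsets = find(d == -1) - 1;             % activation offsets
for k = 1:numel(onsets)
    if offsets(k) - onsets(k) + 1 < minLen
        y_hat(onsets(k):offsets(k)) = 0; % reject activations shorter than 30 ms
    end
end
```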
Performance evaluation
The muscle activations detected by the three different approaches (\({\widehat{y}}_{TKEO}\), \({\widehat{y}}_{Stat}\), and \({\widehat{y}}_{LSTM-MAD}\)) were quantitatively compared against the ground truth in terms of precision, recall, F1-score, Jaccard similarity index, and onset/offset bias. More specifically, the indexes were defined as follows:
$$precision= \frac{TP}{TP+FP}$$
(6)
$$recall= \frac{TP}{TP+FN}$$
(7)
$$F1-score= \frac{2 \times (recall \times precision)}{(recall+precision)}$$
(8)
$$Jaccard= \frac{\left|{\widehat{y}}_{i}\cap y\right|}{\left|{\widehat{y}}_{i}\cup y\right|}$$
(9)
$$Bias= \frac{1}{N}{\sum }_{i=1}^{N}\left({\widehat{t}}_{i}-t\right)$$
(10)
where \(TP\) represents the True Positives (i.e., the number of sEMG time-instants correctly classified by the detector as muscle activity), \(FN\) the False Negatives (i.e., the number of sEMG time-instants incorrectly classified by the detector as background noise), and \(FP\) the False Positives (i.e., the number of sEMG time-instants incorrectly classified by the detector as muscle activity). \({\widehat{y}}_{i}\) represents the binary mask computed by the \(i\)-th detector (\(i\) = 1: standard approach with TKEO, \(i\) = 2: statistical approach with the double-threshold statistical detector, \(i\) = 3: novel approach with LSTM-MAD), and \(y\) represents the ground truth of the simulated (or real) test set. Finally, \(N\) is the number of onset/offset estimates, \({\widehat{t}}_{i}\) is the \(i\)-th estimated onset/offset time-instant obtained through the detector under evaluation, and \(t\) represents the corresponding true onset/offset timing.
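As an illustration, the sample-wise metrics in Eqs. (6)–(9) and the bias in Eq. (10) can be computed as sketched below (a hedged example; `y_hat`, `y`, `t_hat`, and `t_true` are assumed variable names for the detector output, the ground truth, and the estimated and true onset/offset time-instants, respectively):

```matlab
TP = sum(y_hat == 1 & y == 1);   % time-instants correctly detected as activity
FP = sum(y_hat == 1 & y == 0);   % time-instants wrongly detected as activity
FN = sum(y_hat == 0 & y == 1);   % time-instants wrongly detected as noise

precision = TP / (TP + FP);                              % Eq. (6)
recall    = TP / (TP + FN);                              % Eq. (7)
F1        = 2*(recall*precision) / (recall + precision); % Eq. (8)
jaccard   = sum(y_hat & y) / sum(y_hat | y);             % Eq. (9)

% Onset/offset bias, Eq. (10): t_hat and t_true are paired vectors of the N
% estimated and true onset/offset time-instants
bias = mean(t_hat - t_true);
```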
Effect of SNR on muscle activity detection
The effect of the sEMG signals' SNR on the performance of the three tested muscle activity detectors was assessed on the simulated test set by computing the above-mentioned performance parameters separately for each of the nine SNR values (SNR = 3, 6, 10, 13, 16, 20, 23, 26, and 30 dB).
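As an illustration, this per-SNR evaluation can be sketched as follows (a hedged example; `snr_test` and `f1_test` are assumed vectors holding the SNR label and the F1-score of each simulated test realization for one detector):

```matlab
snr_values = [3 6 10 13 16 20 23 26 30];
meanF1 = zeros(size(snr_values));
for k = 1:numel(snr_values)
    % average the performance index over all realizations sharing the same SNR
    meanF1(k) = mean(f1_test(snr_test == snr_values(k)));
end
```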
Statistical analysis
The hypothesis of normality of the distribution of the computed performance parameters was tested through the Lilliefors test (“lillietest” MATLAB® function), setting the significance level (α) at 0.05. If the normality hypothesis was satisfied, one-way analysis of variance (ANOVA) for repeated measures (α = 0.05) was performed to assess significant differences in the performance of the three tested approaches and to test the effect of SNR on the detectors' performance; otherwise, Friedman's test (α = 0.05) was applied. Then, post-hoc analysis with Tukey's adjustment for multiple comparisons was performed. The effect size of the statistically significant differences was calculated through the Hedges' g statistic [49], including the correction for small sample sizes. The statistical analysis was performed using the Statistics and Machine Learning Toolbox of MATLAB® release R2020b.
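As an illustration, the non-parametric branch of this analysis and the effect-size computation can be sketched as follows (a hedged example assuming a matrix `scores` with one column of performance values per detector; the repeated-measures ANOVA branch and Tukey's adjustment are omitted for brevity, and the Hedges' g function is a manual implementation of the small-sample-corrected formula):

```matlab
% Normality check (Lilliefors test) on each detector's distribution
isNormal = all(arrayfun(@(c) lillietest(scores(:,c)) == 0, 1:3));

if ~isNormal
    [p, ~, stats] = friedman(scores, 1, 'off');   % Friedman's test
    c = multcompare(stats, 'Display', 'off');     % post-hoc pairwise comparisons
end

% Hedges' g with small-sample correction [49] for two groups a and b
g = @(a, b) (mean(a) - mean(b)) / ...
    sqrt(((numel(a)-1)*var(a) + (numel(b)-1)*var(b)) / (numel(a)+numel(b)-2)) ...
    * (1 - 3/(4*(numel(a)+numel(b)) - 9));
```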