Activities of daily living with bionic arm improved by combination training and latching filter in prosthesis control comparison

Background Advanced prostheses can restore function and improve quality of life for individuals with amputations. Unfortunately, most commercial control strategies do not fully utilize the rich control information from residual nerves and musculature. Continuous decoders can provide more intuitive prosthesis control using multi-channel neural or electromyographic recordings. Three components influence continuous decoder performance: the data used to train the algorithm, the algorithm, and smoothing filters on the algorithm’s output. Individual groups often focus on a single decoder, so very few studies compare different decoders using otherwise similar experimental conditions. Methods We completed a two-phase, head-to-head comparison of 12 continuous decoders using activities of daily living. In phase one, we compared two training types and a smoothing filter with three algorithms (modified Kalman filter, multi-layer perceptron, and convolutional neural network) in a clothespin relocation task. We compared training types that included only individual digit and wrist movements vs. combination movements (e.g., simultaneous grasp and wrist flexion). We also compared raw vs. nonlinearly smoothed algorithm outputs. In phase two, we compared the three algorithms in fragile egg, zipping, pouring, and folding tasks using the combination training and smoothing found beneficial in phase one. In both phases, we collected objective, performance-based (e.g., success rate), and subjective, user-focused (e.g., preference) measures. Results Phase one showed that combination training improved prosthesis control accuracy and speed, and that the nonlinear smoothing improved accuracy but generally reduced speed. Phase one importantly showed simultaneous movements were used in the task, and that the modified Kalman filter and multi-layer perceptron predicted more simultaneous movements than the convolutional neural network. In phase two, user-focused metrics favored the convolutional neural network and modified Kalman filter, whereas performance-based metrics were generally similar among all algorithms. Conclusions These results confirm that state-of-the-art algorithms, whether linear or nonlinear in nature, functionally benefit from training on more complex data and from output smoothing. These studies will be used to select a decoder for a long-term take-home trial with implanted neuromyoelectric devices. Overall, clinical considerations may favor the mKF as it is similar in performance, faster to train, and computationally less expensive than neural networks.

Background Modern prosthetic hands can now recreate the complex movements of the human hand [1][2][3][4][5][6][7][8]. Unfortunately, unintuitive control makes prostheses difficult to use [9] and is a major factor in prosthesis dissatisfaction and abandonment [10]. Neural and electromyographic (EMG) signals from residual muscles provide a rich source of movement information that could be used to provide intuitive prosthesis control. With rare exceptions [11,12], commercial prosthetic control uses only two-electrode EMG setups that provide sequential control of up to two degrees-of-freedom [1,3,5,6,13,14], far inferior to the abilities of modern prosthetic hands.
Using more neuromuscular inputs can provide more intuitive prosthesis control through richer, more diverse data that can be classified into different movements based on residual neuromuscular activation patterns. Several groups have demonstrated intuitive prosthesis control with decoders that classify distinct movements [15][16][17][18][19][20][21][22]. Generally, these decoders classify EMG patterns to several pre-determined grip patterns (e.g., close hand, open hand, pinch). More advanced classification strategies incorporate proportionality into the classes (e.g., enable partial hand closure) [23] or allow simultaneously active classes (e.g., simultaneously rotate wrist and close hand) [18]. Classifiers can be limited by allowing the user only a predetermined, fixed number of movement types.
Continuous decoders provide another approach to intuitively control a prosthesis. Although nomenclature varies in the literature, here we refer to a continuous decoder as a decoder that predicts kinematic positions anywhere within some movement window (i.e., an infinite number of positions) and learns from continuous kinematic values, not classes with predefined kinematic values. We choose not to refer to continuous decoders as regression-based because not all continuous decoders use regression [24,25]. Instead of distinct classes of movements, continuous decoders allow independent and simultaneous movement of individual degreesof-freedom (e.g., finger or wrist movements), allowing the user to potentially produce any grip pattern. Continuous decoders have been used to provide prosthesis control with linear [26][27][28] and nonlinear [29][30][31] algorithms. Nonlinear algorithms generally take longer to train and are subject to overfitting; however, they may capture the nonlinear nature of multi-dimensional EMG data [32], especially for complex movement patterns. Recently, hybrid decoders have combined classification and regression control with promising results [33,34].
Kalman filters, and related variations, provide a linear approach to continuous control that has been used with cortical [35] and peripheral [26,36] recordings. Multilayer perceptrons (MLPs), a relatively simple artificial neural network, have been used for both classification [32,37,38] and continuous [29,[39][40][41] control. Convolutional neural networks (CNNs), which, in the prosthetic domain, convolve temporally over their input, have also been used for classification [42,43] and continuous [30,31] control. CNNs temporal convolution enables them to learn features from raw EMG data [31,42], whereas other strategies typically use time-domain features, such as the mean absolute value of segmented EMG.
Beyond the algorithm itself, the data used to train the algorithm affects prosthesis control. Training an algorithm in multiple arm positions can improve prosthesis control whether the position data is used to train the algorithm [32,44] or not [45], although incorporating positional data may not benefit linear algorithms [46]. For continuous controllers without a discrete set of classes, using training data with simultaneous movements of more than one degree-of-freedom may expose nonlinearities in EMG data and preferentially benefit nonlinear algorithms with greater learning capacities.
Modifying a continuous decoder's output can improve prosthesis control by improving stability. Smoothing filters, such as a traditional low-pass filter can smooth the output and reduce unwanted jitter [47]. One study incorporated a camera system to selectively low-pass filter the output when a grasp was detected [48]. Low-pass filters inevitably introduce delay into the control. Nonlinear smoothing filters can reduce small-amplitude jitter without slowing larger movements [49,50].
Prosthesis control improvements are generally demonstrated sequentially in three domains: offline, online, and real-world. Offline comparisons measure the ability of a decoder to map EMG inputs to kinematic outputs, without the user in the loop. Several offline comparisons have helped improve the state of the art for both continuous [41] and classification [17,18,32,43,[51][52][53] control. Although they are a useful first step in improving control, offline improvements do not necessarily imply online improvements [54], largely because they do not incorporate the continuous interaction of the user with Keyword: Prosthetic control, Activities of daily living, Neuroprosthetics, Neural prostheses, Bionic arm, DEKA LUKE arm, Electromyography (EMG) the controller. Online improvements incorporate user feedback and have been demonstrated through Fitts' law or virtual target tasks. Several online comparisons have demonstrated differences among control strategies [28,30,31,37,42,[54][55][56][57][58]. The relationship of online performance to real-world performance is debated [59][60][61]. Comparing decoders through real-world activities of daily living provides the most realistic approximation of how a decoder would function in everyday life. Realworld comparisons have demonstrated the benefits of continuous [62,63], classifier [20,50,[64][65][66][67], and hybrid [34] control strategies over commercial control; however, only a couple of comparisons have been made among research-grade control strategies [50,67], both of which were classifiers.
In the present work, we conduct a two-phase realworld comparison of 12 continuous decoders that span the three major components affecting control (training data, algorithms, and smoothing). In the first phase, we compare training an algorithm with individual finger or wrist movements vs. combination movements (e.g., simultaneous flexion of all fingers to make a functional grasp) and compare the effect of using a nonlinear smoothing filter [49]. We study these conditions in a two-degree-of-freedom decode using advanced, previously published, continuous control decoders: a modified Kalman filter (mKF) [36], an MLP [29], and a CNN [30]. All variations together yield 12 different decoders for comparison. In the second phase, we performed a more extensive evaluation of five-degree-of-freedom decodes with the mKF, MLP, and CNN using the combination training and smoothing filter found most beneficial in phase one. We focus on both performance-based objective and user-centric subjective measures. To our knowledge, this study provides the first comparison of the same individuals using multiple continuous decoders and a physical prosthesis to complete real-world tasks. A large part of our motivation for this study is to identify the "best" decoder among the advanced decoders we have previously published, in part because the results of this study will guide our clinical translation of advanced prosthesis control in a long-term take-home clinical trial.

Signal acquisition and prosthesis setup
Signal acquisition has been described previously [36]. In brief, surface EMG (sEMG) was collected from a sEMG sleeve [68] with the 512-channel Grapevine System (Ripple Neuro LLC, Salt Lake City, UT). Thirty-two singleended channels were acquired at 1 kHz and filtered with a 6th-order high pass Butterworth filter (15 Hz), 2ndorder low-pass Butterworth filter (375 Hz), and 60, 120, and 180 Hz notch filters. After connecting the sEMG sleeve to the acquisition device, channels were manually inspected and removed if shorted channels were detected (generally less than two channels). The differential pairs of all monopolar channels were calculated, and features (single-ended and differential) were created at 30 Hz using the mean absolute value of a 300-ms buffer (i.e., 528 features from an overlapping 300-ms boxcar filter). At 30 Hz, the buffer is updated every 33 ms, resulting in a 266-ms overlap between features. This update rate and buffer length has been used by our group extensively with various decoders [30,36].
Non-amputee participants donned the DEKA LUKE Arm [1] with a bypass socket enabling non-amputee use of a prosthetic arm, which has been shown to be representative of use by individuals with amputation [69]. The Fig. 1 Prothesis setup and phase one task a The bypass socket and sEMG sleeve enable non-amputees to complete real-world tasks with prosthetic systems. b The clothespin relocation task tests prosthesis control with simultaneously active grasping and wrist movements, typical of many activities of daily living Paskett et al. J NeuroEngineering Rehabil (2021) 18:45 prosthesis, bypass socket, and sEMG sleeve are shown in Fig. 1a. The bypass socket, which has a freely-rotating wrist attachment, had rubber bands providing wrist rotation resistance (0.34 Nm to 0.85 Nm at rest and maximal rotation, respectively). Baseline EMG activity was subtracted from each feature before collecting training data by subtracting the average EMG activity during a 10-s period where the user supported the weight of the bypass socket and DEKA LUKE Arm in a neutral position approximately waist height. Training data were recorded as participants mimicked the pre-programmed movements of the DEKA LUKE Arm. The kinematics from the pre-programmed movements were used to fit the algorithms described in the next section.

Decode algorithms
The mKF was implemented as has been demonstrated previously [36]. A Gram-Schmidt orthogonalization algorithm [70] was used to select 48 features as inputs for the mKF. Using 48 features was empirically found to be an effective number in [49]. Prior to selecting features, we temporally aligned the kinematics and features by maximizing the average correlation between every feature and kinematic. The 48 EMG features and kinematics were used to fit the parameters of a linear Kalman filter [25]. The post-hoc thresholds which were used to modify the Kalman filter online [36] were optimized offline [36]. The feature selection, Kalman filter fitting, and threshold optimization were completed using all the training data (i.e., no training and validation split). The mKF was trained, and the threshold optimized using MATLAB 2018b on a Windows 10 computer with 64 GB RAM and an Intel Xeon E5-2620v3. The same computer was used for runtime predictions.
The nonlinear MLP was implemented as has been demonstrated previously [29]. The MLP consisted of two hidden layers, each with 128 nodes, and a hyperbolic tangent (tanh) activation layer. The training and testing data were split 80% and 20%, respectively. As with the mKF, the MLP used 48 input features as chosen by the same stepwise Gram-Schmidt orthogonalization algorithm to provide cleaner feature data. Prior to selecting features, we temporally aligned the kinematics and features by maximizing the average correlation between every feature and kinematic. The MLP trained for three dataset aggregation iterations, each with ten epochs (found sufficient in [29]). The MLP was trained in Python 3.6 using TensorFlow 1.13 on a Windows 10 computer with 64 GB RAM and an Intel Xeon E5 1650v3 (not trained on a GPU). The same computer was used for runtime predictions.
The nonlinear CNN was implemented as has been demonstrated previously [30]. The CNN, comprising eight total layers, used all 528 features to fit the kinematic values. 75% of the data was used to train the algorithm, and the remaining 25% was used as a validation set. The network trained until the loss on the validation set was equal to or larger than the previously smallest loss for five iterations. The CNN was trained using MATLAB 2018b on an NVIDIA Quadro M4000 8 GB graphics card. The system ran on Windows 10 and had 128 GB RAM and an Intel Xeon E5-2620v4 processor. The trained network was transferred to a Windows 10 computer with 64 GB RAM and an Intel Xeon E5-2620v3 processor for runtime predictions.
After training, users were given position control of the prosthesis. In position control, the prosthesis returns to its rest position if the user is not actively making a movement. In velocity control (the counterpart of position control), the prosthesis does not move when the user rests. For tasks requiring multiple active movements, position control forces the user to complete both movements simultaneously, whereas velocity control allows the user to complete these movements sequentially.

Smoothing filter
A smoothing filter was implemented as has been demonstrated previously [49]. The "latching filter" we used is a computationally inexpensive, recursive, nonlinear filter that smooths small-amplitude jitter but allows quick changes to its output. The latching filter nonlinearly adjusts its level of smoothing based on the difference between the previous, smoothed decoder estimate and the current decoder estimate to determine how much to move in the direction of the current decoder estimate (i.e., how much to smooth the transition from the previous estimate to the new estimate). For small estimate changes, the output is heavily smoothed, resulting in stability for low-amplitude movements. For large estimate changes, the output is nearly unsmoothed, resulting in quick movements. The latching filter makes the reasonable assumption that jitter amplitude is small relative to the movement range and that jitter occurs more frequently than do intended small movements.

Study overview
We conducted this study in two phases (Fig. 2a). In the first phase, we compared training paradigms and the use of a smoothing filter for all three algorithms, totaling 12 decoders: three algorithms, two training paradigms, and two output types. Due to the number of decoders and the consequential experiment duration, we selected a simple task, the clothespin relocation task, with a low, two degree-of-freedom control strategy (grasp and wrist rotation), which simplified the training and allowed the experiments to be completed in 2-3 h.
In phase two, we increased the decoder output to five degrees-of-freedom to evaluate the three algorithms' control in a high-degree-of-freedom setting representative of modern myoelectric prostheses. The degrees-offreedom used were: (1) thumb flexion, (2) index finger flexion, (3) middle, ring, and little finger flexion (which are mechanically coupled on the DEKA LUKE Arm), (4) wrist flexion, and (5) wrist rotation. We used the results of the first phase to inform the training paradigm and smoothing filter use. As we were comparing only three algorithms in the second phase, we were able to perform four tasks (fragile egg, zipper, pouring, and folding) in a 2-3-h experimental session.

Phase one: training type and output smoothing comparison Training paradigms
Ten participants (24 ± 4 years [mean ± SD]; seven righthanded, three left-handed; nine male, one female) with no prior prosthesis experience mimicked the DEKA LUKE Arm movements in three training periods to provide training data for the control algorithms. The two-degree-of-freedom decode training consisted of opening and closing the hand as well as rotation of the wrist. Prior to each training period, study administrators coached the participants through one trial of each movement, to acquaint the participants with mimicking the   Fig. 2 a Overview of the two-phase functional comparison of 12 prosthesis control strategies. In phase one, we compared collecting training data with individual movements vs. combination movements and compared using raw algorithm output vs. nonlinear output smoothing with a latching filter for an mKF, MLP, and CNN in a clothespin relocation task. In phase two, we compared the mKF, MLP, and CNN with fragile egg, zipping, pouring, and folding tasks. b Example combination movement, where two degrees-of-freedom are simultaneously active during the middle portion of the trial movements. In the first training period, users completed 12 trials of individual movements: close hand, open hand, pronate, supinate. Each individual movement consisted of a 0.7-s rise time, 3-s hold, and 0.7-s return to rest position (for a total of 4.4 s per movement). The second training period repeated the first period; however, only four trials of each movement were completed. The first and second training periods took 5 and 1.5 min, respectively.
The third period introduced combination movements, where both degrees-of-freedom were simultaneously active for the middle portion of the trial (see Fig. 2b  To compare individual and combination training paradigms, the training sets were merged into two sets of equal duration before being used to train the decoder (so that any improved performance would not be due to the decoder training on more data). The first and second training periods were combined to produce an individual-movement-only set. The second and third training periods were combined to produce a set with individual and combination movements. After merging the datasets, the mKF, CNN, and MLP were trained on the kinematic and EMG data for continuous control.

Modified clothespin relocation task
In the modified clothespin relocation task (Fig. 1b), participants transferred three large clothespins from a horizontal bar to a vertical bar on the side of the dominant hand, referred to as the "inside-out" task [71]. The task was chosen for several reasons: its simplicity allows the user to quickly learn and master the task, mitigating effects due to learning; its short duration allows several trials to be completed rapidly, which was necessary for comparing 12 decoders; and, it necessarily requires simultaneously active degrees-of-freedom, typical of many daily activities. We have used this task with the DEKA LUKE Arm previously [72].
Participants started a timer with the prosthetic hand, transferred a single clothespin, and stopped the timer with the prosthetic hand. A trial was considered failed when a clothespin was dropped. Participants were instructed that the success rate (proportion of successfully transferred clothespins) and the transfer time would be recorded. There was no time limit for the task. Participants completed nine trials (three sets of three) with each decoder. The order of the control paradigm was randomized among participants to prevent an order effect. After completing nine trials with a particular decoder, participants completed the NASA Task Load Index (TLX) subjective workload survey [73].
Participants completed the task for 12 total decoders, including each possible combination of training paradigm (individual and combination), algorithm (mKF, MLP, and CNN), and output type (raw and smoothed).

Analysis
We compared the effect of training paradigm and smoothing filter for each algorithm with α = 0.05. The data were significantly non-normal for several conditions (Shapiro-Wilk test [74]), so nonparametric statistics were used. Median completion times for successful trials were calculated for each condition for each participant (N = 10 participants). The number of drops was calculated for each condition for each participant. For the training paradigm comparison, the data were aggregated to compare individual and combination trainings with Wilcoxon's signed-rank test [75] for each algorithm; each group contained both smoothed and raw output types (N = 20; two medians for each participant). For the output type comparison, the data were aggregated to compare raw and smoothed outputs with Wilcoxon's signed-rank test for each algorithm; each group contained both individual and combination trainings (N = 20; two medians for each participant). Because the hypotheses were pre-planned, no correction for multiple comparisons was applied [76].
We compared the extent that the algorithms predicted simultaneously active degrees-of-freedom. Because this was a post-hoc analysis, we did not have digital markers indicating when a clothespin trial was started. Instead, we examined the proportion of time an algorithm predicted hand movement and wrist rotation simultaneously, divided by the total time the hand or wrist was active. We determined the threshold for "active" movements by the range of movements needed to complete the clothespin task. The wrist needed to rotate a minimum of 35% of its movement window to place the clothespin, and the hand needed to close a minimum of 20% of its movement window to hold the clothespin. We counted only simultaneous movements over 1 s in duration to exclude potentially sporadic predictions that were likely not part of the task completion. We used the median proportion of time each algorithm predicted simultaneous movements for the algorithm comparisons. The medians across participants were distributed normally (Shapiro-Wilk test [74]), so we conducted a one-way ANOVA [77] to determine if there were significant differences among the algorithms. If the ANOVA revealed differences, individual comparisons were made using Tukey's Honestly Significant Difference procedure [78].

Phase two: algorithm comparison Training paradigm
In the second phase, we extended the degrees-of-freedom from two to five to assess the algorithms' performance in more complex, biologically realistic conditions. Users controlled the following degrees-of-freedom: flexion/extension of the thumb, flexion/extension of the index finger, flexion/extension of the middle, ring, and little fingers (which are mechanically coupled on the DEKA LUKE Arm), flexion/extension of the wrist, and pronation/supination of the wrist. Study administrators coached the participants through a single trial of each movement type before each training period. The training for phase two consisted of four distinct training periods, each 5 min long, for a total of 20 min.
Nine participants (30 ± 14 years [mean ± SD]; seven right-handed, two left-handed; seven male, two female) with some prior prosthesis experience (2-3 h) mimicked the prosthesis through movements of individual degreesof-freedom. Six participants from phase one had completed the phase one study; however, not all the previous participants were able return for the phase two study. The remaining three had completed other prosthesis studies with our group. For the first training period, movements consisted of a 0.7-s rise time, 5.2-s hold time, 0.7-s fall time (for a total of 6.6 s per movement). Participants mimicked four trials on each degree-of-freedom. For the second through fourth training periods, combination movements were introduced.
The second period consisted of grasp movements (simultaneous flexion of the thumb, index, and middle, ring, and little degrees-of-freedom) combined with wrist movements (flexion, extension, pronation, and supination). Participants completed the first grasp movement alone, with a 0.7-s rise time, 5.2-s hold time, and 0.7-s fall time (6.6 s total). Participants then completed grasp and wrist combination movements, where both the grasp and wrist movement were simultaneously active in the middle portion of a trial (Fig. 2b). In these combination trials, the grasp movement rose to its intended position in 0.7 s, held for 5.2 s, and returned to its original position in 0.7 s. The wrist movement started 0.4 s after the first movement reached its intended position, moved to its intended position with a 0.7-s rise time, held for 3.0 s, and returned to its original position in 0.7 s. These combination movements were completed for each wrist movement (wrist flexion, extension, pronation, and supination). Then, they were completed in the reverse order, where the wrist movement occurred first, and the grasp movement occurred part-way through the wrist movement. We chose to complete the movements in both orders due to the CNN convolving over time. Participants completed four trials of each sequence, resulting in 36 total trials.
Periods three and four repeated the second period; however, the grasping movement was replaced with opening the hand (extending the thumb, index, and middle, ring, and little digits) and a pinch movement (flexing thumb and index digits), respectively. After merging the datasets from the four training periods, the mKF, CNN, and MLP were trained on the kinematic and EMG data for continuous control.

Activities-of-daily-living
In the second phase, participants completed four tasks representative of activities of daily living in the following order: moving a fragile egg, using a zipper, pouring, and folding a towel (Fig. 3). A time limit was specified for each task. The order of the three decode algorithms (mKF, CNN, and MLP) was randomized for each participant to avoid order effects, with the same random order being used within each of the four tasks. After completing a task with each algorithm, participants ranked the algorithms in order of preference. Examples of the tasks are included in the supplementary material (Additional file 1: Video S1). Each task is now discussed in greater detail.

Fragile egg task
In the fragile egg task, participants moved a mechanical "fragile egg" [79,80] horizontally 15 cm from one side of a 6.25-cm vertical barrier to the other within a 45-s time limit, beginning on the side of the dominant hand. This task evaluated the dexterity provided by each decode algorithm in that if too much force was applied, the mechanical egg would "break" and emit an audible click. The participants were instructed to use a two-finger pinch to pick up the mechanical egg. The breaking force for these experiments was set at about 20 N, and the device mass was about 615 g (0.03 N/g).
For each trial, participants started a timer with the prosthetic hand, transferred the mechanical egg, and stopped the timer with the prosthetic hand. A trial was considered failed if the mechanical egg was broken or dropped. Participants were instructed that both the success rate (the proportion of trials in which a transfer was successful) and the transfer time would be recorded. When introduced to the task, participants were given two minutes of practice with each decode algorithm to decrease the learning effects in the recorded trials. Participants attempted 15 trials with each decode algorithm. Immediately before the recorded trials commenced, participants were given one final practice trial with the given decode algorithm. A verbal warning alerted the participants when there were 15 s remaining. After attempting the 15 recorded trials with a particular algorithm, participants completed a NASA-TLX survey.

Zipper task
In the zipper task, participants closed a horizontally mounted zipper within a 30-s time limit, beginning on the side of the dominant hand. Timed-out trials were reported as 30 s. The zipper task tests precision grasping with combination wrist movements. For a trial to be successful, the zipper must have closed by at least 6.5 cm, as indicated with a mark. We modified the base of the zipper so that the handle would not lie less than 45 degrees from being flat because the shape of the prosthetic fingers prevented gripping the zipper when lying flat. For each trial, participants started a timer with the prosthetic hand, closed the zipper, and stopped the timer with the prosthetic hand. Participants were instructed that both the number of releases (times the zipper slipped from their grasp) as well as the time would be recorded. When introduced to the task, participants were given three practice trials with each decode algorithm in order to decrease the learning effects in the recorded trials. Participants attempted five trials with each decode algorithm. Immediately before the recorded trials commenced, participants were given one final practice trial with the given decode algorithm. After attempting the five recorded trials with a particular algorithm, participants completed a NASA-TLX survey.

Pouring task
In the pouring task, participants poured rice from an aluminum can into a cup within a 1-min time limit. The ricefilled can was placed on the side of the dominant hand, and the empty cup was placed near the midline. This task assessed the ability of the decode algorithms to maintain a grasp during wrist rotation. For these experiments, the aluminum can was filled with 200 g of rice.
For each trial, participants started a timer with the prosthetic hand, poured the rice from the can (7.6 cm diameter, 11.3 cm tall) to the cup (8.6 cm diameter, 11.7 cm tall), and stopped the timer with the prosthetic hand. Participants were instructed that the amount of rice (in grams) successfully transferred to the cup as well as the transfer time would be recorded. The participants Fig. 3 Phase two: Activities of daily living represent realistic prosthesis use. a The fragile egg task tests precision control and tests the proportionality of a decode algorithm. b The zipper task requires stable pinching and simultaneous wrist rotation. c The pouring task requires stable grasping with smooth wrist rotation. d The folding task requires two-handed object manipulation and stable grasps during positional changes were not allowed to use their native hand to stabilize the cup, and if the can was dropped the trial was considered unsuccessful. When introduced to the task, participants were given three practice trials with each decode algorithm in order to decrease the learning effects in the recorded trials. Participants attempted five trials with each decode algorithm. Immediately before the recorded trials commenced, participants were given one final practice trial with the given decode algorithm. After attempting the five recorded trials with a particular algorithm, participants completed a NASA-TLX survey.

Folding task
In the folding task, participants folded a small hand towel (39 cm by 58 cm) twice using a pinch grasp within a 1-min time limit. The first fold was done from short-end to short-end. The participant could then use their nondominant native hand to rotate the towel 90 degrees for the second fold, which was also from short-end to shortend. For both folds, participants were instructed not to use the edge of the table to facilitate grasping the towel. Additionally, they were not to use their native hand to guide the end of the towel into the prosthesis. Ideally, participants simultaneously grasped the short edge of the towel with their native hand and the prosthesis.
For each trial, participants started a timer with the prosthetic hand, folded the towel twice, and stopped the timer with the prosthetic hand. Participants were instructed that the number of releases (times the towel slipped from their grasp) and the time would be recorded. When introduced to the task, participants were given three practice trials with each decode algorithm to decrease the learning effects in the recorded trials. Participants attempted five trials with each decode algorithm. Immediately before the recorded trials commenced, participants were given one final practice trial with the given decode algorithm. After attempting the five recorded trials with a particular algorithm, participants completed a NASA-TLX survey.

Analysis
We compared algorithms for each metric within each task with α = 0.05. The data were significantly non-normal for several conditions (Shapiro-Wilk test [74]), so nonparametric statistics were used. Median completion times for successful trials were calculated for each condition for each participant (N = 9 participants). For the pouring task, the task was considered successful if the total amount transferred was greater than 199.9 g (no spill). For each metric, a Kruskal-Wallis test [81] was used to determine if there were differences among the decode algorithms. If the Kruskal-Wallis revealed differences, paired comparisons were made using Wilcoxon's signed-rank test [75]. Because the hypotheses were preplanned, no correction for multiple comparisons was applied [76]. Post-hoc, we decided to compare aggregate user preferences. We grouped the user preferences for each task and performed a Kruskal-Wallis test across algorithms. As this was an unplanned comparison, we corrected the pairwise comparisons with the Dunn-Sidak correction for multiple comparisons [82].

Phase one: comparing latching filter and training type Latching filter comparison
The latching filter generally reduced the number of dropped clothespins but sometimes increased the clothespin transfer time (Fig. 4a-c). For the MLP, dropped clothespins were significantly reduced from 3 to 1 (p < 0.01; Fig. 4a). For the CNN, dropped clothespins were significantly reduced; however, the median values were both 1 (p < 0.05; Fig. 4a). The latching filter significantly decreased transfer times from 11.0 to 7.4 s for the MLP (p < 0.01), but significantly increased transfer times for the CNN and mKF from 6.9 to 7.2 s (p < 0.05) and 7.8 to 10.1 s (p < 0.001), respectively (Fig. 4b). The latching filter significantly reduced subjective workload for the MLP from 54 to 41 (p < 0.05) but had no significant effect on subjective workload for the CNN and mKF (Fig. 4c).

Training type comparison
Training with combination movements generally improved the outcomes for the clothespin relocation task (Fig. 4d-f ). Training with combination movements significantly reduced dropped clothespins for the MLP from 2.5 to 1 (p < 0.05; Fig. 4d). For the CNN, dropped clothespins were significantly reduced; however, the median values were both 1 (p < 0.05; Fig. 4d). Training with combinations significantly reduced the median time needed to transfer a clothespin for CNN and MLP from 8.3 to 6.7 s (p < 0.05) and 11.8 to 7.9 s (p < 0.001), respectively (Fig. 4e). Combination training significantly reduced the subjective workload for the CNN, from 44 to 32 (p < 0.01; Fig. 4f ). Training type had no significant effect on the subjective workload for the mKF and MLP; however, both were reduced with combination training (Fig. 4f ).

Phase two: comparing algorithms Fragile egg task
Preferences for the fragile egg task were significantly different for the different decode algorithms (p < 0.01; Fig. 5e). The CNN was significantly preferred over the MLP (median ranks 1 and 3, respectively; p < 0.05). Similarly, the mKF strongly trended toward being preferred over the MLP (median ranks 2 and 3, respectively; p = 0.05). Subjective workload also significantly differed among algorithms (p < 0.05; Fig. 5b). The CNN had significantly lower workload than did the MLP (51, 74, respectively; p < 0.01). The mKF had significantly lower workload than did the MLP (51, 74, respectively; p < 0.05). There were no significant differences in success rate across the algorithms; however, the mKF and CNN were generally more successful than the MLP (Fig. 5c).
There were no significant differences in completion time across algorithms, although the mKF generally performed faster than the CNN and MLP (Fig. 5a).

Zipper task
Preferences for the zipper task differed significantly among algorithms (p < 0.05; Fig. 5e). The CNN was significantly preferred over the MLP (median ranks 1 and 3, respectively; p < 0.05). There were no significant differences in completion time (Fig. 5a), but the median completion time for the mKF was the fastest. Inadvertent releases and subjective workload for the decodes did not significantly differ among algorithms (Fig. 5d, b, respectively).

Pouring task
The pouring task was generally perceived as less difficult than the fragile egg and zipper tasks (Fig. 5b). There were no statistical differences for any of the recorded metrics (time, success rate, preference, and subjective workload). The MLP was preferred by a few participants, in contrast to the fragile egg and zipper tasks where it was never preferred (Fig. 5e). Participants had high success rates for the pouring task across algorithms and rarely spilled (Fig. 5c); we report only on whether the task was completed without spills or drops. Completion times were similar among algorithms (Fig. 5a). Phase two: subjective, user-focused metrics reveal differences that objective performance-based measures may not. a Completion times did not significantly differ for any of the tasks. The mKF was generally faster for tasks perceived as more difficult, and slower for tasks perceived as less difficult. b The mKF and CNN were significantly easier to use than the MLP for the fragile egg task, as evidenced by a lower NASA Task Load Index (TLX) score. c mKF and CNN outperform MLP in fragile egg task, although the difference was insignificant. All algorithms had high success in the pouring task. d All algorithms performed well and did not significantly differ in the zipping and folding tasks. e The CNN was significantly preferred over the MLP, aggregated across tasks. The mKF trended towards significance over the MLP across all tasks. The CNN was significantly preferred over the MLP for the fragile egg and zipper tasks. Boxplots show median (red line), inter-quartile range (colored boxes), and most extreme, non-outlier values (1.5 × inter-quartile range; black whiskers) Paskett et al. J NeuroEngineering Rehabil (2021) 18:45

Folding task
The folding task was generally perceived as less difficult than the fragile egg and zipper tasks (Fig. 5b) and did not result in any significant differences for the recorded metrics. More participants seem to prefer the CNN for this task (Fig. 5e) even though the trend suggests they inadvertently released the towel more than with the mKF or MLP. The CNN and MLP may also perform faster than the mKF (Fig. 5a).

Aggregate preferences across tasks
Aggregated across tasks, user preferences were significantly different (p < 0.05; Fig. 5e). Users preferred the CNN over the MLP (median ranks 1 and 3, respectively; p < 0.01), and the mKF trended toward being preferred over the MLP when aggregated across tasks (median ranks 2 and 3, respectively; p = 0.07).

Algorithm training and prediction times
On average, the mKF trained in 0.7 s, the MLP trained in 47 s, and the CNN trained in 135 s in phase one. The threshold optimization averaged 155 s in phase one. In phase two, the mKF trained in 2.7 s, the MLP trained in 193 s, and the CNN trained in 262 s. The threshold optimization took 480 s in phase two. For testing, the mKF took 0.6 ms, the MLP took 1.4 ms, and the CNN took 3.0 ms. The latching filter took 0.027 ms to modify the algorithm output. Detailed training and testing results are included in Table 1.

Discussion
In this two-phase study, we demonstrate the importance of using diverse training sets that include combinatorial movements; the value of smoothing the algorithm output; and the efficacy of linear and nonlinear continuous decode algorithms for restoring functional ability to prosthesis users. To our knowledge, no study has attempted to compare real-world functional improvements of continuous prosthesis control using multiple algorithms, training paradigms, and output smoothing from the same participants.
In phase one, we found that training on combination movements improves functional performance and that a latching filter [49] can reduce jitter albeit with some cost in speed. In phase two, we found that a linearly-based decode algorithm (the mKF) can perform similarly to nonlinear deep-learning techniques (the CNN and MLP). From a clinical perspective, the mKF may be preferable sometimes due to reduced algorithm training time and computational resources required for real-time control. Phase two demonstrates how user-centric outcomes (user preferences and subjective workload) elucidated differences among algorithms when performance metrics alone could not.
Adding combination movements to the training data benefited the participants two-fold: they completed tasks faster and with greater control. The improved speed and control seem to preferentially benefit nonlinear algorithms (MLP, CNN) by producing more consistent control among participants, whereas the mKF was relatively consistent in both conditions. In terms of speed, the MLP benefited the most from the combination movement training; however, it was also the slowest in the single-movement training paradigm and had the most room for improvement. In terms of subjective workload, results from combination training suggest that nonlinear approaches benefited most from combination training; however, the difference was significant only for the CNN. Perhaps combination trainings introduce nonlinearities that preferentially benefit nonlinear neural networks.
The latching filter generally enhanced control at the cost of speed. We were surprised how little the latching filter benefited the mKF relative to the CNN and MLP. The latching filter uses a hyperparameter that adjusts the level of "latching" (and inherent speed) of kinematic changes. The latching filter uses the hyperparameter to preferentially dampen small changes in the decoder estimates (which are assumed to be inadvertent movements) while allowing gross movements to occur with little or no dampening [49]. Our piloting and informal use of these algorithms in lab suggest nonlinear approaches generally predict kinematic positions similar to the data used for training and thus move to heavily trained positions quickly and unimpeded by the latching filter (e.g., rest, full flexion); in contrast, the Kalman filter predicts kinematics better dispersed throughout the kinematic range, which may be due to a relatively low Kalman gain [30] that places more value on past estimates than current measurements and decreases responsiveness. Thus, gross movements could be slowed by a low Kalman gain and the latching filter as they are initiated. We expect that the hyperparameter setting favored the nonlinear algorithms, as raw decoder output from the nonlinear algorithms likely moved between extreme positions (e.g., open hand to closed hand) faster and were not slowed as much by the latching filter when gross movement was intended. We expect that optimizing the hyperparameter could improve speed and control for the mKF especially, but also the CNN and MLP; however, such an optimization is beyond the scope of this study. The lack of improvement in dropped clothespins with the mKF might also have occurred because its control was different than the other two decoders, and so the participants did not become as accustomed to its responsiveness. In phase two, we examined the decode algorithm's performance using combination training and the latching filter in a high-degree-of-freedom setting (five; increased from two in phase one). With the reduced experimental condition count in this phase, we were able to thoroughly investigate the algorithms' performance across a broader range of functional tasks. This phase also included a larger, more diverse training dataset, which we anticipated might preferentially benefit nonlinear algorithms.
We chose tasks representative of daily life that probe qualitatively different aspects of prosthesis control. The fragile egg task, generally judged most difficult by participants, requires precise movements and tests the decoder's proportionality. The zipper task, also judged difficult by participants, requires consistent precision pinching with simultaneous wrist rotation. The fragile egg tasks and zipper tasks, which test precise movements, could be more difficult if the task were occluded by the bypass socket or prosthesis itself. However, a similar occlusion may occur in the real world. We did not have any participants comment on the difficulty of the tasks as being due to occlusion. The pouring task necessitates smooth wrist rotation during stable grasping. The folding task requires two-handed object manipulation and grasp stability with positional changes which often degrade control performance [60]. We found the variety of the tasks to be useful, but the tasks were easier for the participants than pilot experiments suggested and, in some cases, resulted in floor and ceiling performance effects. The tasks would need to be more difficult to better elucidate performance differences among the algorithms in the future.
In part because of these floor and ceiling effects, user preferences provided strong evidence for identifying the "best" algorithm. Whereas many performance metrics varied only slightly, user preferences showed greater differences among algorithms. Aggregated across tasks, preferences indicated the CNN was preferred over the MLP, and the mKF also trended toward being preferred over the MLP. As testing multiple algorithms in the same experimental sessions allowed users to rank their preferred algorithms, we see incorporating user preferences as a strength of the study. Translating research practices to the clinic requires user-centric, data-driven approaches. Although performance differences may be small among algorithms, our findings show that differences in user experience exist across the participants, providing additional evidence toward clinical translation. Subjective workload also provided useful information on user experience, showing that the CNN and mKF were easier to use during the fragile egg task. From the user's perspective, subjective workload may be the difference between adopting or abandoning a prosthesis; while two prosthetic systems may be capable of similar performance, the most intuitive, easy-to-use prosthesis will certainly be preferred.
We found it surprising that the mKF performed similarly to the nonlinear CNN whereas the MLP underperformed. Our informal use of the mKF suggests that the mKF performed well because it has a smoother, more predictable response to input changes, which can improve functional control [54]. We anticipate the CNN performed well because it used all 528 of the EMG features (versus 48 from the channel selection algorithm for mKF and MLP) and convolves across time, thus training on more complex data and potentially learning from temporal characteristics which could be advantageous over the MLP. In [29], the MLP uses the first 16 principal components as input features, different from our use of 48 selected features, which could be another factor in its poorer performance. We informally observed that participants struggled to proportionally control the prosthesis for the MLP, and it would often close quickly and break the fragile egg, which may indicate a lack of generalization from the training data.
It is important to note that the purpose of our study was not to provide an exhaustive comparison of the mKF, CNN, and the MLP per se; rather, our goal was to determine whether combination training sets and the use of a latching filter would improve these previously published decoders, and which combinations to recommend for clinical translation. Although one algorithm may perform better under different circumstances (e.g., the number or type of features provided, training kinematics associated with mirror movements, or the amount of data in testing and validation sets) exploring all possible variations would have required even more participant time, which was already limited. Even with the reported experimental variations, most participants were starting to fatigue and were eager to remove the heavy prosthesis by the end of the experimental session-hence there was a practical limit to the number of decoders and variations in our comparison (e.g., only training type and latching filter using previously published algorithms).
The training data used are critical for the performance of any decoder. In our study, we used mimicry training, where the user follows the prosthesis through preprogrammed movements, and the algorithm fits the feature inputs to the kinematics of the prosthesis. Inherent in this training paradigm is the assumption that the user perfectly mimics the kinematics of the hand, which inevitably is an imperfect assumption. Imperfect kinematic and feature data could have been preferentially detrimental to certain decoders (e.g., the MLP may perform better with cleaner labels). Another method for creating kinematics is to use mirror training, where the kinematics of the opposite hand are recorded while the user completes bilaterally mirrored movements. This approach has been shown to provide reliable training data for prosthesis control [83,84] and could be further studied in realworld environments. Even though we did not use mirror movements, we aligned our kinematics and features during channel selection to improve performance. Additionally, some of the training data used single finger motions not exclusively in use for the tasks. It is possible that the training paradigms could have been improved for the specific tasks completed in this study; however, in realworld settings the tasks are not known a priori and therefore a generalizable training dataset is preferable. Recent studies have reported on some of the most used grasp types [85,86]. Based on the findings of these studies, a specific, yet generalizable, training set could be developed for real-world usage. Another method of improving features could include optimizing electrode placement. Although beyond the scope of this study, one could use the channel selection algorithm [70] to find geometric or anatomical patterns in the location of the most useful electrodes. These patterns could guide electrode placement for advanced prostheses used by persons with transradial amputation; however, the usefulness of the patterns would vary depending on the level of the amputation.
Our study did not directly measure or restrict compensatory movements. Future work would benefit from quantifying or restricting compensatory movements. For some tasks (e.g., pouring task), compensatory movements could make stable grip control more important to the user than the ability to simultaneously activate multiple degrees-of-freedom. One benefit of the bypass socket [69] used in this study is that the rotating wrist attachment generally keeps the prosthesis hanging below the user's wrist, regardless of shoulder, elbow, or wrist orientation, thus reducing the ability of the user to make compensatory movements and hence indirectly requiring simultaneous movements. The clothespin inside-out task conducted in phase one is particularly difficult to complete with compensatory movements. As a post-hoc analysis of the clothespin task, we found that the mKF and MLP had more simultaneous movements than the CNN, suggesting that they may have required less compensation. Although our study did not compare the approaches above with conventional direct control, one previous study [87] found that arm movements with an advanced decoder [34] were more similar to the endogenous hand when compared with conventional direct control in the clothespin relocation task. This finding suggests more advanced decoders can provide the user with more natural movements and reduce compensation.
Reliable, simultaneous movements in position mode could provide the user with a prosthesis experience like that of the endogenous hand, in which simultaneous movements are natural. In velocity mode, a user may choose to complete multi-degree-of-freedom tasks sequentially, moving a single degree-of-freedom at a time. Sequential movements are far less natural and much slower than simultaneous movements if the simultaneous movements can be completed reliably. Our results suggest simultaneous movements were used for task completion; however, simultaneous movements alone do not mean the movements were intended. The mKF and the MLP had significantly more simultaneous movements than the CNN; however, reported user experiences and our informal observation suggest that the MLP may not have made the simultaneous movements reliably. The CNN, with less simultaneous movements, may have provided reliable grip control which benefited the task and was reflected in user experiences, although successful task performance would have required greater compensatory movements from the user.
Although common performance metrics were used, direct comparisons with other studies are difficult due to different experimental conditions (e.g., degrees-offreedom, physical prosthesis) and methodologies. In [62], where a two-degree-of-freedom, continuous, linear regression-based control strategy was compared with a conventional control strategy, the participant required about 10 s per clothespin for the horizontal to vertical transfer with the regression-based approach, whereas the conventional strategy required about 14 s. In [50], the two-degree-of-freedom "proportional simultaneous" control strategy (most similar to what we have termed continuous control) required about 5 s per clothespin transfer (participants completed both the horizontal to vertical and vertical to horizontal transfers), whereas the conventional strategy required about 10 s per clothespin. Neither study [50,62] controlled for compensatory movements, which may lead to faster completion times, as compared with our use of the bypass socket [69] which reduces compensatory movements. Our median transfer times per clothespin ranged from 7 s to 11 s to depending on the decoder, using two-degrees-of-freedom. Classifier [20,66,67] and hybrid [34] approaches on the clothespin relocation task have ranged from about 5 s to 30 s per clothespin. In [66], three participants averaged 24 s per clothespin prior to a take-home trial with their prostheses, most like our study with naïve prosthesis users; after the trial, the participants averaged 8 s. In [88], a relatively large cohort (n = 17) of persons with belowelbow amputation completed the clothespin relocation task using conventional direct control. Users took about 6 s per clothespin transfer. Because compensation was not measured or restricted in [88], users may have been able to complete the task quickly with task-effective body compensation.
The fragile object task has been performed with various prostheses and fragile objects. Generally, the ratio between the break force and mass determines the task difficulty; a lower ratio implies greater difficulty. In the present study, non-amputee participants achieved a 63% median success rate. The break force in the present experiments was about 20 N, and the device mass was 615 g, resulting in a 0.03 N/g ratio. In a previous study from our group, an individual with an amputation achieved about 55% success with the mKF on a similar fragile egg task without sensory feedback [27]. In [89,90], the success rates without sensory feedback were 38% and 45% with a fragile object that broke at 1.23 N and had a mass of 80 g, a ratio of 0.15 N/g, approximately five times the ratio of our study. In [91], the success rate was about 86% without sensory feedback, also using a different fragile object, which broke at 10.7 N and weighed 8 g, a 1.34 N/g ratio (approximately 45 times the ratio of our study). Transfer times, where reported, were similar among the aforementioned studies and the present study, ranging from 10 s to 13 s per transfer.
One goal of this study was to determine the best decoders and training paradigms for a portable take-home system. This study provides strong evidence from real-world functional tasks for using diverse training sets as input for a decoder and modifying the decode output with an LF. Selecting between linear and nonlinear decoders is less clear. Our findings indicate comparable performance between an mKF and a CNN, where user preferences were slightly increased for the CNN. Clinical deployment, however, may favor the mKF, which trains faster and is less computationally expensive. Furthermore, in a take-home clinical trial where more training data may be collected over time, aggregation of these datasets may begin to favor deep learning approaches [68] like the CNN or MLP as technologies continue to improve. Overall, our results provide valuable information toward the clinical implementation of advanced control strategies for prosthesis users.

Conclusions
This work compared myoelectric prosthetic control using activities of daily living with 12 continuous decoders in a two-phase study. In the first phase, comparing training paradigms and decoder smoothing, we found that training control algorithms with more complex movement patterns (i.e., simultaneously active digits and/or wrist) improves control accuracy and speed. Our results showed that nonlinear smoothing of the decoder output can improve accuracy although sometimes at a cost in speed. In the second phase, objective performance among the mKF, MLP, and CNN was similar for many tasks, but subjective preferences favored the CNN and mKF. Users' subjective experiences are a key additional consideration.
This study was completed to select a decoder for a long-term take-home trial with implanted neuromyoelectric devices. The results demonstrate the importance of using rich training paradigms and nonlinear algorithm smoothing to improve continuous prosthetic control. The results also highlight the importance of user-focused subjective metrics in comparing decoders. Where performance differences could not effectively select a "best" decoder, user preferences added strong evidence toward selecting the mKF or CNN. Clinical considerations may favor the mKF as it is faster to train and computationally less expensive than the CNN. Subjective experiences may be even more important in long-term real-world environments and may influence voluntary prosthesis use. Overall, the results herein demonstrate the efficacy of continuous decoders for enabling intuitive prosthetic control of modern bionic prostheses.