We present results of single-trial classification of MI signals using a novel wireless fNIRS instrument. Our findings show that, using linear discriminant analysis on a simple, individually selected feature combination, it is possible to discriminate single trials in response to MI tasks that differ in complexity, i.e. simple versus complex tasks. An average accuracy of 81% was achieved by selecting, for each subject, the best-performing combination of one channel, one time interval and up to four Δ[O2Hb] signal features. In the following discussion we address each of these aspects, their limitations for future single-trial classification approaches and their relevance for neurorehabilitation.
5.1 Channels selected for classification
As shown in Table 2, the signal locations, i.e. the channels selected for optimal classification, differed across subjects. Because of this subject-to-subject variability, classification in our study required the individual selection of a suitable channel in which an appropriate time interval with significant oxygenation changes was detected in both task conditions, MI-simple and MI-complex. This is in line with previous studies that selected channels and/or time intervals for individual subjects [7, 8].
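The individual selection described above can be pictured as a small search: for every channel and candidate time window, a two-class LDA is trained on the trial features and the combination with the highest cross-validated accuracy is kept. The sketch below is a hypothetical illustration, not the exact procedure used in this study. It assumes a trial array of shape (trials, channels, samples), a sampling rate `FS_HZ` of 10 Hz, leave-one-out cross-validation, window lengths of 4 to 10 s and the window mean as the only feature; all helper names are made up for this example, and a library LDA (e.g. scikit-learn's `LinearDiscriminantAnalysis`) could replace the hand-written one.

```python
import itertools
import numpy as np

FS_HZ = 10.0  # hypothetical sampling rate of the fNIRS device (Hz)


def lda_fit(X, y):
    """Two-class Fisher LDA with a pooled within-class covariance."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    cov = np.atleast_2d(Xc.T @ Xc / max(len(X) - 2, 1))
    w = np.linalg.pinv(cov) @ (m1 - m0)   # pinv keeps very small training sets stable
    b = -w @ (m0 + m1) / 2.0              # decision boundary halfway between class means
    return w, b


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the LDA above (class 1 lies on the positive side)."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        w, b = lda_fit(X[train], y[train])
        hits += int((X[i] @ w + b > 0) == y[i])
    return hits / len(y)


def select_channel_and_window(trials, labels, lengths_s=range(4, 11), step_s=1):
    """Search channels x windows; return (accuracy, channel, start_s, length_s)."""
    n_samples = trials.shape[2]
    best = (-1.0, None, None, None)
    for ch, length in itertools.product(range(trials.shape[1]), lengths_s):
        win = int(length * FS_HZ)
        for start in range(0, n_samples - win + 1, int(step_s * FS_HZ)):
            # Single feature: mean Delta[O2Hb] amplitude within the window.
            X = trials[:, ch, start:start + win].mean(axis=1, keepdims=True)
            acc = loo_accuracy(X, labels)
            if acc > best[0]:
                best = (acc, ch, start / FS_HZ, length)
    return best


# Synthetic data (not study data): 12 trials per condition, 3 channels, 15 s each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 12)
trials = rng.normal(0.0, 1.0, size=(24, 3, 150))
trials[labels == 1, 2, 60:120] += 2.0  # class effect only in channel index 2
best = select_channel_and_window(trials, labels)
print(best)
```

With richer feature sets, the same loop simply evaluates each feature combination inside every channel/window cell; the exhaustive search is cheap at these trial counts but multiplies the risk of selecting a combination that fits noise.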
In this study, the channel most frequently selected for classification was channel 3 (N = 6 (50%)), followed by channel 2 (N = 4 (33%)) and channel 1 (N = 2 (17%)). As illustrated in Figure 2, channel 3 was positioned more laterally over the left hemisphere than channels 1 and 2. This might indicate that either the signals obtained from the most lateral part of the sensor, i.e. channel 3, or the cortical areas covered by that part of the sensor were better suited for discriminating the presented MI tasks. Based on an approximate topographical assignment, we suggested that the medial part of the sensor detected signals derived from the SMA, whereas the more lateral part detected signals from areas of the PMC. Hence, signals originating from the PMC might have been better suited to achieving high classification accuracy in the MI tasks used in our study. This may seem unexpected, given that channel 3 showed the smallest oxygenation changes across all subjects in response to both MI-simple and MI-complex (Figure 3). However, the proportionally larger SNR associated with the smaller signal in channel 3 (Table 1) might have allowed better classification. Part of the subject-to-subject variability in signal location might therefore be explained by these observations: the more lateral the position of a sensor channel and the smaller its signal, provided the SNR was good, the higher the resulting classification accuracy.
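Why a smaller signal can still carry a larger SNR can be made concrete with a few lines. The SNR definition used for Table 1 is not restated here, so the sketch assumes a hypothetical one: the absolute mean Δ[O2Hb] in the task window divided by the standard deviation of the pre-stimulus baseline. The two simulated channels are synthetic and only illustrate the ratio.

```python
import numpy as np


def snr(baseline, task):
    """Hypothetical SNR: |mean task-window Delta[O2Hb]| / baseline sample SD."""
    return abs(np.mean(task)) / np.std(baseline, ddof=1)


# Synthetic channels (not study data): a large but noisy response vs. a small, clean one.
rng = np.random.default_rng(1)
large = snr(rng.normal(0.0, 0.8, 100), 1.0 + rng.normal(0.0, 0.8, 50))
small = snr(rng.normal(0.0, 0.1, 100), 0.4 + rng.normal(0.0, 0.1, 50))
print(f"large/noisy SNR ~ {large:.1f}, small/clean SNR ~ {small:.1f}")
```

Here the response that is 2.5 times smaller ends up with roughly three times the SNR, because its noise floor is eight times lower.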
Further reasons for this subject-to-subject variability in signal location may lie in methodological aspects of fNIRS related to sensor positioning. Although external landmarks of the international 10-20 system can be used for sensor positioning [38, 39], these landmarks offer only probabilistic guidance given individual anatomical differences. Hence, as with several other non-invasive brain imaging methods (e.g., EEG), anatomical information and inter-individual variability are not directly obtained, which makes it difficult to relate externally recorded signals to the underlying brain. In our study, these factors, together with the limited sample volume typical of NIRS, may have led to differences in the exact location of the interrogated tissue from subject to subject. Therefore, by using F3 as a landmark, we could only assume that we covered secondary motor areas such as the SMA or PMC in individual subjects.
5.2 Analysis time intervals selected for classification
Similar to the signal location, the individual time intervals after onset of the stimulation phase that yielded the best classification accuracy differed between subjects, ranging from five to eleven seconds (Table 2, Figure 5). Consequently, the analysis time intervals required for the best classification accuracy varied between subjects from four to ten seconds. This time frame is comparable to those reported by Sitaram et al., who required ten seconds of stimulation data in response to MI of finger tapping, and by Tai et al., who chose intervals between four and 19 seconds during positive and negative emotion induction tasks. However, it must be taken into account that these time intervals were obtained with offline classification, whereas online classification has been shown to require at least 15 seconds of MI performance. We suggest that the subject-to-subject variations in the selected time intervals are most likely due to individual differences in the latency of the Δ[O2Hb] response after onset of the imagination task. Part of these variations might be explained by differences in the cognitive processes underlying MI performance in our experimental tasks. Although subjects were explicitly instructed to perform kinesthetic MI, i.e. to imagine how the movements feel, rather than visual imagery, i.e. imagining watching oneself perform the task, or any other form of imagination, we cannot provide a measure of the individual strategies used. Another explanation might be the training status of our subjects. Although the VMIQ answers revealed relatively good imagery ability among subjects, none of them had been explicitly trained in the use of MI. Hence, subject-to-subject variability might have been lower in experienced or trained subjects.
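Individual latency differences of the kind suggested above could be quantified per subject from the trial-averaged response. The sketch below uses a hypothetical threshold rule that was not part of this study: the onset latency is the first time after stimulus onset at which the averaged Δ[O2Hb] stays above the baseline mean plus `k` baseline standard deviations for at least `min_len_s` seconds. The parameter values and the synthetic signals are illustrative assumptions.

```python
import numpy as np


def onset_latency(avg_o2hb, fs_hz, n_baseline, k=2.0, min_len_s=1.0):
    """Seconds from stimulus onset until the averaged response first stays above
    baseline mean + k * baseline SD for min_len_s; None if it never does.

    The first n_baseline samples are pre-stimulus baseline; sample n_baseline is onset.
    """
    base = avg_o2hb[:n_baseline]
    threshold = base.mean() + k * base.std(ddof=1)
    above = avg_o2hb[n_baseline:] > threshold
    need = max(int(round(min_len_s * fs_hz)), 1)
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == need:
            return (i - need + 1) / fs_hz  # start of the first sustained run
    return None


fs = 10.0
baseline = np.tile([0.01, -0.01], 25)                            # 5 s synthetic baseline
early = np.concatenate([np.zeros(30), 0.1 * np.arange(1, 71)])   # rise begins at 3 s
late = np.concatenate([np.zeros(50), 0.1 * np.arange(1, 51)])    # rise begins at 5 s
lat_early = onset_latency(np.concatenate([baseline, early]), fs, n_baseline=50)
lat_late = onset_latency(np.concatenate([baseline, late]), fs, n_baseline=50)
print(lat_early, lat_late)  # 3.0 5.0
```

Requiring a sustained run rather than a single crossing makes the estimate less sensitive to isolated noise spikes, at the cost of needing a sensible `min_len_s` for the device's sampling rate.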
5.3 Δ[O2Hb] signal features selected for classification
Previous studies investigating fNIRS single-trial classification reported different signal features and diverse numbers of trials per subject. The majority of studies used mean Δ[O2Hb] and/or Δ[HHb] amplitude changes of the hemodynamic response, with between ten trials per subject during MI and 60 trials per subject during emotion induction. The feature set used in our study (Δ[O2Hb] mean amplitude, variance, skewness and kurtosis) was chosen from the selection reported by Tai et al., who achieved classification accuracies between 75% and 94.67% using these features. We hypothesized that using all four features, instead of only the mean amplitude, would improve classification accuracy. This was confirmed in some of our subjects, who required up to four features to reach higher classification accuracies than with the mean amplitude alone. Overall, as with channel and time interval selection, subject-to-subject variability was also found in feature selection:
Δ[O2Hb] variance (N = 10 (83%)): This feature was selected most frequently, indicating that the variance in our data differed considerably between individual signals and between the two task conditions, MI-simple and MI-complex. However, the variance within an individual signal was relatively stable from trial to trial and therefore served as a suitable feature for discriminating between the two tasks. Across subjects, the averaged Δ[O2Hb] variance was significantly negatively correlated with classification accuracy in both conditions, i.e. classification rates improved with decreasing variance (MI-simple: r = -0.688*, p = 0.028; MI-complex: r = -0.701*, p = 0.024) (Figure 6). This finding is in line with the tendency observed for channel selection (section 5.1), i.e. channels with larger SNR (in particular channel 3) yielded higher classification accuracies.
Δ[O2Hb] mean amplitude (N = 8 (67%)): The mean amplitude feature reflected those individual time intervals in which both a significant increase within a given condition and a significant difference between the two conditions were found. As shown by previous studies, the mean amplitude is a reliable classification feature, in particular for discriminating two different conditions as in our case. In our study, as discussed for channel selection (section 5.1), there was a slight tendency for smaller mean amplitudes to yield higher classification accuracies, but no significant correlations were found.
Δ[O2Hb] skewness (N = 6 (50%)): Classification rates were also related to skewness, but the relationship differed between the two conditions. The skewness of signals in response to MI-simple was negatively correlated with accuracy (r = -0.850*, p = 0.032), i.e. the smaller the skewness, the higher the classification accuracy in a given subject. In contrast, a positive correlation was observed for MI-complex (r = 0.854*, p = 0.031), i.e. the higher the skewness, the higher the classification accuracy in a given subject (Figure 6). This finding may reflect differences in signal shape between the simple and the complex imagery task. In response to the simple task, higher accuracies may have favoured a slower signal increase, reflected by negative skewness, i.e. the left tail of the probability density function was longer than the right tail and the mass of the distribution was concentrated on the right. In response to the complex task, by contrast, a faster signal increase may have been favoured, reflected by positive skewness, i.e. the right tail was longer than the left tail.
Δ[O2Hb] kurtosis (N = 5 (42%)): This feature was selected in only a few subjects, but in these it was needed to achieve the reported classification accuracies. No correlations with classification accuracy were found.
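The four window features and the candidate combinations of up to four of them can be written down compactly. This is a minimal sketch assuming SciPy's default estimators (biased sample skewness and excess kurtosis) and the sample variance; the estimators actually used in this study, and by Tai et al., may differ. The two synthetic window shapes only illustrate the sign convention for skewness discussed above; which measured Δ[O2Hb] time course counts as a "slower" or "faster" increase depends on the analysis window.

```python
import itertools
import numpy as np
from scipy.stats import kurtosis, skew

FEATURES = ("mean", "variance", "skewness", "kurtosis")


def window_features(segment):
    """Mean, variance, skewness and kurtosis of one trial's Delta[O2Hb] window."""
    segment = np.asarray(segment, dtype=float)
    return np.array([
        segment.mean(),
        segment.var(ddof=1),  # sample variance
        skew(segment),        # SciPy default: biased sample skewness
        kurtosis(segment),    # SciPy default: excess (Fisher) kurtosis, biased
    ])


def candidate_subsets(max_size=4):
    """All non-empty combinations of up to max_size feature indices (15 for four)."""
    return [combo for size in range(1, max_size + 1)
            for combo in itertools.combinations(range(len(FEATURES)), size)]


def feature_matrix(segments, subset):
    """Trials x features matrix for one candidate feature combination."""
    return np.array([window_features(seg)[list(subset)] for seg in segments])


# Synthetic window shapes (not study data) showing the skewness sign convention.
t = np.linspace(0.0, 1.0, 101)
saturating = np.sqrt(t)  # most samples near the plateau -> long left tail, skewness < 0
accelerating = t ** 2    # most samples near baseline -> long right tail, skewness > 0
for name, seg in (("saturating", saturating), ("accelerating", accelerating)):
    feats = dict(zip(FEATURES, window_features(seg)))
    print(name, {k: round(float(v), 3) for k, v in feats.items()})
```

Each of the 15 subsets would then be scored with the same cross-validated classifier, which is one way the per-subject feature sets reported above could arise.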
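The correlations reported for variance and skewness pair one value per subject (the averaged feature) with that subject's classification accuracy, so their p-values depend on how many subjects selected the feature. A minimal sketch with SciPy's `pearsonr` follows; the arrays are placeholders chosen for illustration, not values from this study.

```python
import numpy as np
from scipy.stats import pearsonr


def feature_accuracy_correlation(feature_per_subject, accuracy_per_subject):
    """Pearson r and two-sided p between a per-subject feature value and accuracy."""
    r, p = pearsonr(feature_per_subject, accuracy_per_subject)
    return float(r), float(p)


# Placeholder numbers for illustration only; NOT the values from this study.
variance = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
accuracy = np.array([0.90, 0.85, 0.80, 0.75, 0.70])
r, p = feature_accuracy_correlation(variance, accuracy)
print(f"r = {r:.3f}, p = {p:.3g}")
```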
Although the classification accuracies look promising, they are nevertheless subject to limitations. We hypothesized that the use of simple feature sets would facilitate implementation in future applications. However, because of the observed subject-to-subject variability, such an implementation would require quite different feature sets per subject to achieve sufficient classification accuracy. Although the need for individualized classifier training is a well-known issue in single-trial classification, the following aspects might have contributed to the subject-to-subject variability observed in our study and could be considered in future classification studies:
First, the number of trials in our study was 12, which is comparable to previous studies. However, it is conceivable that fewer features would have been required for individual subjects if more trials had been collected. On the other hand, the length of the experiment was inherently limited by the repetitive nature of the protocol and the mental demand of the task on the participant. Future studies may explore different numbers of trials to find a suitable balance between the number of features needed, classification accuracy and task demand.
Second, the subject-to-subject variability in hemodynamic onset latency in response to MI performance may be reduced. The hemodynamic response measured by fNIRS is delayed relative to the onset of the underlying neural activity, typically peaking about 6 s later. Further, MI signals are known to generally exhibit longer onset latencies than ME signals. Previous studies found that Δ[O2Hb] in response to MI increased about 2 s later than during real movement execution. With a view to applications in neural interfaces, MI as a mental task therefore still limits the practical use of NIRS-based systems. Compared with other mental tasks, this delay might be explained by the training status of the individual subject. For example, while mental tasks such as preference decision making or emotional evaluation might be performed intuitively without training, MI for use in neural interfaces requires considerable training, as shown by recent evidence from both neurorehabilitation applications and BCI operation. It might therefore be suggested that subjects experienced or trained in MI would have elicited faster and less variable responses.
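Regarding the number of trials raised in the first point, one quantity that shrinks as more trials are collected is the accuracy a classifier must exceed to beat chance. The sketch below is not an analysis from this study; it assumes independent trials, two balanced classes and a one-sided binomial test at `alpha = 0.05`, and `n_trials` stands for the total number of classified trials.

```python
from scipy.stats import binom


def chance_threshold(n_trials, n_classes=2, alpha=0.05):
    """Smallest accuracy that exceeds chance at level alpha (one-sided binomial test)."""
    # k is the smallest number of correct trials with P(X >= k) <= alpha under guessing,
    # i.e. CDF(k - 1) >= 1 - alpha, which is exactly what binom.ppf returns for k - 1.
    k = int(binom.ppf(1.0 - alpha, n_trials, 1.0 / n_classes)) + 1
    return k / n_trials


for n in (12, 24, 48, 96):
    print(n, round(chance_threshold(n), 3))
```

With 12 trials, 10 must be classified correctly (83%) before the result is distinguishable from guessing at this level; with 24 trials, 17 correct (71%) suffices.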
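The latency argument in the second point can be visualised with a modelled response. The sketch assumes the widely used double-gamma hemodynamic response shape from fMRI (gamma shapes 6 and 16, undershoot ratio 1/6), which is not fitted to fNIRS data from this study; with these parameters the peak falls near 5 s, of the same order as the delay cited above. The 2 s shift mimics the later Δ[O2Hb] increase reported for MI versus ME and is a hypothetical parameter here.

```python
import numpy as np
from scipy.stats import gamma


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0, delay=0.0):
    """Double-gamma response (fMRI default shape), optionally shifted by `delay` seconds."""
    tt = np.clip(t - delay, 0.0, None)
    h = gamma.pdf(tt, peak_shape) - ratio * gamma.pdf(tt, undershoot_shape)
    return np.where(t >= delay, h, 0.0)  # no response before the (delayed) onset


t = np.arange(0.0, 30.0, 0.1)
peak_me = t[np.argmax(double_gamma_hrf(t))]             # modelled execution response
peak_mi = t[np.argmax(double_gamma_hrf(t, delay=2.0))]  # hypothetical 2 s later MI onset
print(f"peak ME ~ {peak_me:.1f} s, peak MI ~ {peak_mi:.1f} s")
```

Any onset shift adds directly to the already slow hemodynamic peak, which is why the usable decision time for MI-based NIRS systems stretches beyond that of the hemodynamics alone.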
5.4 Future work
Considering future applications, while MI training may be possible in most healthy subjects and the majority of patients, some patients, especially those who are severely impaired, may lack the cognitive capabilities required to train MI. This might further limit the use of MI in neural interfaces compared with alternative BCI paradigms that use more intuitive mental tasks. To evaluate the potential use in a BCI or in neurorehabilitation, it would therefore be necessary to test our classification approach in several patient groups, such as patients affected by stroke, cerebral palsy, amyotrophic lateral sclerosis and other motor neuron diseases. Such future work would further require solutions for reducing subject-to-subject variability, such as specifically designed training sessions.
Lastly, future studies could address methodological options to reduce the hemodynamic response delay in NIRS signals. A recent example was given by Cui et al. 2010, who reported that it may be possible to decode the true behavioral state from the measured neural signal, rather than from the hemodynamic signal, using fNIRS. Using a multivariate pattern classification technique (a linear support vector machine, SVM) and a systematic evaluation of different feature spaces (signal history, history gradient, and signal and spatial pattern of Δ[O2Hb] and Δ[HHb]), the authors reduced the latency to decode a change in behavioral state by 50% (from 4.8 s to 2.4 s), which would enhance the feasibility of MI-based real-time NIRS applications.
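A "signal history" feature space combined with a linear SVM can be sketched as follows. This is not Cui et al.'s implementation: the history length, the synthetic rest/task signals and the Pegasos-style stochastic subgradient training (hinge loss plus L2 penalty) are all assumptions made so the example runs with NumPy alone; a library SVM such as scikit-learn's `LinearSVC` would normally be used instead.

```python
import numpy as np


def history_features(o2hb, hhb, n_hist):
    """Row i holds the last n_hist samples of Delta[O2Hb] and Delta[HHb] up to time i."""
    return np.array([
        np.concatenate([o2hb[i - n_hist + 1:i + 1], hhb[i - n_hist + 1:i + 1]])
        for i in range(n_hist - 1, len(o2hb))
    ])


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style subgradient steps.

    Labels must be -1/+1. A constant column provides the bias term; for brevity the
    bias is regularized together with the weights.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            step += 1
            eta = 1.0 / (lam * step)
            violates = y[i] * (Xb[i] @ w) < 1.0  # margin check before the update
            w *= 1.0 - eta * lam                # step on the L2 penalty
            if violates:
                w += eta * y[i] * Xb[i]         # subgradient step on the hinge loss
    return w


def predict(w, X):
    """Return -1/+1 decisions for the rows of X."""
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)


# Synthetic rest (0) / task (1) blocks of 50 samples each; not real fNIRS data.
rng = np.random.default_rng(1)
state = np.tile(np.repeat([0, 1], 50), 4)
o2hb = 1.0 * state + rng.normal(0.0, 0.3, state.size)  # O2Hb rises during task
hhb = -0.3 * state + rng.normal(0.0, 0.1, state.size)  # HHb falls during task
n_hist = 5
X = history_features(o2hb, hhb, n_hist)
y = 2 * state[n_hist - 1:] - 1  # label = current state
w = train_linear_svm(X, y)
acc = np.mean(predict(w, X) == y)
print(f"training accuracy: {acc:.2f}")
```

The history length `n_hist` trades robustness against responsiveness: longer histories average out noise but contain more samples from before a state change, which is exactly the decoding latency such approaches try to shorten.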
5.5 Relevance of MI classification for neurorehabilitation
Our experimental design was motivated by two aspects related to the use of MI as a mental task in neurorehabilitation. First, our attempt to classify two tasks differing in complexity was motivated by the known difference between (re)learning a simple and a complex task. One hypothesis is that the cognitive processing demands may be inherently greater for learning complex tasks. This highlights the need to use both simple and complex skills in motor-learning research in order to gain further insight into these potentially distinct learning processes and, in our case, the underlying signal features. Accordingly, current neurorehabilitation strategies usually address tasks differing in complexity, e.g. fine coordination and precise dexterity versus gross movements, single-finger versus whole-hand or arm movements, or goal-directed actions with versus without objects, such as the keyboard in our case. Thus, we suggest that our approach of evaluating tasks differing in complexity, i.e. both simple and complex finger-tapping tasks, for single-trial classification is relevant for neurorehabilitative applications.
Second, several mental tasks have recently been investigated in the development of neural interfaces, e.g. mental arithmetic tasks, language-, visual- and auditory-based imagery tasks or spatial navigation imagery. These mental tasks are suitable for fulfilling the main goals of neural interfaces, i.e. communication, for example using spelling devices, or the control of external devices such as neuroprostheses. In neurorehabilitation, an additional goal is to combine neural interfaces with the training or relearning of impaired motor function. An example of such a combined approach would be a combination of BCI training and physical therapy, for instance in stroke patients. For such applications, MI has been suggested as a suitable mental task because, according to the simulation hypothesis, it not only activates the impaired motor areas responsible for task execution but also accesses the motor network independently of the impaired function, thereby improving recovery. Especially in less severely disabled persons, e.g. individuals with upper-limb paralysis, MI-based BCI systems could be used as tools to recruit and reinforce spared cortical networks by activating the corresponding neural representations. As Dobkin suggested, using such a combined training-BCI approach, researchers and therapists may be able to improve the effects of rehabilitation treatments aimed at impairment and disability. Further, MI signals may enhance training possibilities by providing insight into whether an individual is indeed engaging the network for mental rehearsal. For example, therapists could use changes in the MI signal as immediate feedback on whether an individual is optimally focusing on the imagined movement, thereby monitoring treatment progress. Lastly, signals derived from MI performance may be used as direct online feedback for the individual.
Such feedback may represent the Δ[O2Hb] amplitudes of the recruited motor pools elicited in the individual's brain, which in turn may motivate increased subsequent MI output and improve the timing and completeness of imagined movements. As a result, individuals may regain strength and precision if they can find a way to practise with MI signals, thereby accelerating normal recovery.