- Open Access
Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton
Journal of NeuroEngineering and Rehabilitation volume 14, Article number: 42 (2017)
Kinesthesia (sense of limb movement) has been extremely difficult to measure objectively, especially in individuals who have survived a stroke. The development of valid and reliable measurements for proprioception is important to developing a better understanding of proprioceptive impairments after stroke and their impact on the ability to perform daily activities. We recently developed a robotic task to evaluate kinesthetic deficits after stroke and found that the majority (~60%) of stroke survivors exhibit significant deficits in kinesthesia within the first 10 days post-stroke. Here we aim to determine the inter-rater reliability of this robotic kinesthetic matching task.
Twenty-five neurologically intact control subjects and 15 individuals with first-time stroke were evaluated on a robotic kinesthetic matching task (KIN). Subjects sat in a robotic exoskeleton with their arms supported against gravity. In the KIN task, the robot moved the subjects’ stroke-affected arm at a preset speed, direction and distance. As soon as subjects felt the robot begin to move their affected arm, they matched the robot movement with the unaffected arm. Subjects were tested in two sessions on the KIN task: initial session and then a second session (within an average of 18.2 ± 13.8 h of the initial session for stroke subjects), which were supervised by different technicians. The task was performed both with and without the use of vision in both sessions. We evaluated intra-class correlations of spatial and temporal parameters derived from the KIN task to determine the reliability of the robotic task.
We evaluated 8 spatial and temporal parameters that quantify kinesthetic behavior. We found that the parameters exhibited moderate to high intra-class correlations between the initial and retest conditions (Range, r-value = [0.53–0.97]).
The robotic KIN task exhibited good inter-rater reliability. This validates the KIN task as a reliable, objective method for quantifying kinesthesia after stroke.
The identification and measurement of sensorimotor deficits after stroke have historically focused on motor impairment. Deficits in proprioception (our sense of limb position and motion) have received far less attention in both research and clinical practice. Evidence has shown that sensory impairments occur in the majority of stroke survivors [2,3,4,5], and are thought to negatively impact functional ability and recovery after stroke [6,7,8,9]. Further, differences have been found in the timing and trajectory of motor and proprioceptive recoveries after stroke. Our understanding of proprioceptive impairments after stroke has been limited by the fact that proprioception is difficult to measure using standard clinical examinations.
Clinical assessments for measuring impairments in proprioception typically detect only the most severe impairments. They often rely on the examiner to move a body segment (e.g., the finger) and ask the subject whether the finger has been moved upward or downward. Other tests, such as the Thumb Localizer Test, rely on the examiner to position the thumb of the affected arm above the head and have the patient locate their thumb, without vision, using their unaffected arm. These clinical assessments often have poor sensitivity because they collapse across different components of proprioception (position sense and kinesthesia) and often utilize simplistic 2- or 3-point ordinal scales [10, 12].
Furthermore, these proprioceptive assessments have been shown to have low reliability among assessors. Efforts to shorten the clinical evaluation time of longer and more thorough sensory assessments, such as the Nottingham Sensory Assessment, have been shown to negatively impact inter-rater reliability. Further, due to the limited numerical range of measurement, the Nottingham is susceptible to floor and ceiling effects similar to measures of motor impairment (Fugl-Meyer) that exhibit reduced detection of and sensitivity to sensorimotor impairment.
New methodologies and technologies for assessment have taken steps to improve measurement of proprioceptive function after stroke [3, 4, 15, 16]. The use of robotics for assessing sensorimotor impairment has gained significant popularity [3, 4, 17,18,19] due to the ability to obtain objective, reliable, sensitive measurements capable of detecting sensorimotor deficits that clinical measures often miss. In addition, robotic assessments can be completed relatively rapidly without the need for a clinician to be present. However, for these measures to be considered reliable, it is necessary to test the reproducibility of results. Reproducibility can be influenced by intrinsic subject variability, as well as factors related to operator setup of the subject in the robot.
Developing measures that accurately and reliably assess proprioception after stroke is important, because both proprioceptive and motor deficits have been found to be significantly correlated with the performance of activities of daily living following stroke [8, 21]. To better inform neurorehabilitation therapies and practices, there is a need for more sensitive and reliable measurement tools for identifying proprioceptive impairments after stroke. Our previous study investigating kinesthetic deficits after stroke did not examine the reliability of the kinesthetic robotic measure. Good reliability is a critical component of any clinical test, to be sure that it can properly evaluate change over time. To examine the reliability of the robotic kinesthesia task, we quantified performance in neurologically normal subjects and subjects with stroke who performed the same task with multiple operators.
We evaluated inter-rater reliability of a previously described kinesthetic matching task (KIN) [4, 5, 22]. We evaluated kinesthetic behavior in 25 neurologically-intact control subjects and 15 individuals with first-time stroke. To be included in the study, all subjects had to be aged 18 years or older. For subjects with stroke, inclusion criteria required them to have first-time, clinically identified unilateral stroke. Subjects with stroke were excluded if they had aphasia, apraxia or significant cognitive impairments that limited them from understanding three-step instructions. Neurologically-intact control subjects were recruited from the Calgary community. Stroke subjects were recruited from the acute stroke and rehabilitation units at Foothills Hospital in Calgary. The study was approved by the University of Calgary Ethics Board, and all subjects provided informed consent.
Robotic kinesthesia task
Subjects were seated in the robotic exoskeleton (Fig. 1a) with their arms supported against gravity. Each subject was custom-fitted and calibrated in the robot based on their limb geometry by one of three experienced robot operators.
In brief, in the KIN task (Fig. 1b) the robot moved the subjects’ stroke-affected arm at a predetermined speed, direction and distance. For each trial, subjects were instructed to mirror-match the direction while matching the speed and magnitude of the robotic movement as soon as they felt the robot begin to move. Neurologically-intact subjects were tested on both the dominant and non-dominant arms.
Subjects were initially set up in the robot by one rater (initial time-point) and the subject completed the KIN robot assessment one time without the use of vision and one time with visual feedback of the limbs. The condition without vision always preceded the condition with vision to avoid the potential confound of subjects using visual cues about target location learned in the condition with vision.
At a second time-point, subjects were run in a second session, where the subject was custom fitted by a second robot operator. Subjects then completed the KIN task again, both with and without the use of vision. Three subjects with stroke did not complete the KIN task in the condition with vision (N = 12), due to subject time constraints.
Robotic and statistical analyses
For each subject, we computed 8 robotic parameters across the 36 trials to quantify kinesthetic performance. Four parameters were trial means: 1) Initial Direction Error (IDE): angular deviation relative to the direction of the robotic movement; 2) Path Length Ratio (PLR): length of the matching movement relative to the length of the robotic movement; 3) Response Latency (RL): time to initiate a matching movement in response to the robot movement; and 4) Peak Speed Ratio (PSR): peak speed of the matching movement relative to the peak speed of the robotic movement. The other four were the variabilities of each of these parameters, calculated to evaluate consistency of error (IDEv, PLRv, RLv, PSRv). To evaluate the inter-rater reliability of each parameter, we computed two-way random average measures intra-class correlations (ICCs).
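As a rough illustration of the reliability statistic described above, the following Python sketch computes a two-way random effects, average measures ICC from a subjects-by-sessions table. It assumes the absolute-agreement form (often written ICC(2,k) or ICC(A,k)); the text does not state whether absolute agreement or consistency was used, so that choice, the function name, and the example response latencies are all hypothetical and are not study data.

```python
import numpy as np


def icc_two_way_random_average(scores):
    """ICC(2,k): two-way random effects, absolute agreement, average measures.

    scores: array-like of shape (n_subjects, k_sessions), one parameter
    (e.g. response latency) per call. Assumed form; see the lead-in.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_sessions = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_sessions

    # Mean squares for subjects (rows), sessions/raters (columns), residual.
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (ms_sessions - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical response latencies (ms): rows are subjects,
    # columns are the initial and retest sessions.
    example_rl = [
        [240.0, 262.0],
        [310.0, 298.0],
        [1290.0, 1080.0],
        [455.0, 470.0],
        [620.0, 575.0],
    ]
    print(round(icc_two_way_random_average(example_rl), 3))
```

In this form, a systematic offset between the two sessions (for example, one operator's setup consistently producing longer latencies) lowers the ICC, whereas a consistency-type ICC would ignore it.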
To determine overall task performance, we computed normalized z-scores for each parameter. These scores were compared to 95% normative ranges derived from a large sample of neurologically intact subjects (N = 166), a group that includes the control subjects described in this study. We considered the potential influence of age, sex and handedness on task performance. If a subject scored outside of the 95% range (one-tailed, z > 1.65), they were determined to have failed the individual parameter, as lying outside the 95% range indicates that behavior on that parameter was significantly different from controls. For overall task performance, subjects who failed more than 2 out of 8 parameters were determined to have failed the task. This failure threshold was chosen because only 5% of the sample of 166 neurologically intact subjects fell outside the normative range on 3 or more parameters.
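The pass/fail rule above can be sketched in a few lines of Python. The sketch assumes the z-scores have already been normalized against the normative sample (including any adjustment for age, sex and handedness) and that larger z-scores indicate worse performance; the function names and the example values are hypothetical.

```python
def failed_parameters(z_scores, threshold=1.65):
    """Return the parameters whose z-score exceeds the one-tailed cutoff."""
    return [name for name, z in z_scores.items() if z > threshold]


def failed_task(z_scores, threshold=1.65, max_failures=2):
    """A subject fails the task when more than max_failures parameters fail."""
    return len(failed_parameters(z_scores, threshold)) > max_failures


if __name__ == "__main__":
    # Hypothetical z-scores for one subject on the 8 KIN parameters.
    subject = {
        "RL": 3.2, "RLv": 2.1, "PSR": 0.4, "PSRv": 1.1,
        "IDE": 1.9, "IDEv": 0.8, "PLR": 1.2, "PLRv": 0.3,
    }
    print(failed_parameters(subject))
    print(failed_task(subject))
```

With these example values three parameters exceed 1.65, so the subject would be classified as failing the task; a subject with exactly two failed parameters would not.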
Subjects in both the neurologically intact and stroke groups were evaluated for handedness with the Edinburgh Handedness Inventory. Subjects with stroke were evaluated on a variety of clinical measures: 1) Functional Independence Measure, which measures functional ability in motor and cognitive domains, and is scored out of 126; 2) Behavioural Inattention Test, which evaluates the presence or absence of visuospatial deficits via six conventional subtests (line bisection, letter cancellation, star cancellation, line cancellation, figure copying and drawing) and is scored out of 146; 3) Thumb Localization Test, which measures proprioceptive impairment, and is scored on a 4-point scale (0 indicates intact ability to find the thumb, 1 indicates ability to locate the thumb via locating the wrist, 2 indicates ability to locate the thumb via locating the arm, and 3 indicates completely unable to locate the thumb); 4) Chedoke-McMaster Stroke Assessment, which measures motor impairment of the arm and hand, and is scored on a 7-point scale (1 = flaccid paralysis, 2 = no voluntary movement, but spasticity present, 3 = marked spasticity and synergy patterns, 4 = decrease in spasticity and synergy patterns, 5 = mild spasticity, synergy pattern present but can be reversed, 6 = near normal movement, 7 = normal movement); 5) Purdue Pegboard, which evaluates manual dexterity by requiring subjects to insert as many pegs into holes as they can in 30 s.
The neurologically-intact group (N = 25) had an average age of 38.3 ± 13.0 (SD) years; 17 subjects were female, eight subjects were male, 22 were right-handed, and three were left-handed. The stroke group (N = 15) (Table 1) had an average age of 54.5 ± 13.6 (SD) years; three subjects were female, 12 subjects were male, 14 subjects were right-handed, and one subject was left-handed. Subjects with stroke were tested, on average, 62.4 ± 63.4 days post-stroke. One subject (subject 10) had visuospatial neglect as determined by the Behavioural Inattention Test. This subject was also classified as having moderate (≥40 and ≤80) impairment on the Functional Independence Measure. All other subjects scored within the mild functional impairment range (>80) on the Functional Independence Measure. The time from initial session to the retest session of the robot for neurologically-intact subjects was 1.3 h (median, range = [0.6–192.4 h]) and 21.5 h for stroke subjects (range = [0.3–52.8 h], Table 1). Clinical scores for stroke subjects are reported in Table 1.
We compared overall robotic performance for the initial and retest sessions for neurologically-intact subjects and subjects with stroke. We found that, in the no vision condition, one neurologically-intact subject failed the KIN task (failed > 2 parameters) in the initial session, but no neurologically intact subjects failed the KIN task in the retest session. In comparison, we found that 40% (N = 6) of stroke subjects failed the KIN task in both the initial and retest sessions.
Inter-rater reliability of robotic measures
Figure 1c presents the results of an exemplar neurologically-intact subject during the initial (left panel) and retest (right panel) sessions in the KIN task without vision. Performance on the KIN task was similar for this subject on the initial session (RL = 241.0 ms, PSR = 1.2, IDE = 13.7°, PLR = 1.1) and the second session (RL = 260.6 ms, PSR = 0.96, IDE = 13.5°, PLR = 1.0). In comparison, the subject with stroke (Fig. 1d) qualitatively shows obvious impairment in the initial and second session. The subject with stroke, however, also had similar results in both the initial session (RL = 1299.5 ms, PSR = 1.3, IDE = 32.1°, PLR = 1.3) and second session (RL = 1078.5 ms, PSR = 1.5, IDE = 35.3°, PLR = 1.5).
When we examined the relationship between initial and second performance for each of the 8 parameters in KIN without vision, we found that most parameters had high ICCs (Fig. 2, Table 2) (r-values, RL = 0.95, RLv = 0.94, PSR = 0.72, PSRv = 0.80, IDE = 0.86, IDEv = 0.83, PLR = 0.69, PLRv = 0.93). When subjects completed the task with the use of vision, we found that ICCs of most KIN parameters were similarly high (Fig. 3) (r-values, RL = 0.97, RLv = 0.96, PSR = 0.90, PSRv = 0.53, IDE = 0.94, IDEv = 0.93, PLR = 0.61, PLRv = 0.95).
We examined the inter-rater reliability of each of the subject groups independently and generally found that inter-rater reliability was higher for the stroke group compared to neurologically-intact controls in both the No Vision and Vision conditions (Table 2).
A major issue in the field of neurorehabilitation is that there is generally a poor understanding of the characteristics and recovery of proprioceptive deficits after stroke. These deficits are often difficult to detect and measure clinically, and can even be mistaken for motor deficits. Currently, there is no gold standard for evaluating proprioception or its sub-modalities (position sense, kinesthesia) after stroke. Further, it is thought that many of the current clinical measures of sensory impairment are not sensitive enough to detect clinically meaningful changes in proprioceptive function over time [30, 31], necessitating the development of new, more sensitive measurement tools (i.e., robotics). Clinical tests, such as the Thumb Localizer Test, test position sense directly, and subcomponents of other clinical tests (Rivermead Assessment of Somatosensory Performance, Nottingham Sensory Assessment, Fugl-Meyer) evaluate elements of proprioception. The inter-rater reliability of many of these measures has been reported as anywhere from poor to excellent [13, 32, 33]. Oftentimes, authors will limit an evaluation scale (0–2 vs 0–10), which generally leads to better reliability. However, this is clinically problematic because it typically produces a concomitant decrease in sensitivity to detect change.
Here we present a reliable robotic tool that can identify deficits in kinesthesia after stroke. Performance on the assessment has been previously shown to correlate with a number of clinical measures (e.g., Functional Independence Measure, Chedoke-McMaster Stroke Assessment) [4, 5, 22]. In the present manuscript, we focused on evaluating the inter-rater reliability of the robotic kinesthetic matching task. In general, we found that the inter-rater reliability for parameters within the KIN task was very high. We believe that developing objective and sensitive tools for measuring proprioception can significantly improve knowledge of proprioceptive impairment after stroke and can be applied to neurorehabilitation practice.
In testing inter-rater reliability, we are evaluating consistency of results when different individuals operate the robot. Over the years, many bedside clinical measures have been examined for both intra- and inter-rater reliability. The Nottingham Sensory Assessment, which evaluates several aspects of sensory function (tactile and kinesthetic sensation), has previously demonstrated poor inter-rater reliability. Other investigators have specifically examined inter-rater reliability of the proprioceptive components of the Rivermead Assessment of Somatosensory Performance and the Nottingham Sensory Assessment. Their findings demonstrated low to fair agreement on many proprioceptive parameters with values ranging from r = 0.25 to 0.36 for the Rivermead (average = 0.31) and κ = 0.31–0.73 (average = 0.49) for the Nottingham. Other measures and methodologies for evaluating proprioception have shown fair to good test-retest reliability [34,35,36], but these measures also tend to rely on ordinal scales that are typically less sensitive to specific components of proprioceptive impairment. Our robotic parameters have, on average, very high inter-rater agreement and utilize continuous scales, which should prove to be more sensitive than the simplistic ordinal scales used in the Rivermead or the Nottingham. Another advantage is the ability to evaluate specific kinematic aspects of kinesthesia (e.g., impairment in matching sensed movement length vs impairment in matching sensed movement speed).
Establishing a kinesthetic measure with high inter-rater reliability, as with the robotic kinesthesia task, is important for advancing the use of objective measures that are sensitive to impairments in proprioception. The development of such assessments is pivotal to advancing our understanding of post-stroke impairment and recovery. It is also vital for application of this knowledge to neurorehabilitative practices and therapies. Our robotic kinesthesia task allows for accurate and reliable measurement of kinesthesia following stroke. An advantage of this task is that it utilizes objective, continuous data that easily allows for comparison of individuals with stroke to neurologically-intact controls. However, a limitation of this task is that some parameters perform better than others when tested for inter-rater reliability (e.g., PLR (without vision), r = 0.69, Fig. 2; PSRv (with vision), r = 0.53, Fig. 3). It is possible that these parameters are more susceptible to within-subject variability and may be less robust indicators of kinesthetic performance. Nevertheless, these values are still higher than those reported for most of the existing clinical measures of proprioception.
We find that our robotic measurement of kinesthesia has very good inter-rater reliability in neurologically intact subjects and individuals with stroke. Validation of reliable, objective methods for quantifying kinesthesia in stroke is important in order to aid future identification of specific stroke-related impairments in proprioception. We believe this robotic assessment of kinesthetic impairment will help us to better understand how various stroke-related impairments in kinesthesia contribute to functional deficits, both on their own and in combination with motor impairments. This is significant because it allows us to identify whether potential treatments for kinesthesia are effective, and allows a better understanding of how various impairments in kinesthesia change over time in response to interventions.
IDE: Initial direction error
IDEv: Initial direction error variability
KIN: Kinesthetic matching task
PLR: Path length ratio
PLRv: Path length ratio variability
PSR: Peak speed ratio
PSRv: Peak speed ratio variability
RLv: Response latency variability
Thumb localizer test
1. Sherrington C. On the proprio-ceptive system, especially in its reflex aspect. Brain. 1907;29(4):467–82.
2. Connell LA, Lincoln NB, Radford KA. Somatosensory impairment after stroke: frequency of different deficits and their recovery. Clin Rehabil. 2008;22:758–67.
3. Dukelow SP, Herter TM, Moore KD, Demers MJ, Glasgow JI, Bagg SD, et al. Quantitative assessment of limb position sense following stroke. Neurorehabil Neural Repair. 2010;24(2):178–87.
4. Semrau JA, Herter TM, Scott SH, Dukelow SP. Robotic identification of kinesthetic deficits after stroke. Stroke. 2013;44:3414–21.
5. Semrau JA, Herter TM, Scott SH, Dukelow SP. Examining differences in patterns of sensory and motor recovery after stroke with robotics. Stroke. 2015;46:3459–69.
6. Carey L. Somatosensory loss after stroke. Crit Rev Phys Rehabil. 1995;13:51–91.
7. Rand D, (Tamar) Weiss PL, Gottlieb D. Does proprioceptive loss influence recovery of the upper extremity after stroke? Neurorehabil Neural Repair. 1999;13:15–21.
8. Winward CE, Halligan PW, Wade DT. Somatosensory recovery: a longitudinal study of the first 6 months after unilateral stroke. Disabil Rehabil. 2007;29:293–9.
9. Campfens SF, Zandvliet SB, Meskers CGM, Schouten AC, Van Putten MJAM, Van Der Kooij H. Poor motor function is associated with reduced sensory processing after stroke. Exp Brain Res. 2015;233:1339–49.
10. Lincoln N, Crow J, Jackson J, Waters G, Adams S, Hodgson P. The unreliability of sensory assessments. Clin Rehabil. 1991;5:273–82.
11. Bickley L, Szilagyi P. Bates’ guide to physical examination and history-taking. Home Healthc Nurse J Home Care Hosp Prof. 2012;13:992.
12. Hirayama K, Fukutake T, Kawamura M. “Thumb localizing test” for detecting a lesion in the posterior column-medial lemniscal system. J Neurol Sci. 1999;167:45–9.
13. Lincoln NB, Jackson JM, Adams SA. Reliability and revision of the Nottingham sensory assessment for stroke patients study 1: revision of the NSA. Physiotherapy. 1998;84:358–65.
14. Gladstone DJ, Danells CJ, Black SE. The Fugl-Meyer assessment of motor recovery after stroke: a critical review of its measurement properties. Neurorehabil Neural Repair. 2002;16:232–40.
15. Carey LM, Oke LE, Matyas TA. Impaired limb position sense after stroke: a quantitative test for clinical use. Arch Phys Med Rehabil. 1996;77:1271–8.
16. Schwamm LH, Pancioli A, Acker JE, Goldstein LB, Zorowitz RD, Shephard TJ, et al. Recommendations for the establishment of stroke systems of care: recommendations from the American Stroke Association’s Task Force on the Development of Stroke Systems. Stroke. 2005;36:690–703.
17. Simo LS, Ghez C, Botzer L, Scheidt RA. A quantitative and standardized robotic method for the evaluation of arm proprioception after stroke. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:8227–30.
18. Simo L, Botzer L, Ghez C, Scheidt RA. A robotic test of proprioception within the hemiparetic arm post-stroke. J Neuroeng Rehabil. 2014;11:77.
19. Coderre AM, Zeid AA, Dukelow SP, Demmer MJ, Moore KD, Demers MJ, et al. Assessment of upper-limb sensorimotor function of subacute stroke patients using visually guided reaching. Neurorehabil Neural Repair. 2010;24:528–41.
20. Winstein CJ, Stein J, Arena R, Bates B, Cherney LR, Cramer SC, Deruyter F, Eng JJ, Fisher B, Harvey RL, Lang CE. Guidelines for adult stroke rehabilitation and recovery. Stroke. 2016;47(6):e98–169.
21. Mercier L, Audet T, Hébert R, Rochette A, Dubois M. Impact of motor, cognitive, and perceptual disorders on ability to perform activities of daily living after stroke. Stroke. 2001;32(11):2602–9.
22. Semrau JA, Wang JC, Herter TM, Scott SH, Dukelow SP. Relationship between visuospatial neglect and kinesthetic deficits after stroke. Neurorehabil Neural Repair. 2015;29:318–28.
23. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1:30–46.
24. Oldfield R. The assessment and analysis of handedness. Neuropsychologia. 1971;9:97–113.
25. Keith RA, Granger CV, Hamilton BB, Sherwin FS. The functional independence measure: a new tool for rehabilitation. Adv Clin Rehabil. 1987;1:6–18.
26. Halligan PW, Cockburn J, Wilson BA. The behavioural assessment of visual neglect. Neuropsychol Rehabil An Int J. 1991;1:5–32.
27. Gowland C, Stratford P, Ward M, Moreland J, Torresin W, Van Hullenaar S, et al. Measuring physical impairment and disability with the Chedoke-McMaster Stroke Assessment. Stroke. 1993;24:58–63.
28. Tiffin J, Asher EJ. The Purdue Pegboard: norms and studies of reliability and validity. J Appl Psychol. 1948;32:234–47.
29. Dukelow SP, Herter TM, Bagg SD, Scott SH. The independence of deficits in position sense and visually guided reaching following stroke. J Neuroeng Rehabil. 2012;9:72.
30. Connell LA, Tyson SF. Measures of sensation in neurological conditions: a systematic review. Clin Rehabil. 2012;26:68–80.
31. Hillier S, Immink M, Thewlis D. Assessing proprioception: a systematic review of possibilities. Neurorehabil Neural Repair. 2015;29:933–49.
32. Winward CE, Halligan PW, Wade DT. The Rivermead Assessment of Somatosensory Performance (RASP): standardization and reliability data. Clin Rehabil. 2002;16:523–33.
33. Stolk-Hornsveld F, Crow JL, Hendriks EP, van der Baan R, Harmeling-van der Wel BC. The Erasmus MC modifications to the (revised) Nottingham Sensory Assessment: a reliable somatosensory assessment measure for patients with intracranial disorders. Clin Rehabil. 2006;20:160–72.
34. Juul-Kristensen B, Lund H, Hansen K, Christensen H, Danneskiold-Samsøe B, Bliddal H. Test-retest reliability of joint position and kinesthetic sense in the elbow of healthy subjects. Physiother Theory Pract. 2008;24:65–72.
35. Lonn J, Crenshaw AG, Djupsjobacka M, Johansson H. Reliability of position sense testing assessed with a fully automated system. Clin Physiol. 2000;20:30–7.
36. Anstey KJ, Smith GA, Lord S. Test-retest reliability of a battery of sensory, motor and physiological measures of aging. Percept Mot Skills. 1997;84:831–3.
We would like to thank Janice Yajure, Megan Metzler, Mark Piitz and Sophie Gobeil for data collection.
This work was supported by funding from the Canadian Institutes of Health Research (MOP 106662) and a Heart and Stroke Foundation Grant-in-Aid. JAS was supported by an Alberta Innovates Health Solutions (AIHS) post-graduate fellowship.
Availability of data and materials
The authors do not wish to share the data included in this manuscript, as our university ethics does not allow for public sharing of data.
JAS designed and completed data and statistical analyses, as well as drafting the manuscript and figures. TMH, SPD and SHS designed the robotic kinesthetic task, provided input on data and statistical analyses, and were involved in editing the manuscript. All authors read and approved the final manuscript.
SHS is co-founder and chief scientific officer of BKIN technologies, the company that manufactures the robotic exoskeleton. All other authors (JAS, TMH, SPD) have no competing interests to declare.
Consent for publication
Additionally, signed consent for the inclusion of images was obtained from subjects.
Ethics approval and consent to participate
The study was approved by the Conjoint Health Research Ethics Board (CHREB) at the University of Calgary, and all subjects provided informed consent.
Cite this article
Semrau, J.A., Herter, T.M., Scott, S.H. et al. Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton. J NeuroEngineering Rehabil 14, 42 (2017) doi:10.1186/s12984-017-0260-z