A robotic test of proprioception within the hemiparetic arm post-stroke

Background Proprioception plays important roles in planning and control of limb posture and movement. The impact of proprioceptive deficits on motor function post-stroke has been difficult to elucidate due to limitations in current tests of arm proprioception. Common clinical tests only provide ordinal assessment of proprioceptive integrity (eg. intact, impaired or absent). We introduce a standardized, quantitative method for evaluating proprioception within the arm on a continuous, ratio scale. We demonstrate the approach, which is based on signal detection theory of sensory psychophysics, in two tasks used to characterize motor function after stroke. Methods Hemiparetic stroke survivors and neurologically intact participants attempted to detect displacement- or force-perturbations robotically applied to their arm in a two-interval, two-alternative forced-choice test. A logistic psychometric function parameterized detection of limb perturbations. The shape of this function is determined by two parameters: one corresponds to a signal detection threshold and the other to variability of responses about that threshold. These two parameters define a space in which proprioceptive sensation post-stroke can be compared to that of neurologically-intact people. We used an auditory tone discrimination task to control for potential comprehension, attention and memory deficits. Results All but one stroke survivor demonstrated competence in performing two-alternative discrimination in the auditory training test. For the remaining stroke survivors, those with clinically identified proprioceptive deficits in the hemiparetic arm or hand had higher detection thresholds and exhibited greater response variability than individuals without proprioceptive deficits. We then identified a normative parameter space determined by the threshold and response variability data collected from neurologically intact participants. By plotting displacement detection performance within this normative space, stroke survivors with and without intact proprioception could be discriminated on a continuous scale that was sensitive to small performance variations, e.g. practice effects across days. Conclusions The proposed method uses robotic perturbations similar to those used in ongoing studies of motor function post-stroke. The approach is sensitive to small changes in the proprioceptive detection of hand motions. We expect this new robotic assessment will empower future studies to characterize how proprioceptive deficits compromise limb posture and movement control in stroke survivors.


Background
Motor impairments are the most frequent and conspicuous deficits that occur after stroke, and many clinical tests have been developed to obtain standardized measures of limb mobility. By some reports however, more than 50% of survivors exhibit somatosensory deficits that negatively impact quality of life and rehabilitation outcome [1][2][3]. Somatosensory deficits may involve any of the various proprioceptive sensors that signal the physical state of the limb (muscle spindle receptors, Golgi tendon organs and mechanoreceptors in the skin) [4]. Proprioception includes the sense of position, motion and effort. Proprioception of limb kinematics is particularly important for forming the feedforward motor commands that coordinate the arm's complex nonlinear dynamics during reaching [5][6][7][8] and the feedback commands that stabilize the hand against environmental perturbation [9]. Because experimental evidence suggests that these two aspects of control may be differentially affected post-stroke [10], increased understanding of how proprioceptive deficits compromise control of limb posture and movement may prove useful for developing new physical rehabilitation strategies specifically targeting each aspect of control after stroke and for determining which therapeutic approach might best benefit any given patient.
Clinical assessments of proprioception currently suffer poor reliability [11][12][13] and lack resolution (i.e. they only provide an ordinal classification of proprioceptive integrity: intact, impaired or absent). Consequently, several groups have designed standardized tests [13][14][15][16] and automated procedures [17][18][19] that quantify somatosensory deficits by assessing a person's ability to identify the static posture of an unseen limb or to indicate when a passively moved limb has changed position. While automated tests can provide greater resolution than clinical tests of proprioception, currently-proposed robotic tests require subjects to actively match the stationary position [17,18,20] or motion [19] of one limb with the other. Limb matching tests require integration of proprioceptive information across the two limbs. Therefore, proprioceptive deficits in either limb as well as deficits in the central integration of that sensory information can compromise test performance. For determining how deficits in proprioceptive perception contribute to motor control deficits, it seems necessary to assess proprioceptive perception within the moving limb itself rather than across limbs.
Here we describe an automated test of proprioceptive integrity in the upper extremity that does not require the integration of sensory information from both arms. Our approach focuses on kinesthetic proprioception by requiring people to detect small position-or force-perturbations applied to one hand within the context of a two-interval, two-alternative forced choice test. We compared this approach to a standard clinical test of proprioception within a small cohort of volunteers to evaluate the ability of the new technique to discriminate individuals with proprioceptive deficits from those without. We identified two ratiometric outcome variables [21] that, when considered together, discriminate individuals with vs. without proprioceptive deficits at least as well as a common clinical test of proprioception. We repeated the robotic assessment on separate days spaced > 1 week apart in an initial evaluation of test-retest reliability. Repeat classification yielded excellent agreement with initial testing but also revealed a subtle learning effect across days of practice on the task. Thus, the robotic procedure we describe can offer a reliable, sensitive and ratiometric test of proprioceptive function.

Methods
Sixteen unilateral, hemiparetic stroke survivors (SS) and sixteen neurologically intact control (NIC) subjects gave written, informed consent to participate in this study, which was approved by IRB committees at Marquette and Northwestern Universities. Seven SS had clinically-determined proprioceptive deficits and are referred to as SS-P participants. The remaining SS are referred to as SS + P. All SS-P and two SS + P had tactile deficits as well. All SS were recruited from the pool of outpatients at the Rehabilitation Institute of Chicago and ranged in age from 36 to 69 years. Inclusion criteria for SS included upper-extremity Fugl-Meyer (FM) scores ranging between 15-60 (out of 66) ( Table 1). Exclusion criteria included presence of neurological or muscular disorders that interfere with neuromuscular function and current use of agents that may interfere with neuromuscular function. NIC subjects (34 to 66 yrs) were age matched (±5 yr) to the stroke survivors.
All SS participated in an initial evaluation session wherein the same physician assessed motor function and impairment with the subject seated in an armless chair. Clinical assessments included: visual field/visual search evaluation [22]; the upper extremity portion of the Fugl-Meyer assessment [23]; the modified Ashworth scale (MAS) assessing abnormal muscle tone at the shoulder and elbow -MAS scores were averaged across the two joints and across testing directions (flexion, extension) to estimate abnormal muscle tone in the upper extremity [24]; hand dynamometer grip strength (Fabrication Enterprises, Irvington, NY); and evaluation of proprioceptive and tactile discrimination with the subject's eyes closed. Proprioception at the shoulder, elbow, wrist, and metacarcophalangeal articulations was evaluated by moving the tested joint up and down (the "up or down?" test [25,26]). When the joint stopped moving, the subject was to indicate joint position. Six repetitions were performed at each joint. If response was brisk and accurate (i.e. they made no errors), proprioception was rated "intact"; if the subject was unable to respond with confidence (i.e. they made 1 error), proprioception was rated "impaired"; if the subject was unable to determine position (2 or more errors), proprioception was rated "absent". Touch was evaluated by the two-point discrimination test using an aesthesiometer [27]. Subjects indicated whether they felt one or two points of contact at finger tips, hand, forearm, and upper arm (six repetitions each); if response was brisk and accurate, tactile discrimination was rated "intact"; if the subject was unable to respond with confidence, it was rated "impaired"; if the subject was unable to discriminate between one and two points, it was rated "absent".

Experimental procedures
Subjects were seated in a high-backed chair. A harness minimized trunk movement. Subjects interacted with a planar robot [28] ( Figure 1A). The upper arm was supported against gravity (75°to 90°abduction;~60°horizontal flexion) using a light-weight, chair-mounted support. Subjects wore a wrist brace on their paretic (SS) or right (NIC) hand to limit motion to the shoulder and elbow. The brace was fixed to the robot handle. An opaque screen placed above the plane of hand motion occluded direct view of shoulder, arm and robot.

Training task
A tone discrimination task familiarized all participants with the two-alternative forced choice procedure. This task also evaluated each subject's ability to comprehend multistep instructions, to concentrate, and to use working memory to encode and recall sensory stimuli for comparison in a forced-choice decision process. The task included 24 trials, each of which included two, 3 s observation intervals presented in close succession ( Figure 1B). In one interval (randomly selected), a constant-pitch, 500 Hz tone was embedded in auditory white noise ( Figure 1C, left). In the other interval, a conspicuously rising pitch was embedded within the noise ( Figure 1C, right). Immediately after the second interval, the subject was to indicate via a two-button response box ( Figure 1A) whether the first or second noise masked the rising pitch.
Primary experiment: arm movement detection 14 SS (seven with impaired/absent proprioception) and 11 NIC participated in this experiment. We tested each participant's ability to detect hand displacements of various magnitudes in a series of 130 trials performed at a single, comfortable location in the center of the arm's reachable workspace. We programmed the robot to generate whatever force needed (within limits) to enforce desired displacements. Hand displacements were composed of separate sum-of-sinusoids in the "X" (1.75 Hz, 1.2 Hz and 0.25 Hz) and "Y" directions (1.65 Hz, 1.1 Hz and 0.25 Hz). As in the training task, each trial included two observation intervals marked by 3.0 s of auditory white noise and an intervening 1.0 s of silence. One interval (randomly selected) included a perturbation ( Figure 1D, left) whereas the other included no perturbation ( Figure 1D, right). Participants were instructed to indicate via response box whether the first or second noise masked the hand motion. A fixed set of 9 perturbation magnitudes (w) ranged from 0.0 to 1.0 cm ( Figure 1D). Each perturbation was presented between 10 and 20 times in pseudorandom order following the Method of Constant Stimuli [29]. To assess test-retest reliability, seven SS and five NIC returned to the lab >1 week later for repeat testing.

Supplemental experiment: hand force detection
The experimental approach we describe may easily be extended to assess perceptibility of other stimuli commonly used to evaluate motor performance and control following stroke. As a demonstration, we performed a final experiment quantifying detection of force-perturbations applied to the hand. Seven SS (three with impaired or absent proprioception) and seven NIC participants attempted to detect sum-of-sinusoid force perturbations in this singlesession experiment comprised of 130 trials. We programmed the robot to generate a specific temporal pattern of desired force vectors regardless of hand position in the workspace. Stimulus detection in this task was therefore driven by controlled hand-force perturbations whereas the magnitude of resulting hand displacement was allowed to vary freely. Nine perturbation magnitudes (w's) spanned the range from 0.0 N to 2.5 N. The within-trial sequence of events and psychophysical task were otherwise as described for the primary experiment. Five of the SS and two of the NIC participants had previously participated in the primary study (see Table 1).

Data analysis
The training task screened for participants unable to perform the complex psychophysical discrimination task due to an inability to follow multi-step instructions, working memory impairments, and/or attention deficits. We analyzed task performance using an equal-variance Gaussian model of the two-alternative forced-choice task [30]. In that model, the sensations of the signals within the two observation intervals (X noise , X signal+noise ) are drawn from Gaussian distributions with identical variances (σ 2 ) but different means (μ noise , μ signal + noise ). Under the assumption of unbiased observation of the two stimuli, the probability of a positive difference X signal+noise -X noise (thus correctly identifying the interval containing signal) is: where Φ is the standard cumulative normal distribution function. For each participant, we discarded the first four training trials to account for initial task learning. We then calculated P C , the percentage of the remaining 20 trials wherein the interval with the rising tone was correctly identified. We defined the limit of acceptable performance as that which would be expected if the sensation variance of the participant under test (i.e. σ 2 test ) exceeded that expected of our neurologically-intact participants (σ 2 NIC ) by 50% or more. Given the observed variance of NIC performance, Eqn [1] suggests a minimum acceptable performance threshold of 80.4% (n.b. all NIC participants exceeded 90% correct). We therefore considered training task performance ≤80% as indicating potential concerns with the participant's general ability to perform a twoalternative forced choice test. While other thresholds could have been chosen (for example, allowing acceptable sensation variance to be twice normal yields a performance limit of 73.9%), an 80% threshold seemed a conservative and reasonable limit. For the motion detection task, we sought to verify the repeatability of robotic hand displacements despite marked variations in arm spasticity across the study population. We measured hand path length during the perturbation then averaged this value across trials, within each perturbation magnitude, for each participant. We also quantified reaction forces induced by the imposed motion using measures of hand force bias and variability. (We defined hand force bias as the average magnitude of horizontal planar hand force during perturbation.) We used these outcome measures to test the hypothesis that spasticity, as assessed by MAS score, would manifest as increased arm stiffness in response to imposed robotic perturbation.
For the primary and supplemental experiments, we characterized proprioceptive sensitivity using detection threshold (DT): the minimum magnitude of displacement (or force) that subjects begin to detect reliably when comparing to the no-perturbation condition. We characterized the acuity of proprioceptive sensation using choice uncertainty (CU), which quantifies variability of the individual's responses about the detection threshold.
More specifically, for each participant and each perturbation magnitude on each day, we calculated P correct , the percentage of trials wherein the perturbation interval was correctly identified. Because zero-magnitude perturbations were indistinguishable from their paired comparisons (which were also zero-magnitude), P correct at zero-magnitude was assigned a likelihood of 0.5. We then fit a cumulative-normal psychometric function (Matlab command: cdf) to P correct at the 9 perturbation magnitudes using nonlinear optimization (Matlab: fmincon), for each participant on each day: Here, DT was identified as the perturbation magnitude at which the fitted curve passed through 75% likelihood. CU was defined as one standard deviation of the underlying normal distribution. CU values are low when the slope of the psychometric function is steep whereas CU is high when the slope is shallow. The fit was constrained such that: 1) the quantity (DT -CU) > 0.0, and 2) the standard deviation of the underlying normal distribution was greater than 0.002 cm. The first constraint ensures that P(0) is close to 0.5. The purpose of the second constraint is to handle situations where subjects respond with perfect accuracy at and above some perturbation magnitude w i and at chance for all sample points below w i . In such cases, enforcing a minimum CU coerced the optimization to yield DT values centered within the range bounded by w i and the next lower magnitude.

Statistical hypothesis testing
For the primary experiment, we used analysis of variance (ANOVA) and post-hoc Bonferroni t-test to evaluate the effect of proprioceptive integrity (group: NIC, SS + P, SS-P) on performance variables DT and CU obtained during Day 1 testing. We then used repeated-measures ANOVA to test the repeatability of our psychophysical assessment of limb proprioceptive integrity for subjects who returned to the lab for Day 2 testing. For that analysis, we compared parameters DT and CU across testing sessions {Day 1, Day 2} and across the three participant groups (n = 5 NIC; n = 3 SS-P; n = 4 SS + P). We treated "group" and "session" as fixed factors and "subject" as a within-group random factor. For SS, we used linear regression to evaluate correlation between the variables {DT, CU} and performance on the auditory training task. For these participants, we also used multiple linear regression to evaluate correlation between the variables {DT, CU} and the various clinical measures {FM, MAS, Grip Strength}. For the supplemental experiment, we used one-way ANOVA to compare performance measures {DT, CU} across the three groups. Statistical testing was carried out in the Minitab computing environment (Minitab, State College, PA). Effects were considered significant at the α = 0.05 level. Table 1 summarizes the clinical test results for the SS group. No participant exhibited visual field deficits or hemispatial neglect. Between the SS-P and SS + P groups, two-sample t-test found no difference in spasticity as quantified by MAS (T = 0.73, p = 0.479). Across groups, all but one participant scored 85% or better on the training task, indicating that they commanded sufficient cognitive resources to perform the forced choice evaluations in the primary and supplemental experiments. We cannot say with confidence whether the remaining subject (SS04) was able to perform two-alternative discrimination tasks reliably because he responded correctly only 70% of the time when presented with contrasting stimuli that were readily discriminated by every other participant. Consequently, we excluded that subject from further statistical analysis.

Primary experiment: arm movement detection
We evaluated the repeatability of robotic perturbation by using repeated measures ANOVA to compare average total hand displacements across the three subject groups and across perturbation magnitudes ( Figure 2). Whereas perturbation strength influenced path length [F (8,224) = 1729, p < 0.0005], we found no effect of group [F (2,224) = 0.56, p = 0.57] and no interaction between factors [F (16,224) = 0.87, p = 0.61]. Thus, the robot imposed repeatable displacements (Figure 2, inset) despite large differences in spasticity across groups (Table 1). This finding is likely due to the fact that hand displacements in our test were very small in comparison to the large limb manipulations applied during MAS testing.
A separate repeated measures ANOVA revealed that mean hand force varied systematically across groups [F (2,224) = 68.13, p < 0.0005] but not across displacement magnitudes [F (8,224) = 0.03, p = 1.0]. The two SS groups produced more average force against the robot's handle than did the control group (NIC: 1.33 N ± 0.05 N; SS + P: 2.38 N ± 0.04 N; SS-P: 1.89 N ± 0.05 N). Post-hoc regression analysis found no significant correlation between mean hand force and MAS score (r 2 = .01, p = 0.703). Furthermore, repeated measures ANOVA revealed that the within-trial variability of hand force (i.e. σ f ) depended systematically on perturbation magnitude as expected [F (8,224) = 70.29, p < 0.0005], but did not vary across subject groups [F (2,224) = 0.18, p = 0.83]. Regression analysis found no correlation between σ f and MAS (r 2 = 0.02, p = 0.510). Because the hand displacement profile was spatially complex, we would have expected σ f to increase if the arm's mechanical stiffness had increased along with the MAS score. These findings suggest that elevated muscle tone in SS manifested as a hypertonic bias at the hand's testing location rather than a simple increase in the arm's mechanical stiffness.
Participant responses were well-fit by cumulative Gaussian functions of perturbation size (cf. Figure 3A). All participants readily detected large 1.0 cm displacements and all responded with chance accuracy at low perturbation magnitudes. In contrast, responses varied markedly across groups for moderate perturbation magnitudes. For SS, linear regression analyses found no support for a correlation between either of our primary performance measures (DT, CU) and upper extremity FM, MAS or paretic hand-grip scores (p > 0.05 in every case).
A one-way ANOVA of DAY 1 testing revealed a main effect of group on hand motion DT [F (2,22) = 29.31, p < 0.0005]. Post-hoc Bonferroni t-test found the group effect due to higher thresholds in SS-P (0.32 ± 0.11 cm) relative to NIC (0.08 ± 0.03 cm) and SS + P participants (0.09 ± 0.04 cm) (p < 0.0005 in both cases), whose values did not differ from each other (p = 1.000). A separate, planned, one-way ANOVA found a similar main effect of group on choice uncertainty [F (2,22) = 54.74, p < 0.0005]. Post-hoc Bonferroni t-test revealed that the group effect was due to higher choice uncertainty in the SS-P group (0.14 ± 0.03 cm) relative to the NIC (0.03 ± 0.03 cm) and SS + P (0.02 ± 0.02 cm) groups (p < 0.0005 in both cases), whose values did not differ from each other (p = 1.000).
These outcomes were replicated in separate, two-way, repeated measures ANOVA performed on data from subjects who returned to the lab for a second day of displacement detection testing. Here, we observed a main effect of group on DT [F (2,9) = 29.45, p < 0.0005], a main effect of testing day on DT [F (1,9) = 8.32, p = 0.018] but no interaction between factors [F (2,9) = 1.89, p = 0.206]. Post-hoc Bonferroni t-test found the group effect due to higher thresholds in SS-P relative to NIC and SS + P participants, whose values did not differ from each other ( Figure 3B). We then analyzed test-retest reliability by correlating DT values across days. We found strong correlation (r = 0.93) such that DT day2 = 0 + 0.77 DT day1 . That is, DT decreased~0.04 cm from Day1 to Day 2. Because this effect was seen in all three participant groups, it was likely a learning effect mediated by repeat exposure to the task. Note also that this training effect was much smaller than the effect of group, wherein thresholds differed more than 0.2 cm between SS-P and the other two groups.
A separate, planned, two-way repeated measures ANOVA found a main effect of group on choice uncertainty [F (2,9) = 16.30, p = 0.001] but no effect of testing day [F (1, 9) = 0.05, p = 0.822] or interaction between the factors [F (2,9) = 0.07, p = 0.933]. After collapsing across days, Bonferroni t-test revealed that the group effect was due to higher choice uncertainty in the SS-P group relative to the NIC and SS + P groups, whose values did not differ from each other ( Figure 3C). Test-retest analysis only found a modest correlation of CU values across days (r = 0.54) such that CU day2 = 0.01 + 0.84 CU day1 .
For subjects who performed above criterion on the auditory training task, linear regression found no evidence of correlation between training task performance and either DT or CU (p > 0.40 in both cases). If we instead include SS04, who performed below criterion, correlation between auditory task performance and CU achieved statistical significance (p = 0.012). Because many factors could lead to impaired performance in two-alternative forced-choice tasks -including failure to follow complex multi-step instructions, memory deficits and attention deficits -inability to perform the auditory test suggests that subject SS04 may have had deficits unrelated to proprioceptive integrity that biased performance on the proprioception test, with choice uncertainty seeming particularly sensitive to confound.
For NIC subjects, linear regression found no correlation between DT and CT values (r 2 = 0.07, p = 0.168) and so we estimated the likelihood of intact proprioception at each point in the two-dimensional plane encompassing all {DT, CU} data pairs as the product of the individual DT and CU cumulative likelihood functions from the NIC group ( Figure 4, color map). For our cohort of subjects, all NIC and SS + P data points were enclosed within the 99.9% isolikelihood contour whereas all SS-P data resided outside this same contour. This nonlinear contour yields excellent discriminability between subjects with proprioceptive deficits and those without. In the same way, the normative space established by the bivariate NIC {DT, CU} distribution provides an intuitive interpretation for all points on the {DT, CU} plane in terms of the likelihood of intact proprioception. The displacement detection test was reliable in the sense that retesting > 1 week later did not result in shift from low-to high-likelihood of intact proprioception or vice-versa.

Supplemental experiment: hand force detection
We next assessed participant ability to detect a range of hand force perturbations in a two-alternative forced choice experiment ( Figure 5A) to determine whether a hand force detection task might also be sensitive to proprioceptive deficits in a small cohort of stroke survivors. Participant responses were again fit by cumulative Gaussian functions of perturbation magnitude ( Figure 5B). ANOVA found a main effect of group on hand force DT [F (2,13) = 38.69, p < 0.0005]. Bonferroni t-test revealed that the group effect was due to higher detection thresholds in the SS-P group (0.72 ± 0.12 N) relative to NIC (0.32 ± 0.06 N) and SS + P (0.27 ± 0.03 N) groups (p < 0.0001 in both cases), whose values did not differ from each other (p = 0.957; Figure 5C). A separate ANOVA found a main effect of group on CU [F (2,12) = 5.20, p = 0.028]. Bonferroni t-test found the group effect due to higher uncertainty in SS-P (0.27 ± 0.14 N) relative to SS + P participants (0.09 ± 0.07 N) (p = 0.039); the difference between the NIC (0.14 ± 0.04 N) and SS-P group did not survive Bonferroni correction (p = 0.065). Choice uncertainty values in the SS + P and NIC groups did not differ from each other (p = 1.00) ( Figure 5D). By considering detection threshold and choice uncertainty together, we again found that an iso-likelihood contour could separate SS with proprioceptive deficits from those without ( Figure 5E). Although a SS + P participant with tactile deficits, paresthesia, and pain in the arm (SS09; Figure 5E, encircled triangle) had a DT value very close to that of the SS-P participants, this subject's {DT, CU} data point fell within the 95% confidence bound of the NIC distribution whereas all SS-P data points were outside that contour. Because the bivariate NIC distribution is broader in Figure 5E vs. Figure 4 and because  one of the SS-P data points fell within the 99% contour in Figure 5E, empirical results suggest that controlled displacements may provide a better assessment of proprioceptive integrity than force perturbations when using our approach.

Discussion
Limb movements and environmental interactions are known to be adversely impacted by proprioceptive deficits [10,19,24,31]. However, the causal relationship between proprioceptive deficits and control deficits after stroke is less well understood [32]. Current clinical tests lack resolution to detect small differences in proprioceptive integrity that might reveal, for example, whether there is some sensory threshold below which the likelihood of regaining useful function of the hemiparetic limb is practically small. We therefore developed a psychophysical assessment of proprioceptive integrity using limb displacements and force perturbations, two types of robotic manipulations commonly used to study motor function in stroke survivors. Our approach demonstrates excellent ability to discriminate limbs with impaired or absent proprioception (determined clinically) from those with intact proprioception. The displacement detection test is also reliable; upon retesting > 1 week later, no participant shifted from lowto high-likelihood of intact proprioception or vice-versa. Moreover, variations in observed DT and CU values were largely due to participant group membership, although a subtle learning effect was evident across sessions in all participant groups. Thus, the robotic assessment described here can characterize proprioceptive deficits post-stroke with a resolution that is superior to that of common clinical tests. Ultimately, we expect this new approach will be useful for characterizing how impaired proprioception contributes to motor control deficits after stroke.
Comparison to the "up or down" test of upper extremity proprioception Current clinical tests of somatosensation post-stroke are useful because they are easy to administer and can give clinicians a quick, rough estimate of a patient's proprioceptive and tactile integrity. However, the clinical test administered herethe "up or down?" test -is limited in that there are only three possible grades of proprioception: intact, impaired and absent. By collapsing a continuum of impairment onto just three ordinal classes, the clinical test sacrifices measurement resolution for speed and ease of administration. By contrast, the robotic test introduced here yields a pair of ratiometric performance variables (DT, CU) that, when plotted within the normative performance space identified using data from a cohort of NI subjects, indicates the likelihood of proprioceptive impairment in the tested limb as well as a quantitative measure of that impairment.
Another limitation of the "up or down?" test is a ceiling effect [33], whereby subjects who claim impaired proprioceptive perception can nevertheless report the spatial orientation of their elbow and (especially) their shoulder joint accurately and reliably. This is likely due to the unavoidable production of secondary sensory cues by the clinician as she or he moves the proximal limb segments up and down. For example, manipulating the position of a relaxed shoulder can affect the posture of the trunk, cause clothing to shift against the skin, or can cause the head to move slightly. Each of these cues could conceivably be used to infer limb segment orientation. By contrast, manipulations of the wrist and fingers, as well as our robotic test, apply very small perturbations to the hand that are not likely to cause significant motion of the trunk, head or clothing. We measured and reported proprioceptive integrity in the distal limb segments and we grouped all SS with impaired/absent proprioception anywhere in the limb into the SS-P group to avoid this possible confound. The fact that all of the SS-P participants performed poorly on our robotic testeven those with "intact" proprioception at the elbow and/or shoulder -suggests that secondary sensory cues may indeed be a source of confound for the "up or down?" test.
A final advantage of the proposed robotic test over current clinical tests of proprioception is that our test is specifically designed to quantify proprioceptive sensitivity to horizontal planar perturbations similar to those currently used in ongoing studies of robotic interventions for promoting recovery of functional arm movement post-stroke. Future studies seeking to understand the impact of proprioceptive deficits in the control of limb posture and movement post-stroke should quantify proprioceptive sensitivity on a scale commensurate with the environmental perturbations used to challenge sensorimotor performance.

Comparison to other instrumented tests of upper extremity proprioception
Instrumented tests of arm proprioception fall into two general categories. In the first, one arm is moved passively and unseen to some reachable location and the subject's task is to actively match the stationary position of the hand [18,20] or configuration of the arm [17] with the other limb. These tests assess a person's ability to integrate information about the state of both limbs presumably from muscle spindles, which are sensitive to both muscle length and rate of length change. Such tests are ideally suited for assessing the integrity of the entire neuromuscular control arc spanning both limbs. However, limb matching tests are not ideal for determining how proprioceptive deficits in the hemiparetic limb contribute to motor deficits in that same limb. First, some stroke survivors (up to 20%) exhibit proprioceptive deficits in the ipsilesional arm, with most but not all of these also having proprioceptive deficits in their contralesional arm [3]. In addition, some stroke survivors also exhibit subtle motor deficits in their ipsilesional arm [34][35][36]. As a result, limb matching tests can confound proprioceptive deficits in the arm under evaluation with sensory and motor deficits in the matching limb.
Carey and colleagues have developed a position sense apparatus whereby an examiner moves the unseen wrist to a predefined test position and the subject indicates the perceived wrist angle by aligning a goniometric pointer with an imagined line linking the middle of the wrist and the index finger [14]. Although this approach avoids confounds associated with imitating or matching tasks that involve both limbs, Carey's test only assesses position sensibility at the wrist. It also lacks automation and does not evaluate the participant's ability to sense limb motion, which is also important for the control of multijoint movement [37,38].
The second test category requires participants to indicate (verbally, or otherwise) when they detect motion in a slowly moving arm (threshold to detection of passive motion) or when the moving limb's position matches a previously presented position (passive reproduction of joint position). Niessen and colleagues used both approaches and found that proprioceptive deficits at the shoulder and abnormalities in scapular kinematics correlate with the presence of shoulder pain [19]. Although the technique used by Niessen appears effective for testing proprioceptive integrity, we believe it possible to improve on their approach by using headphones or acoustic noise to mask audible motor noise from the manipulandum, which can provide subtle cues as to when the device imposes limb motions.
Our experimental design took such considerations into account. Like Carey [14] we avoided confounds associated with matching tasks that involve both limbs by applying perturbations to one limb and using the other only to indicatevia response boxwhether the perturbation was present in the first or second observation interval. As in the study by Niessen [19], we used a task in which arm muscles were stretched over a range of magnitudes and velocities rather than a task that assessed joint position sense because muscle spindle receptors respond to dynamic stimuli with large phasic responses that make them particularly well-suited for motion detection [39,40]. Moreover, because subjects were asked to relax their tested arm as the robot applied perturbations, performance in our test was not confounded by motor deficits in either arm. The button box minimized need for verbal communication during testing, which allowed us to use acoustic noise to mask potential robot noises. Finally, we implemented a training task that controlled for deficits in attention, memory or cognition that might confound the assessment of proprioceptive integrity.

Limitations
Our approach has some limitations. In current clinical practice, therapists use coarse assessments of proprioceptive integrity because they require minimal equipment aside from their own hands and because they are quick to administer. By contrast, the approach described here relies on a costly robot and takes a relatively long time to complete (~45 minutes). Whereas the cost of robotics will decrease over time, testing time must be reduced if the approach is to achieve clinical utility, such as for quantifying subtle week-to-week or month-to-month changes in proprioceptive sensitivity caused by therapeutic intervention. Nevertheless, the proposed robotic assessment will find immediate utility as a research tool because it quantifies proprioceptive sensitivity to limb motion on a ratio scale that is directly commensurate with environmental perturbations used in ongoing studies of rehabilitation robotics. Future studies looking to understand the impact of proprioceptive deficits in the control of limb posture and movement post-stroke should quantify proprioceptive integrity using the same environmental perturbations used to challenge and evaluate motor control and performance. The present study shows how this can be done using both displacement and force perturbations applied to the hand.
Another limitation of our approach derives from our use of a two-alternative forced-choice task to assess proprioceptive integrity. Two-alternative forced-choice tasks require subjects to commit to memory sensory stimuli experienced during an initial observation interval, and to recall those memories for subsequent comparison with stimuli experienced during a second interval. Deficits in comprehension, ability to follow multi-step instructions, attention and/or memory could all negatively impact performance of any two-alternative forced choice task. Here we used an auditory discrimination task to screen participants for cognitive deficits that would impede performance in the proprioception test. We also provided a sound rationale, based on signal detection theory, for excluding individuals from proprioception testing if they exhibit poor performance in the training task. (If hearing loss were suspected, a visual analog of the training task could readily be devised.) The study itself was limited in the breadth of clinical assessments used to characterize the stroke survivors. We observed a lack of correlation between our two outcome variables (DT, CU) and two common measures of motor impairment (FM and MAS). This is understandable because central lesions that compromise motor pathways can spare sensory pathways and vice versa. It would be informative therefore for a future study to determine the extent to which proprioceptive integrity, as measured using the proposed robotic technique, is predictive of other important behavioral characteristics such as arm function [41] and arm use [42] post-stroke. Other future studies could use the proposed test to determine whether the ability of central [43] and peripheral [44] stimulation techniques to enhance motor control derives in part from enhanced central processing of proprioceptive signals or solely by enhancing motor output. Finally, increased understanding of the relationship between proprioceptive deficits and deficits in the control of limb posture [45] and movement [8] may prove useful in identifying patients for whom sensory replacement [46] or augmentation [47] techniques could promote effective control of limb interactions with objects in the environment.
Finally, the current study is limited in that we have tested only a small number of participants. Larger cohorts of participants should be examined to formally evaluate the sensitivity and specificity of the test we introduced here (although it remains unclear which clinical test should be selected as the most appropriate "gold standard" assessment). Additional testing will also refine the normative distributions for DT and CU in NIC individuals, which will facilitate quantification of proprioceptive deficits in stroke survivors both in real (units of [cm]) and relative (% likelihood) terms.

Conclusions
We described a novel robotic assessment of proprioceptive integrity in the arm using hand displacements and force perturbations, which are commonly used to study motor function in stroke survivors. The new approach demonstrates excellent ability to discriminate limbs with impaired or absent proprioception from those with intact proprioception. The test provides a measure of proprioceptive integrity on a continuous scale that is sensitive to small changes in performance such as learning effects due to repeat performance of the task. Thus, the robotic assessment described here can characterize proprioceptive deficits post-stroke with a resolution superior to that of a common clinical test of proprioception. We expect this new approach will prove useful as a research tool in elucidating how impaired proprioception contributes to motor control deficits after stroke [41].