Test–retest reliability of KINARM robot sensorimotor and cognitive assessment: in pediatric ice hockey players

Better diagnostic and prognostic tools are needed to address issues related to early diagnosis and management of concussion across the continuum of aging but particularly in children and adolescents. The purpose of the current study was to evaluate the reliability of robotic technology (KINARM robot) assessments of reaching, position sense, bimanual motor function, visuospatial skills, attention and decision making in youth ice hockey players (ages 10–14). Thirty-four male children attended two testing days, one week apart. On day one, each subject completed five tasks on the robot with two examiners (alternating examiner sequence); the 2nd examiner followed the same procedure as the 1st immediately afterwards. One consistent examiner tested subjects one week later. This is a test-retest reliability study. The robotic tasks characterize sensorimotor and/or cognitive performance; 63 parameters from 5 tasks are reported. Session 1 was the 1st time the subject performed the 5 tasks, session 2 the 2nd time on day 1, and session 3 one week following. Intra-class correlation coefficients ranged from 0.06 to 0.91 and 0.09 to 0.90 for session 1 to 2 and 2 to 3, respectively. Bland-Altman plots showed agreement in a majority of the parameters and a learning effect in 25 % and 24 % of parameters in session 1 vs 2 and 1 vs 3, respectively but none for session 2 vs 3. Of those that showed a learning effect, only 8 % of parameters in session 1 vs 2 and 10 % in session 1 vs 3 had a clinical relevance measure ≥ 0.8. The relative homogeneity of the sample and the effect of learning seen in some of the task parameters appears to have negatively impacted the intra-class correlation coefficients from session 1 to 2, with less impact for 2 to 3. The Bland-Altman analysis supports good absolute reliability in healthy male children with no neurological impairment ranging in age from 10 to 14. The clinically relevant learning effect seen, in a small number of parameters could be addressed by creating a learning effect adjustment factor and/or implementing a practice session, which would eliminate the learning effect.


Background
The incidence of concussion [or mild traumatic brain injury] in the US alone has been estimated at 1.7 million per year accounting for 80 % of all brain injuries [1][2][3][4]. One hundred and sixty thousand Canadians sustain brain injuries each year. [5]. Among Canadian university hockey players, concussion constitutes 13 % of all injuries, ranking as the second most common injury after sprains or strains [6]. More than half of mild traumatic brain injuries occur in children and adolescents [2]. Researchers from London, Ontario, Canada examined a retrospective cohort of concussions in children and adolescents (<18 years) seen in the emergency department from 2006 to 2011, and showed that of the individuals who sustained a sport-related concussion, 36 % did so while playing ice hockey [7]. Evidence suggests children and adolescents may be more susceptible to concussion, and may take longer to recover than adults [8][9][10]. Our understanding of the impact of sport related concussion(s) on motor and cognitive processing in children, with respect to the effect on the developing brain, is limited [11,12].
The injury spectrum associated with concussion is broad, ranging from subtle or imperceptible to obvious changes in motor and/or cognitive performance, and vary dependent on the developmental stage of the central nervous system [13][14][15][16]. One of the primary reasons for the paucity of research related to the effect of concussion in children and adolescents is the lack of sensitive measurement tools that can identify impairments following concussion [17,18]. Better diagnostic and prognostic tools are needed to address issues related to early diagnosis and management of concussion across the continuum of aging but particularly in children and adolescents. The scarcity of age-specific research forces practitioners to use guidelines developed for collegiate or adult populations [19]. Researchers are beginning to examine the efficacy of measurement tools used with adults among children and adolescents [20,21]. Maturation occurs at different rates across various domains within the central nervous system, ranging broadly from 18 (reaching correction) to 30 (precision of number sense) years of age, which can complicate concussion evaluation in children and adolescents [22][23][24]. Clinical tools used to assess neurocognitive processing and postural control (e.g., Trail Making B Task -TMB and Balance Error Scoring System -BESS) have been evaluated to determine their reliability with children and adolescent populations [20,21]. The BESS shows a limited ability to assess postural control in young athletes postmild traumatic brain injury [25]. Other researchers have examined cognitive motor integration in children (mean age: 13.2 years) following concussion [26]. Subjects were required to slide a cursor from a central to a peripheral target on a dual-touchscreen laptop using one finger [26]. The results showed significant impairment in both movement timing and trajectory formation with concussion history (7 to 11 days post-concussion) [26]. Performance of the cognitive motor integration task was not restored to baseline levels until 18 months following concussion [26].
Robotic technology has the potential to offer a clinical diagnostic assessment tool that is ideal for objective, quantitative, rapid and automated assessment of neural function. The KINARM (BKIN Technologies Ltd, Ontario, Canada) is a robotic device that has been used to detect functional impairments across neurological domains [27][28][29]. Subjects grasp two robotic arms while performing automated upper-extremity tasks, while a twodimensional virtual reality display serves as a visual aid in those tasks not testing proprioception. The tasks test visuomotor, proprioceptive, rapid sensorimotor and decision control, and executive function capabilities [27][28][29]. The KINARM end point robot has been used to explore the connection between degradation in performance on the proprioceptive task within 24 hours post mild traumatic brain injury and the prevalence of post concussion syndrome three weeks post injury [30]. Subjects in the study were > 18 years of age. The results identified subjects with post concussion syndrome had more abnormal scores than those without post concussion syndrome [30].
There is evidence that the KINARM robot is reliable and sufficiently sensitive to use in adult stroke and moderate/severe brain injury populations but little research has been published examining its reliability with children and adolescents [27][28][29]. Thus, the primary purpose of the current study was to evaluate the reliability of robotic technology in children and adolescents ranging in age from 10 to 14 using a series of tasks designed to assess neurological impairments. Intra-class correlation coefficients ≥ 0.50 and Bland-Altman plots associated with robotic parameters provided evidence that the KINARM robot is a reliable tool to use with children and adolescents.

Participants
Thirty-four healthy, normally developing boys aged 10-14 years were recruited (individuals available to attend two testing sessions one week apart) from the subject population of a 5-year longitudinal prospective study in children and adolescent ice hockey players. This was a sample of convenience. The Conjoint Health Research Ethics Board at the University of Calgary approved the study (Ethics ID number E24026). Prior to data collection, parents provided signed consent for the participants to partake in all aspects of the study and the children provided assent. Participants were included in the testretest portion of the study if they had not previously been exposed to the robotic assessment. Individuals with prior history of concussion and no neurological signs and symptoms were included in the study. If individuals had a major injury to any joint of their upper extremities, had sustained a concussion within the month of testing and/or between test-retest sessions, or had a learning disability they were excluded from the study.

Robotic assessment
The robotic assessment was performed using the KINARM end point bimanual device, which permits free movement of the upper extremities in the horizontal plane while seated; refer to Fig. 1. A virtual reality system displays visual targets such that they appear in the same plane as the arms. Subjects experience force feedback while grasping the robot handles when hitting targets during specific tasks. Participants attended two testing sessions one week apart on the KINARM robot. On day 1, each subject completed five tasks (63 parameters total) with two examiners (alternating examiner sequence); the second examiner followed the same procedure as the first. Overall there was no reason to expect an examiner effect as all each examiner did was place the subject in front of the robot and read a predetermined set of instructions for each task. Thus the focus of the current study is the test-retest reliability for the robotic testing [31]. Each testing session lasted 17 (1) (mean (SD)) minutes and the two sessions were separated by approximately 2 minutes. Subjects were seated in a chair in front of the robot, asked to avoid slouching, and the robot height adjusted such that each child's head rested on a location in the center of the virtual visual field. Body position was kept constant across subjects. Subjects completed the following 5 tasks during each testing session: Visually guided reaching on right and left, Arm position matching on right and left, Object hit, Object hit & avoid, and Trail making B with the dominant limb. These tasks characterize sensorimotor and/or cognitive performance. Examiner 1 from the first day of the study tested all subjects on the same five tasks (63 parameters) one week later.

Experimental tasks Visually guided reaching task
This task provides a measure of upper limb visuomotor capability (Fig. 2a). The robot handle is represented as a white dot (0.5-cm radius) on the display. The task targets are red circles, each with a 1.0 cm radius. Participants reach out and back between the central and peripheral targets. The four red targets are 10 cm from the initial central target. Participants are instructed to move the white dot from the centre of one target to the centre of the next target that appears, as quickly and accurately as possible. All targets are located near the centre of the workspace for each arm. There are five blocks of trials, target location is randomized within a block and both the reach out and reach back trials are analyzed. This process is repeated forty times to explore the workspace and measure variability of the subject's responses. Each subject completed the task twice, once with each arm; the dominant arm always preceded the non-dominant arm. Although not identical, the task used in the current work is similar to and uses metrics that were described earlier using the KINARM exoskeleton robot [28,29,31,32].

Arm position matching task
This task provides a measure of proprioceptive (position sense) capability (Fig. 2b). The robot moves one arm (passive arm) to one of four different target locations spaced at the corners of a square grid at 20 cm intervals in the X and Y directions. Movements are made with a bell-shaped velocity profile. Then, participants actively move the opposite arm (active arm) to the mirror-image location in space. Participants notify the examiner once the mirror-matched position is reached and the examiner advances the robot to the next trial. Each participant's vision is blocked to ensure that any sensory information about limb position comes from proprioceptive inputs. There are 6 blocks of trials, target location is randomized within a block and 1 trial for each target is completed within a block. The same target is never repeated sequentially. The task was completed twice with dominant arm being the active arm first followed by the non-dominant arm. A similar task has been used with the KINARM exoskeleton robot [28,29,32,33]. To save time the task used in the current work used 4 targets rather that 8 [32].

Object hit task
This task is a rapid sensorimotor, decision and control test (Fig. 2c). It assesses the ability of a subject to select and engage motor actions with both hands over a range of speeds and a large workspace. Virtual paddles appear at the robot handles. Subjects are asked to use the paddles to hit virtual balls that fall from the top of the screen toward them. The robot produces a reactive force that mimics the actual force that would have been felt by the subject if these were real objects contacting a real paddle. As the task proceeds the balls move at greater speeds and appear more often, making the task more difficult as time progresses. Balls fall at random from ten bins, which are spread equally across the workspace, and thirty balls fall from each bin. A total of three hundred balls are dropped during the task in one minute and forty-four seconds. A similar task has been used with older adults and the KINARM exoskeleton robot, with a slight reduction in the total time the balls were dropped for the current work [34].

Object hit and avoid
This task is similar to the Object hit task, but requires higher executive function. Participants must hit target objects while avoiding all others (Fig. 2d). Thus the emphasis is on attention, rapid motor selection, and inhibition. At the start of the task subjects are shown two target shapes of a possible eight, they are instructed to memorize these as the only two shapes to hit during the task, and to avoid all other (6) distractor shapes. If distractors hit the participant's paddles they pass through the paddles but there is no reactive force felt by the subject. This provides immediate and ongoing feedback to the subject that the object was a distracter and not a target. As with the preceding task, when targets are hit the robot produces a reactive force that mimics the actual force that would have been felt by the subject if these were real objects contacting a real paddle. Two hundred objects and one hundred distractors fall in just over [Bourke TC, Lowrey CR, Dukelow SP, Bagg SD, Norman KE, Scott SH. A robot-based behavioural task to quantify impairments in rapid motor decisions and actions after stroke. Submitted].

Trail making B task
This task is the second part of a cognitive test that evaluates executive function (e.g., visual attention and task switching) from the field of neuropsychology that is commonly used in the assessment of brain injury (Fig. 2e) [35]. Normative data from pen and pencil versions of the task have been published for adolescents, adults and older adults across age ranges of 15-20, 20-59, and 55-85, respectively. [36]. Participants trace through an alternating alpha-numeric sequence of targets 1-A-2-B for example, up to 13, for a total of 25 targets. A shortened version of the task that has 5 targets precedes the full task to help familiarize subjects with the task. If the subject touches an incorrect target while moving through the sequence the preceding correct target will turn red and the subject must return to that target before continuing. There are eight possible patterns for the Trail making B task [37]. These patterns were randomly presented within and across subjects who participated in the study.

Outcome measures
Task parameters associated with each task are presented in Table 1. The parameters for each task were developed to quantify task performance, thus behavioral attributes associated with the parameters are included in Table 1.

Data analysis
Statistical analyses were performed in SPSS, version 19.0 [38]. The study was a repeated-measures design. Significance level was set at alpha = 0.05. All subjects and their data were included in the analysis as there were no missing data points. In general, an effect size of 0.10 was considered small, 0.  Feed-forward Control: Initial phase of the movement.
Initial direction error (rad) Angular deviation between (i) a straight line from the hand position at movement onset to the destination target, (ii) a straight line from the hand position at movement onset to the hand position after the initial phase of movement.
Feed-forward Control: Initial phase of the movement.
Initial distance ratio Ratio of (i) the distance the hand travelled during the subject's initial phase of movement to (ii) the distance the hand travelled between movement onset and movement offset (or the end of the trial if the destination target is not reached).
Feed-forward Control: Initial phase of the movement.
Initial speed ratio Ratio of (i) the maximum hand speed during the subject's initial phase of movement to (ii) the global hand speed maximum of the trial.
Feedback Control: Movement corrections after the initial motor response.
Speed maxima count Number of hand speed maxima between movement onset and offset.   Error count Number of times an incorrect target was touched.
Legend: Behavioral attributes and definitions of task parameters are outlined tasks and session two (S2) the second time the five tasks were performed all on day one. Session 3 (S3) was performed following one week. Intra-class correlation coefficients were used to assess consistency or reliability of outcomes from the KINARM robot for S1 to S2 and S2 to S3 [38,39]. Although there are no standard values for acceptable relative reliability associated with intra-class correlation coefficients, the following general guidelines have been suggested, values > 0.75 indicate good reliability and < 0.75 poor to moderate reliability [31]. Researchers and clinicians have been encouraged not to use these general guidelines as absolute standards but to remember that the degree of acceptable precision in the measurement must be taken into account when determining an acceptable reliability cut-off point [31]. For the purposes of the current study coefficients of < 0.50 will indicate poor reliability, coefficients from 0.50 to < 0.75 moderate reliability, and coefficients ≥ 0.75 good reliability [31]. In the current study the intra-class correlation model was a two-way repeated measures, random effects analysis of variance (ANOVA) model and type consistency was performed using SPSS. Session was used as the random sample to compute the intra-class correlation coefficients [38][39][40].
Bland-Altman plots were used to evaluate agreement for S1 to S2 and S2 to S3 and reflect the spread of difference scores (e.g., S1 -S2) around the line of equality, the line all points would lie on if outcomes were exactly the same when tested across sessions, the line at zero on the graph [31,[41][42]. The spread of the difference scores indicates whether the level of observed error is acceptable, in the current study when S1 is substituted for S2 and S2 for S3 [31,42,43]. The sessions are considered to be in agreement when the difference in subject's performance for S1 to S2 or S2 to S3 is small enough, within an acceptable clinical error range, for the methods to be considered interchangeable [31,[41][42][43]. The 95 % limits of agreement define the range within which most differences between measurements will lie based on difference scores for S1 to S2 and S2 to S3. The requirements for agreement are met when 95 % of these difference scores fall within two standard deviations above and below the mean of the difference scores [31,41,42].
Two-way repeated measures ANOVAs (parameters by sessions) were used to identify interaction and/or session effects for S1 to S2, S2 to S3, and S1 to S3 for right and left hands with the Visually guided reaching and Arm position matching tasks, as well as for the Object hit, Object hit and avoid, and Trail making B tasks. Post-hoc Bonferroni corrections were used to determine those parameters that showed a significant learning effect with improvement in the presence of a session effect. Only those parameters showing significant improvement in performance were analyzed to determine clinical relevance based on the individual effect size standards measure [43,44]. This will be referred to as the clinical relevance measure in the current study and was determined using the following formula: ∂ group = clinical relevance measure for the group m 1 = the group mean at baseline m 2 = the group mean at follow-up s 1 = the group standard deviation at baseline [43][44][45].
Group effect size standards for the clinical relevance measure are 0.20 for a small group change, 0.50 for a moderate group change, and 0.80 for a large group change [45,46]. The cut-off benchmark of ≥ 0.8 was selected to coincide with clinical relevance in the current paper [45,46].

Results
Characteristics of the subjects who took part in the study can be found in Table 2. Table 3 presents a summary of the intra-class correlation coefficients and the associated 95 % confidence intervals for parameters bilaterally for Visually guided reach, Arm position matching, and then for Object hit, Object hit and avoid, and the Trail making B tasks. Intra-class correlation coefficients were <0.50 in 25 %, ≥ 0.50 to < 0.75 in 49 %, and ≥ 0.75 in 26 % of the parameters for S1 to S2 and < 0.50 in 27 %, ≥ 0.50 to < 0.75 in 37 %, and ≥ 0.75 in 36 % of the parameters for S2 to S3. Table 4 includes a summary of the data used when determining agreement related to Bland-Altman plots, which are presented in Figs. 3 and 4. The Bland-Altman plots suggest agreement in the majority of the parameters across the five tasks evaluated however a few parameters showed a learning effect. Figure 3 presents Bland-Altman plots comparing S1 to S2 and S2 to S3 for  both reaction time (s) (Visually guided reach-R) and movement time (s) (Visually guided reach-L). All errors appear unbiased as differences are spread evenly and randomly above and below the line of equality in Fig. 3a, b, c, and d. Alternatively, Fig. 4a shows a negative shift in the difference scores related to total hits (Object hit) for S1 to S2 which reflects the presence of a learning effect. When S2 was compared to S3, the learning effect appears to be have been maintained over one week when the subject returned to repeat the testing (Fig. 4b). Figure 4c and d represent test time (s) (Trail making B) for S1 to S2 and S2 to S3, respectively. Although the shift in difference scores seen in Fig. 4c is in the positive direction this also reflects a learning effect. As seen with the previous parameter Fig. 4d shows that the learning effect was maintained over the one week when the subject returned to repeat the testing procedure.
Outcomes from the two-way repeated measures ANOVAs (parameters by sessions) are presented in Table 5. Interactions between parameters and sessions were identified for Object hit, Object hit and avoid, and Trail making B tasks with Bonferroni adjustment showing a main effect of session for S1 to S2 and S1 to S3 in each but not for S2 to S3. This further supports the evidence seen in the Bland-Altman plots that the learning effect had stabilized after the second completion of the tasks, as no significant improvement in performance was seen from S2 to S3. Table 5 also includes p-values from post hoc Bonferroni corrections for a few parameters that indicate the presence of a learning effect with the Object hit, Object hit and avoid, and Trail making B tasks for S1 to S2 and S1 to S3. The clinical relevance measure reflects a clinically relevant change between sessions and coincides with a value of ≥ 0.80.

Discussion
The main purpose of the current study was to evaluate the reliability of the KINARM robot with the objective of using it as an assessment tool to evaluate motor and/ or cognitive performance in male children and adolescents ranging in age from 10 to 14. One of the strengths   of the current study is that performance outcome reliability was evaluated using relative reliability (intra-class correlation coefficients ≥ 0.50) and absolute reliability (Bland-Altman agreement) methodologies. Intra-class correlation coefficients tended to be moderate to high for most parameters across tasks and sessions. Intra-class correlation coefficients have been computed previously for the KINARM robot tasks of Visually guided reaching and Arm position matching in two separate populations that included both young adults and older adults who had suffered a stroke [27,32]. In the study that evaluated Visually guided reaching, 25 % of parameters were ≥ 0.50 to < 0.75 and 75 % were ≥ 0.75 as compared to < 0.50 in 20 %, ≥ 0.50 to < 0.75 in 45 % and ≥ 0.75 in 35 % of parameters in the current study [32]. In the Arm position matching study, 25 % of parameters were ≥ 0.50 to < 0.75 and 75 % were ≥ 0.75 as compared to < 0.50 in 13 %, ≥ 0.50 to < 0.75 in 50 % and ≥ 0.75 in 27 % in the current study [27]. Both studies included individuals with broad functional levels ranging from normal healthy adults to those with significant neurological impairments associated with stroke. The current study included only normal healthy children and adolescents, none with sensorimotor impairment. We suspect that we observed lower ICCs in the present study as a direct result of failing to include individuals with sensorimotor impairments. Inclusion of such individuals in prior studies demonstrated that the robotic scores had relatively low intra-subject test-retest variability across a large range of possible values which led to moderate to high ICCs in the overwhelming majority of parameters. In the present study we observed low ICCs in parameters where we recorded a very small range of scores across subjects (e.g., Visually guided reaching, Initial Speed Ratio -0.92 to 1.0). In the future we plan to re-evaluate the reliability of these parameters in children with brain injury.
Results from the current study are similar to those from a test-retest reliability study (tested 60 days apart) that evaluated a battery of neuropsychological tests, including the Trail making B task in children ages 9-14 [23]. As in the  Legend: Mean Difference and 95 % Limits of Agreement (lower followed by upper limit) for S1 to S2 and S2 to S3 for parameters associated with Visually guided reach, Arm position matching, Object hit, Object hit and avoid, and Trail making B tasks current study only healthy typically developing children with no neurological impairments were included in the reliability analysis, intra-class correlation coefficients ranged broadly from poor to good, 0.46 to 0.83 respectively [23]. The intra-class correlation coefficients for total time (s) associated with the Trail making B task in our study was 0.44 (S1 to S2) as compared to 0.65 in the previous study [23].
In the studies that included the KINARM robot tasks of Visually guided reaching and Arm position matching, data from both patients and controls were included in the intra-class correlation coefficients computation. The presence of sensorimotor impairment in a portion of the population included in these reliability studies resulted in an increased level of variability associated with the outcome measures [27,33]. This was not the case in the current study or the paper that included the Trail making B task, individuals who had sustained a concussion were not included in the reliability testing [23]. Taken together these results suggest that the level of performance variability associated with the neurological impairment post-stroke appears far greater than that associated with neural development. This is an important distinction as variability among subjects' scores must be large to demonstrate reliability [43]. Thus, we posit that the low intra-class correlation coefficients seen in the current study may have resulted from less variability among subjects due to the fact that all participants were healthy typically developing children with no neurological impairments. This is one limitation of the study. This is an argument for the inclusion of individuals across a broad Reaction time (s) for S1 to S2 and S2 to S3, respectively. c and d: Movement time (s) for S1 to S2 and S2 to S3, respectively. The difference scores fall primarily within the 95 % upper and lower limits of agreement functional spectrum when testing the reliability of any measurement tool. When Bland-Altman plots were used to determine agreement with respect to the difference score associated with subject's performance for S1 to S2 or S2 to S3 a learning effect became apparent in a few of the parameters in the current study. Figure 3 presents examples of two parameters that reflect agreement in performance from S1 to S2 and S2 to S3, whereas Fig. 4 shows examples of two parameters that reflect the presence of a learning effect for S1 to S2 but not S2 to S3. Many of the parameters that showed improvement in the Bland-Altman plots showed a statistically significant increase in performance from S1 to S2, which reflected a learning effect, but not S2 to S3 (refer to Table 5). In addition, improved performance was identified when S1 was compared to S3. This shows that the learning effect had stabilized by the third completion of the tasks. As seen in Fig. 4a the improvement in performance associated with the total hits (Object hit) resulted in a negative shift in the difference scores. However, improved performance associated with total time (s) (Trail making B) seen in Fig. 4c resulted in a positive shift in the difference scores. Thus dependent upon the nature of the skill being evaluated improvement in parameter performance was reflected either as an increase or decrease in value. When evaluated, clinically relevant changes were seen only in parameters from Object hit, Object hit and avoid, and Trail making B.
A significant improvement or learning effect has been seen with the use of the paper and pencil version of the Total hits for S1 to S2 and S2 to S3, respectively. a shows a learning effect with a negative shift in the difference scores. c and d: Test time (s) for S1 to S2 and S2 to S3, respectively. c shows a learning effect with a positive shift in the difference scores while b and d reflect the maintenance of the learning effect one week following the initial testing session S1vsS2: p < 0.134 S1vsS2 p < 0.069 S1vsS2 p < 1.000 S1vsS2 p < 1.000 S1vsS2 p < 0.0001 NA S1vsS2 p < 0.001 NA S1vsS2 p < 0001 NA S2vsS3 p < 1.000 S2vsS3 p < 1.000 S2vsS3 p < 0.163 S2vsS3 p < 1.000 S2vsS3 p < 1.000 S2vsS3 p < 1.000 S2vsS3 p < 0.07 S1vsS3 p < 0.811 S1vsS3 p < 0.343 S1vsS3 p < 0.068 S1vsS3 p < 1.000 S1vsS3 p < 0.0001 S1vsS3 p < 0.0001 S1vsS3 p < 0.0001  Trail making B task in children, adolescents and adults [23,47,48]. Practice effects tend to be defined as some improvement in performance between concurrent test sessions based on familiarity with the procedures and/or previous exposure to the assessment, while learning effects relate to the retention of the improvement over a period of time [22,49]. The results in our study showed that the learning effect had stabilized by the third application of the test. Learning effects can be a confounding factor in the interpretation of test scores. The inherent variability associated with a learning effect may artificially inflate intra-class correlation coefficients. This is a limitation when using intra-class correlation coefficients to evaluate reliability and highlights the importance of using more than one method when testing reliability. We can speculate that some of the intra-class correlation coefficients in the current study may have fallen within the low to moderate range in the absence of a learning effect. The Bland-Altman plots however show good absolute reliability in the majority of the parameters for S1 to S2 and all parameters for S2 to S3, with stabilization of the learning effect.
If not addressed, learning effects have the potential to falsely give the impression of "improvement" which can adversely complicate interpretation of outcomes particularly when comparing athletes from pre-season performance to post-concussion. In one reliability study that evaluated neuro-cognitive tests, the learning effect was adjusted using a correction factor; a mean change score was calculated and added to the confidence interval of the outcome scores after repeat testing [23]. Under these conditions no change in performance could be interpreted as absence of neural impairment following concussion. An alternative interpretation would be that the robot tasks did not target areas of the brain susceptible to concussion or that the robot was not able to pick up any change in performance. A significant decline in performance could be attributed to the effect of the concussion. A correction factor was not calculated in the current study.
Since the effect of learning was stable after the second testing session an alternative strategy would be to implement a practice session (P1) preceding experimental testing in the pre-season [50,51]. This would eliminate the effect of learning and changes in performance outcomes post-injury could be directly linked to the effects of concussion. Increased performance outcomes following concussion, with no practice session, could be interpreted as a lack of neural impairment. Alternatively, no change in performance could suggest the presence of neural impairment, as an increase in performance would be expected in the presence of learning. If a practice session (P1) were included in the experimental paradigm no change in performance from pre-season testing (S1) to post-concussion assessment (S2) would be interpreted as clinically insignificant, no neural impairment. Similarly, with no practice session, a significant increase from S1 to S2 would be interpreted as an absence of neurological impairment. In those parameters that did not show a clinically significant learning effect, no change in performance post-concussion would be interpreted as the absence of neural impairment whereas a decline in performance would be associated with the effect of concussion.

Conclusion
In general, the relative homogeneity of the sample as well as the effect of learning seen in some of the outcome parameters appear to have had a negative impact on the intra-class correlation coefficients for session 1 to 2, with less impact for 2 to 3. However, the Bland-Altman analysis supports good absolute reliability. Thus the KINARM robot appears to be reliable for healthy male children with no neurological impairment ranging in age from 10 to 14. Our finding supports further testing to evaluate children pre and post-concussion, to determine whether the KINARM robot is sufficiently sensitive to identify neural impairments. The learning effect could be addressed in one of two approaches by: 1) creating a learning effect correction adjustment factor which would enhance efficiency at baseline testing and/or 2) implementing a training practice session to eliminate the learning effect. These findings begin to establish a group of parameters associated with KINARM robot tasks appropriate for the use in young sports participants and "set the stage" for clinical studies to evaluate the validity of these measures in a younger population.