The aim of this study was to evaluate the inter- and intra-rater reliability of a recently developed method for assessing isometric muscle force in a driven gait orthosis (DGO). To this end, two experienced therapists tested 16 subjects without and 14 subjects with neurological movement disorders (NMD) on the same day to assess inter-rater reliability, and one therapist tested the subjects on two separate days to assess intra-rater reliability. Our results showed that the developed assessment tool is reliable for measuring isometric torques in subjects with and without NMD. It can therefore be applied as an objective outcome measure in rehabilitation units. This novel method gives therapists a time-saving way to assess the muscle status of their patients walking in the DGO and, in addition, to monitor and document the rehabilitation process.
Previous studies have established that isometric tests of muscular function show poor to good reliability, depending on the device used to assess muscle force. For instance, Scott et al. demonstrated fair to good (0.65 – 0.87) inter-rater reliability for hip flexion and extension assessed with a handheld dynamometer, and poor to good (0.48 – 0.91) inter-rater reliability when assessed with a portable dynamometer anchoring station. Fair to good intra-rater reliability (0.76 – 0.98) for hip and knee flexion and extension, with slightly lower inter-rater reliability (0.64 – 0.97), has also been reported using a strain gauge. Using isokinetic dynamometry to measure isometric muscle force mainly results in good reliability. Quittan et al. showed ICC values for intra-rater reliability of knee flexion and extension between 0.82 and 0.99.
A direct comparison of our results with the above-mentioned studies is not possible since we assessed isometric muscle force under different conditions. While subjects in the other studies were in a seated or recumbent position during strength testing, our subjects were in an upright position, mounted to the DGO and suspended with their whole body weight. Nevertheless, we could also show fair to good inter- as well as intra-rater reliability for our voluntary isometric force measurements. Reliability was slightly higher in subjects without NMD. In contrast with the results of Meldrum et al., inter-rater reliability was somewhat higher than intra-rater reliability in both groups. This might have been because measurements for testing inter-rater reliability were performed on the same day, whereas those for testing intra-rater reliability were conducted on two different days. Repeatedly producing maximal isometric force requires high motivation and full concentration from the tested subject. This might have been difficult for some subjects, and motivation might have differed between the two testing days. We were not able to control these subject-dependent factors. An additional reason for the lower intra-rater reliability could have been that some subjects reported aching muscles from the force measurements on day one. This could have resulted in somewhat poorer performance on day two and consequently in lower intra-rater reliability. A longer break of 3 to 5 days between the two measurements might have reduced this effect.
In subjects without NMD, 5 of 32 ICC values showed fair and the rest good reliability. In subjects with NMD, 3 of 32 values showed fair reliability, 28 values showed good reliability, and one ICC value was below 0.6. This single poor reliability coefficient increased markedly when the average of two force measurements was used to calculate reliability. This is in line with another study that suggested that more repetitions in a testing protocol might lead to better reliability. The results from subjects with NMD supported this suggestion in most instances: intraclass correlation coefficients (ICC) calculated from the average of the two successive trials were in the majority of cases higher than ICC values calculated from single measurements.
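The comparison between single-trial and averaged-trial ICC values can be illustrated with a minimal sketch. The ICC model used in the study is not restated in this section, so the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) of Shrout and Fleiss is assumed here. The function names `icc_2_1` and `average_trials` and all torque values are hypothetical and serve only as an illustration, not as the analysis performed in the study.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1) after Shrout and Fleiss (assumed model, see lead-in).

    scores: list of rows, one row per subject, one column per rater
    (inter-rater) or per test day (intra-rater).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters or sessions
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def average_trials(trial_a, trial_b):
    """Element-wise mean of two score tables of identical shape."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(trial_a, trial_b)
    ]


# Hypothetical isometric torques (Nm): rows = subjects, columns = two raters.
trial_1 = [[42.0, 45.0], [18.0, 15.0], [30.0, 33.0], [25.0, 21.0], [51.0, 49.0]]
trial_2 = [[44.0, 43.0], [16.0, 18.0], [32.0, 30.0], [22.0, 24.0], [50.0, 52.0]]

icc_single = icc_2_1(trial_1)                              # first trial only
icc_averaged = icc_2_1(average_trials(trial_1, trial_2))   # mean of two trials
print(f"ICC single trial:    {icc_single:.2f}")
print(f"ICC averaged trials: {icc_averaged:.2f}")
```

Averaging two trials reduces the random error in each rater's score, which is why the averaged ICC tends to be higher, although this is not guaranteed for every data set.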
The lower reliability values for hip force measurements compared to knee force measurements indicate that performing a hip extension or flexion movement is more difficult than the corresponding knee task. This observation agrees with the results of Meldrum et al., who also observed lower inter- and intra-rater reliability for hip than for knee extension and flexion measurements.
The relative variation of the measurement error (CVME) in subjects without NMD was low for both inter- and intra-rater reliability (7 – 14%). This indicates that the method is capable of detecting small changes in isometric muscle force. CVME values were higher in the group of subjects with NMD (9 – 36% for single measurements and 7 – 26% for averaged measurements).
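As a rough illustration of how a relative measurement error of this kind can be obtained from paired measurements, the sketch below assumes one common definition: the within-subject standard deviation estimated from paired differences, expressed as a percentage of the grand mean. The exact CVME formula used in the study is not restated in this section, so this definition, the function name `cvme_percent`, and the example values are assumptions.

```python
from math import sqrt


def cvme_percent(pairs):
    """Relative measurement error in percent for paired measurements.

    pairs: list of (first, second) measurements per subject, e.g. the
    torques recorded by two raters or on two test days.
    Assumed definition: within-subject SD divided by the grand mean.
    """
    n = len(pairs)
    # Within-subject SD for two measurements per subject:
    # sqrt(sum of squared differences / (2 * n))
    sum_sq_diff = sum((a - b) ** 2 for a, b in pairs)
    s_me = sqrt(sum_sq_diff / (2 * n))
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    return 100.0 * s_me / grand_mean


# Hypothetical torques (Nm) from two raters for four subjects.
example = [(42.0, 45.0), (18.0, 15.0), (30.0, 33.0), (25.0, 21.0)]
print(f"CVME: {cvme_percent(example):.1f}%")
```

Under this definition, a change in a subject's force is more plausibly a real change the further it exceeds the CVME.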
Although these values seem large, the new method would have detected the changes reported in a 16 to 24 week training study by Cramp et al. in subjects with unilateral stroke 6 – 12 months post onset, in which isometric torque production of the knee extensor muscle group increased by 58%. The 29% improvement in isometric knee extensor force in subjects with chronic incomplete spinal cord injury after 12 weeks of resistance training would also have been detected by the measurement method.
The large heterogeneity in the group of subjects with NMD was chosen because our goal was not to establish reliability values for a specific subject group but rather to investigate whether the method is applicable to a wide range of subjects with NMD of different etiologies. Nevertheless, we expect better reliability for more homogeneous subject groups.
Although reliability was slightly lower when using single measurements than when using the average of two measurements, single-trial measurements best match daily clinical practice. A clinical setting requires tests that deliver reliable data with a minimum of time expenditure. With the presented method, therapists can assess voluntary muscle force during a training session in the DGO and reliably monitor the course of voluntary force generation in leg muscles. Nevertheless, in cases that require highly reliable force measurements, we propose performing two consecutive measurements in order to minimize bias and enhance reliability.
The fact that healthy athletic male subjects were able to push the DGO out of the desired position limits the application area of the method. Nevertheless, we propose the method as appropriate for subjects being trained in the DGO. These subjects are generally very weak or, in the case of subjects with hemiparesis, the therapist's focus lies on the weak, affected side. The method was developed to optimize the monitoring of the rehabilitation process of subjects training in the DGO. Once subjects become too strong for DGO training, muscle force has to be assessed with a different device if needed.
The ability of the method to document the rehabilitation process is shown in Figure 3. Increasing force measurements go along with increasing outcome measures. Whereas no clinical outcome measures could be collected at time point 1 because the subject was too weak to walk (even with assistance), DGO training and consequently muscle force measurements in the DGO were possible. Additionally, it appears that the change in clinical gait function could be more related to changes in extensor muscles (hip and knee) than to those in flexor muscles. This is in line with the observation that hip and knee extensors are the basic determinant of limb stability during stance phase. Cramp et al. also reported that, after low-intensity strength training in chronic stroke patients, knee extensor force increased significantly and correlated with gait speed, while knee flexor force did not change significantly.
Our preliminary data show the potential of the tool to document and control the rehabilitation process of subjects being trained in the DGO Lokomat. Future studies will be needed to investigate this observation.