Table 2 Psychometric properties: Definition and statistical measures

From: Robot-aided assessment of lower extremity functions: a review

Property Definition Measure
Reliability Consistency of the results obtained on repeated administrations of the same test by the same person (intra-rater or test-retest) or by different people (inter-rater). Intraclass correlation coefficient (ICC): based on ANOVA statistics, computed as between-subjects variance/(between-subjects variance + error variance); six different computational methods are possible; 0 ≤ ICC ≤ 1, unitless [212, 213].
Acceptance levels for ICC depend on the application. However, a general classification of reliability has been proposed [214]: 0.00 ≤ ICC ≤ 0.10 – virtually none; 0.11 ≤ ICC ≤ 0.40 – slight; 0.41 ≤ ICC ≤ 0.60 – fair; 0.61 ≤ ICC ≤ 0.80 – moderate; 0.81 ≤ ICC ≤ 1.0 – substantial.
Standard error of measurement (SEM): \( SEM = SD\sqrt{1-ICC} \), where SD is the standard deviation of the scores from all subjects. SEM has the same unit as the measured variable [18].
Bland-Altman plots: mean of two measures vs their difference. Limits of agreement (LOA) = mean difference ± 1.96∙SD of the differences [17].
Cohen’s kappa (κ): percent agreement among raters corrected for chance agreement [215].
Validity Extent to which the instrument measures what it intends to measure.
Concurrent validity: degree to which the measure correlates with a gold standard.
Construct validity: ability of a test to measure the underlying concept of interest.
Correlation-based methods: Pearson (r) or Spearman (ρ) correlation coefficient, ICC [216]. For continuous measures of the same data type (e.g. two methods for measuring gait speed): Root Mean Square Error (RMSE) or Bland-Altman plots against gold standard.
Responsiveness Ability to accurately detect changes. Internal responsiveness: ability of a measure to change over a particular specified time frame. External responsiveness: extent to which changes in a measure over a specified time frame relate to corresponding changes in a gold standard [217]
Minimal Detectable Change (MDC): minimal amount of change that is not likely to be due to random variation in measurement [218].
Minimal clinically important difference (MCID): smallest amount of change in an outcome that might be considered important by the patient or clinician [22].
Floor and ceiling effects: the extent to which scores cluster at the bottom or top, respectively, of the scale range.
Internal responsiveness: Cohen’s effect size: observed change in score divided by the SD of baseline score. Standardized response mean (SRM): observed change score divided by SD of change score in the group.
External responsiveness: ROC curves: sensitivity vs specificity based on an external criterion [217].
MDC: \( \mathrm{MDC} = \mathrm{SEM} \times 1.96 \times \sqrt{2} \) [18].
MCID: anchor-based (compare a change score with external measure of clinically relevant change) or distribution-based methods (based on statistical characteristics of the sample) [218].
Floor and ceiling effects: percentage of the number of scores clustered at bottom/top.
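
As an illustration only, several of the measures listed in the table can be sketched in a few lines of Python using the standard library. The function names are hypothetical and not taken from the review. The sketch assumes that the ICC has already been estimated elsewhere, that sample standard deviations (n − 1 denominator) are intended, and that a score counts as "clustered" at the floor or ceiling only when it equals the scale minimum or maximum exactly.

```python
import math
from statistics import mean, stdev


def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimal detectable change: MDC = SEM * 1.96 * sqrt(2)."""
    return sem_value * 1.96 * math.sqrt(2.0)


def limits_of_agreement(a, b):
    """Bland-Altman bias and limits of agreement for paired measures.

    Returns (bias, lower limit, upper limit), with the limits placed at
    1.96 sample SDs of the paired differences around the mean difference.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def cohens_effect_size(baseline, follow_up):
    """Observed mean change divided by the SD of the baseline scores."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change score divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def floor_ceiling_percent(scores, scale_min, scale_max):
    """Percentage of scores at the bottom and at the top of the scale.

    Assumption: only scores exactly equal to the scale limits are counted.
    """
    n = len(scores)
    floor = 100.0 * sum(s == scale_min for s in scores) / n
    ceiling = 100.0 * sum(s == scale_max for s in scores) / n
    return floor, ceiling


if __name__ == "__main__":
    # Hypothetical paired scores for five subjects (not data from the review).
    baseline = [10, 12, 9, 14, 11]
    follow_up = [13, 14, 10, 18, 13]

    sem_value = sem(sd=stdev(baseline), icc=0.85)  # ICC value is assumed
    print("SEM:", sem_value)
    print("MDC:", mdc95(sem_value))
    print("Bias and LOA:", limits_of_agreement(follow_up, baseline))
    print("Effect size:", cohens_effect_size(baseline, follow_up))
    print("SRM:", standardized_response_mean(baseline, follow_up))
    print("Floor/ceiling %:", floor_ceiling_percent(baseline, 0, 20))
```

Because SEM and MDC keep the unit of the measured variable, they can be compared directly with an observed change score, whereas the effect size and SRM are unitless.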