The need for more generalizable and robust methods for outcome prediction in the post-stroke population has been advocated [8, 12, 38] in order to foster the implementation of CDSS in clinical routine. This aspect could be of great importance in the rehabilitation context, both to improve patients’ outcomes and to contain the costs of care. In this regard, a study on a cohort of 1197 stroke patients demonstrated that the length of stay in the rehabilitation setting, which accounts for 70% of total stroke costs, is strongly associated with initial stroke severity and with improvement during recovery [39]. Thus, optimising the rehabilitation path and improving clinical recovery are the key targets for data-driven solutions. To reach full implementation of data-driven clinical decision support tools, it is crucial to develop robust and interpretable predictive models.
In this process, functional outcomes, such as the Modified Barthel Index scale, are a good first-step target, as they allow a comprehensive and higher-level evaluation of the patients’ independence [24]. In the absence of a validated Minimal Clinically Important Difference on the mBI scale, the mBI in this study was dichotomised and considered as class transition, in order to capture a more clinically relevant functional recovery [28]. All the implemented classifiers achieved good accuracies, and the weighted results computed through the sum of posterior probabilities reached the highest accuracy (79.1%) (Fig. 3).
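As a rough illustration of how classifiers can be combined through a sum of posterior probabilities, the sketch below applies a (weighted) sum rule to made-up posteriors. The classifier names, the probabilities and the weights are all hypothetical and are not taken from the study; the exact weighting scheme used in this work is not reproduced here.

```python
# Hypothetical posteriors P(class | patient) from three classifiers for two
# patients; classes are ordered as (no transition, transition).
posteriors = {
    "rf":  [[0.90, 0.10], [0.30, 0.70]],
    "svm": [[0.45, 0.55], [0.20, 0.80]],
    "lr":  [[0.48, 0.52], [0.55, 0.45]],
}

def sum_rule(posteriors, weights=None):
    """Return, per patient, the class with the largest weighted sum of posteriors."""
    names = list(posteriors)
    weights = weights or {name: 1.0 for name in names}  # unweighted by default
    n_patients = len(posteriors[names[0]])
    n_classes = len(posteriors[names[0]][0])
    predictions = []
    for i in range(n_patients):
        totals = [
            sum(weights[m] * posteriors[m][i][c] for m in names)
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=totals.__getitem__))
    return predictions

ensemble_pred = sum_rule(posteriors)
print(ensemble_pred)
```

For the first hypothetical patient, two of the three classifiers lean towards a transition, yet the sum rule predicts no transition because one classifier is far more confident. This is the practical difference between summing posteriors and simple majority voting.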
To compare our results with those obtained by Lin et al. [16] on the mBI categorised into three classes, sensitivity and specificity for the best classifier (RF) were computed from the aggregated predictions on the test set folds. With respect to the previous study, specificity was the same (0.68 in both cases), whereas sensitivity was higher (0.72 in the previous study and 0.80 in our case). However, these numbers should be compared in light of the technical differences between the implemented solutions, specifically concerning the different outcome types and validation approaches. In fact, in our work, nested cross-validation was implemented for each classifier, similarly to what Sale et al. [15] proposed, ensuring a more robust analysis of the generalisability of the results. Indeed, it has been shown that relying exclusively on cross-validation accuracy for model selection, hyper-parameter tuning and evaluation of the results can introduce a significant bias in the prediction (the so-called cross-validation bias [40]). Thus, an approach such as nested cross-validation can reduce the cross-validation bias, yielding error estimates similar to those obtained with independent external validation [40, 41, 42].
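The nested scheme described above can be sketched in a few lines. This is a minimal standard-library illustration on synthetic data, not the pipeline used in this study: the one-dimensional k-nearest-neighbour classifier, the grid of k values, the fold counts and the data are all assumptions chosen for brevity. The hyper-parameter is tuned in an inner loop that only sees the outer training part, and sensitivity and specificity are then computed from the predictions aggregated over the outer test folds.

```python
import random

def k_folds(indices, k):
    """Split indices into k interleaved folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Predict 0/1 by majority vote among the k nearest 1-D training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return int(2 * sum(label for _, label in nearest) > k)

def mean_inner_accuracy(data, train_idx, k, n_folds):
    """Inner loop: cross-validated accuracy of one hyper-parameter value."""
    scores = []
    for fold in k_folds(train_idx, n_folds):
        held_out = set(fold)
        fit = [data[i] for i in train_idx if i not in held_out]
        hits = sum(knn_predict(fit, data[i][0], k) == data[i][1] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)

def nested_cv(data, k_grid=(1, 3, 5), n_outer=5, n_inner=3, seed=0):
    """Outer loop: tune on the training part only, then predict the unseen fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for fold in k_folds(idx, n_outer):
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        best_k = max(
            k_grid, key=lambda k: mean_inner_accuracy(data, train_idx, k, n_inner)
        )
        fit = [data[i] for i in train_idx]
        for i in fold:
            y_true.append(data[i][1])
            y_pred.append(knn_predict(fit, data[i][0], best_k))
    return y_true, y_pred

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from predictions aggregated over folds."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic, purely illustrative data: one feature, two balanced classes.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(30)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(30)]

y_true, y_pred = nested_cv(data)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because every sample appears in exactly one outer test fold, the aggregated predictions cover the whole data set once, and no sample is ever used both to select the hyper-parameter and to evaluate the resulting model.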
In addition to the development and nested cross-validation of the classifiers, an analysis of the interpretability of the best performing model was also performed. More specifically, this analysis was conducted through the application of game-theory approaches that evaluate the weight of each feature on the prediction in a patient-specific manner [43]. To our knowledge, the only paper applying these methods to stroke predictive models is that of Qin et al. [44], which used prognostic models for the prediction of mortality. This technique gives an insight into the roles of, and mutual interactions among, features and fosters the translational applicability of ML models in the clinical context. Indeed, understanding which aspects contribute to a given outcome prediction can inform clinical users about when such solutions can be trusted. Especially in the case of misclassified patients, the variables obtained from the patients’ assessments, together with the factors contributing to the prediction in the model, can help the clinician understand and further analyse these cases. Moreover, in support of personalised treatment optimisation, interpretability through the use of Shapley values allows for patient-specific analyses of feature contributions. As an example, Fig. 4 shows the specific feature contributions for two patients of the test set, both classified as non-transitioning: one correctly classified (panel A) and one misclassified (panel B) by the model. In panel B, it is visible how the clinical complexity of the patient, i.e. the presence of a bedsore, a global disability and the presence of a bladder catheter, contributes to the decision of the model toward a non-transition on the mBI class. This information can be crucial for clinicians in selecting the proper rehabilitation plan for the specific patient.
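To make the notion of patient-specific feature contributions concrete, the sketch below computes exact Shapley values for a toy model by enumerating all feature coalitions. Everything in it is hypothetical: the scoring function, its coefficients, the feature names and the single reference patient used as baseline are assumptions for illustration only. In practice, SHAP implementations approximate the same quantity against a background data set rather than one baseline.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi

def model(p):
    """Hypothetical transition score with one interaction term (not the study's model)."""
    return (0.6 - 0.20 * p["bedsore"] - 0.15 * p["bladder_catheter"]
            + 0.10 * p["trunk_control"]
            - 0.05 * p["bedsore"] * p["bladder_catheter"])

# Hypothetical patient and reference profile.
patient = {"bedsore": 1, "bladder_catheter": 1, "trunk_control": 0}
baseline = {"bedsore": 0, "bladder_catheter": 0, "trunk_control": 1}

phi = shapley_values(model, patient, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is what allows a plot such as Fig. 4 to decompose a single prediction feature by feature. The interaction term is shared equally between the two features involved.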
The analysis of the weights of factors with the SHAP method attributed great importance to functional aspects such as trunk control, communication level, disability level, bladder catheter and pressure ulcers, rather than to the mBI level at admission (Fig. 3). Additionally, the type of stroke (ischemic, haemorrhagic or both) was confirmed as a predictor. Specifically, the presence of haemorrhage, either alone or in combination with ischemic stroke, was associated with a worse outcome, plausibly acting as a proxy of stroke severity at admission. This hypothesis was indeed confirmed by a statistically significant difference in mBI total score at admission between the two groups, with lower values for those experiencing haemorrhage (Mann-Whitney test, p-value = 0.007).
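For readers less familiar with the Mann-Whitney test used for this group comparison, a minimal standard-library sketch follows. It uses the normal approximation with tie correction and no continuity correction, which is an assumption made here for simplicity (statistical packages may use exact p-values for small samples), and the admission scores are invented for illustration rather than taken from the study data.

```python
from math import erfc, sqrt

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U_a, U_b, p_value).
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of 1-based ranks i+1..j
        for t in range(i, j):
            ranks[t] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:                       # all observations identical
        return u_a, u_b, 1.0
    z = (u_a - n1 * n2 / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))        # two-sided tail of the normal
    return u_a, u_b, p_value

# Invented admission mBI scores for two hypothetical groups.
haemorrhage = [10, 15, 20, 22, 30, 35]
no_haemorrhage = [25, 40, 45, 50, 55, 60]

u_h, u_n, p = mann_whitney_u(haemorrhage, no_haemorrhage)
print(f"U = {min(u_h, u_n)}, p = {p:.4f}")
```

The test compares ranks rather than raw scores, which suits an ordinal, bounded scale such as the mBI where normality cannot be assumed.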
The results on trunk control are in line with the literature, showing that trunk control is an essential predictor of functional outcomes and activities of daily living [16, 45]. In fact, trunk control is deeply connected both with mobilisation tasks and with the use of the extremities. Trunk control represents not only the ability to keep balance in the sitting and upright positions, but also the capability to stabilise and selectively control the movements of both the upper and lower trunk [46]. It is well known that proximal stabilisation of the trunk is related to better control of the distal extremities, and that efficient walking requires a proper rotation of the shoulders with respect to the pelvis. Also, Lin et al. [16], who specifically developed predictive models on a three-class mBI, identified trunk control as one of the key features involved in recovery.
Also related to mobilisation, the presence of markers of clinical complexity, such as bedsores or a bladder catheter, was among the most significant predictors in the model. Especially in the first year post-stroke, immobility-related complications can be very common and can negatively influence the functional outcome and independence in basic activities of daily living. A study by Sackley et al. [47], on a cohort of 122 patients, reported that 22% of patients suffered from bedsores within 12 months of observation. The same study additionally reported, through preliminary analyses, that the number of complications is negatively correlated with the Barthel Index score at three months post-stroke.
Finally, the communication level was another important aspect emerging from our results. In this work, disability in communication was measured with the SDC scale. Although the mBI scale does not directly measure communication components, the importance of communication level for functional recovery is noticeable. In the literature, the role of communication limitations, such as aphasia, is controversial [48, 49]. Like other measures of disability, the SDC does not explain the specifics of the disorders (aphasia, apraxia, dysarthria, dementia, deafness) which, individually or in combination, can impair communication. The SDC evaluates the difficulties in communication as assessed by the clinician after an anamnestic interview and clinical examination. It may be affected by a combination of neurological problems, thereby acting as an indicator of an aggregate of problems and as a severity index. Hence, a disability in communication is necessarily associated with a reduced comprehension of therapeutic instructions and may prevent the development of the therapeutic relationship between the patient and the rehabilitation team, possibly delaying or compromising recovery [49]. Interestingly, in our study, the beeswarm plot (Fig. 3) shows that levels 0 to 2, corresponding to total to moderate limitations in communicating, are predictive of an absence of class transition, whereas levels 3 and 4 (mild and absent communication limitations) have a strong positive predictive value for class transition, as already reported for severe brain injuries [50].
Despite the retrospective nature of the study, the proposed ML methodology was validated through a nested cross-validation approach, providing a high level of confidence in the generalisation capability of the achieved results. The results obtained were promising and could contribute first-step evidence towards the realisation of interpretable CDSS. As already suggested for different conditions, applying explanation techniques to the outcome of intensive post-acute rehabilitation [26, 51] provided a data-driven focus on the importance of trunk control, bedsores and communication levels in the recovery of functional outcome of post-stroke patients at discharge from intensive rehabilitation. These aspects, which are in strong agreement with clinical evidence and practice [26, 51], further support the reliability and trustworthiness of the predictive model developed.
Limitations and implications for future research
Although further strategies could be investigated from a technical point of view (e.g. oversampling techniques), the selection of the variables is the aspect that most deserves discussion and possible improvement in future research. Indeed, the retrospective nature of the study implied the use of a restricted selection of variables related to limited aspects of the patients’ rehabilitation. For this reason, a prospective observational design was developed for a multifactorial analysis of post-stroke patients’ characteristics [52] and of their role in the prediction of functional recovery.
Additionally, the selection of the outcome measure deserves some further comment. As stated in the introduction, the development of predictive models is the first step in the direction of tools for clinical decision support. Thus, as a preliminary stage, we decided to address a more generic outcome that could broadly quantify the functional outcome of the patient at discharge. For this reason, we selected the class transition on the Modified Barthel Index over other measures, such as the discharge score, the difference between discharge and admission scores, efficiency, or effectiveness [53], due to its easier interpretation. We are aware that some limitations may affect this choice, such as the fact that a linear relationship between the score and the clinical conditions of the patients is assumed, that even a small change in the total score could lead to a transition, or that transitions of one or more classes are considered equally. However, class transition was chosen since it provides an easily interpretable index of whether the rehabilitation stay is associated with a discrete change in the patient’s disability in activities of daily living. Additionally, the class transition was selected as a measure of a discrete change in the overall disability burden [27, 54], given that the Minimal Clinically Important Difference (MCID) has not yet been validated on the mBI with range 0–100.
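The outcome definitions discussed above, and the limitations that come with class transition, can be illustrated with a short sketch. The class cut-offs below are placeholders chosen only for illustration and are not the thresholds adopted in this study. The effectiveness and efficiency formulas follow commonly used definitions (gain relative to the maximum achievable gain, and gain per day of stay), which are assumed here rather than quoted from [53].

```python
from bisect import bisect_right

# Assumed, illustrative lower bounds of mBI classes 1..5 on the 0-100 scale;
# scores below the first bound fall in class 0.
CLASS_LOWER_BOUNDS = [25, 50, 75, 91, 100]
MAX_SCORE = 100

def mbi_class(score):
    """Map an mBI total score to an ordinal class index (0 = most dependent)."""
    return bisect_right(CLASS_LOWER_BOUNDS, score)

def class_transition(admission, discharge):
    """True if the patient moved up by at least one class during the stay."""
    return mbi_class(discharge) > mbi_class(admission)

def effectiveness(admission, discharge):
    """Gain as a percentage of the maximum achievable gain (assumed definition)."""
    if admission >= MAX_SCORE:
        return None  # no room for improvement, ratio undefined
    return (discharge - admission) / (MAX_SCORE - admission) * 100

def efficiency(admission, discharge, length_of_stay_days):
    """Gain in mBI points per day of stay (assumed definition)."""
    return (discharge - admission) / length_of_stay_days

# Hypothetical (admission, discharge) pairs illustrating the limitations.
for admission, discharge in [(49, 50), (50, 74), (20, 95)]:
    print(admission, discharge,
          "gain:", discharge - admission,
          "transition:", class_transition(admission, discharge))
```

With these placeholder cut-offs, a one-point gain from 49 to 50 counts as a transition while a 24-point gain from 50 to 74 does not, and a jump across several classes is treated the same as a single-class change. These are exactly the trade-offs acknowledged above in exchange for an easily interpretable outcome.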