
Fig. 2

From: Automated calibration of somatosensory stimulation using reinforcement learning

Reinforcement learning (RL) algorithm for sensory neurostimulation optimization. A General RL architecture. A software agent observes the environment's state, takes an action that moves the environment into a new state, and receives a reward in return. B During offline training, the environment is simulated by three machine learning models trained on a dataset of neurostimulation experiments; the models mimic subjects' answers for the elicited intensity, type, and location. In the online condition, the environment is the real subject interacting with the AI-stimulation platform. C The states are represented by the intensity, type, and location of the perceived sensation. Each combination of possible states returns a different reward ranging from Min to Max, corresponding to the least and most comfortable reported sensations. The reward function is defined differently for the low-level and high-level agents, which are responsible for regulating low and high levels of reported sensations, respectively (Additional file 1: Fig. S3). D Each agent is a Deep Q-Network consisting of a neural network with two hidden layers, an input layer with three neurons (states of the environment), and an output layer with nine neurons (Q-values of the possible actions). With probability ε, the agent selects a random action (exploration); with probability 1 − ε, it selects the action with the highest Q-value (exploitation). Each action consists of increasing, decreasing, or maintaining the pulse amplitude (PA) and pulse width (PW) of the neurostimulation (nine possible combinations).
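The agent described in panel D can be summarized in code. Below is a minimal sketch in PyTorch, not the authors' implementation: the hidden-layer width, the ε value, and the numeric state encoding are illustrative assumptions, since the caption specifies only the two hidden layers, the three-neuron input (intensity, type, location), the nine-neuron Q-value output, and the nine PA/PW actions.

```python
import random
import torch
import torch.nn as nn

# Nine actions: increase (+1), maintain (0), or decrease (-1)
# the pulse amplitude (PA) and pulse width (PW).
ACTIONS = [(d_pa, d_pw) for d_pa in (-1, 0, +1) for d_pw in (-1, 0, +1)]


class DQNAgent(nn.Module):
    def __init__(self, hidden=64):  # hidden width is an assumption
        super().__init__()
        self.q_net = nn.Sequential(
            nn.Linear(3, hidden),    # input: perceived intensity, type, location
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, len(ACTIONS)),  # output: one Q-value per action
        )

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy policy: explore with probability epsilon,
        otherwise exploit the action with the highest Q-value."""
        if random.random() < epsilon:
            return random.randrange(len(ACTIONS))   # exploration
        with torch.no_grad():
            q = self.q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())                      # exploitation


# Example: pick the PA/PW adjustment for one observed state
# (the state values here are made up for illustration).
agent = DQNAgent()
action_idx = agent.act([0.4, 1.0, 2.0])
d_pa, d_pw = ACTIONS[action_idx]
```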
