Pyramid Representations of the Action Set in Reinforcement Learning

Future robot systems will perform increasingly complex tasks in decreasingly structured and known environments. Robots will need to adapt their hardware and software, at first only to foreseen changes in the environment, but ultimately to more complex, unforeseen ones. In this paper we describe a reinforcement-based learning strategy that allows fast robot learning from scratch using only the robot's interaction with the environment, even when the reward is provided by a human observer and is therefore highly non-deterministic and noisy. To achieve this, our proposal combines a novel representation of the action space with an ensemble of learners that forecast the time interval remaining before a robot failure.
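The abstract does not specify how the ensemble of time-to-failure forecasters is built; as one minimal illustration of the idea, the sketch below trains several simple nearest-neighbour regressors on bootstrap resamples of (state, steps-until-failure) pairs and averages their predictions. All names here (`TimeToFailureEnsemble`, the 1-NN members, the bootstrap scheme) are assumptions for illustration, not the paper's actual method.

```python
import random

class TimeToFailureEnsemble:
    """Hedged sketch: an ensemble of 1-nearest-neighbour regressors,
    each fitted on a bootstrap resample of (state, time-to-failure)
    pairs. The ensemble prediction averages the members' outputs,
    smoothing out noise in the failure signal. This is an illustrative
    stand-in, not the paper's actual learner."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members = []  # each member: a list of (state, ttf) pairs

    def fit(self, states, ttfs):
        data = list(zip(states, ttfs))
        # Each member sees a bootstrap resample of the training data.
        self.members = [
            [self.rng.choice(data) for _ in data]
            for _ in range(self.n_members)
        ]

    @staticmethod
    def _sqdist(a, b):
        # Squared Euclidean distance between two state vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def predict(self, state):
        # Each member predicts the time-to-failure of its nearest
        # stored state; the ensemble returns the mean prediction.
        preds = []
        for member in self.members:
            nearest = min(member, key=lambda p: self._sqdist(p[0], state))
            preds.append(nearest[1])
        return sum(preds) / len(preds)
```

For example, fitting the ensemble on states paired with the number of steps that elapsed before the episode's failure, and querying a new state, yields a noise-smoothed forecast that could drive the kind of failure-avoidance described in the abstract.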

Keywords: Reinforcement learning, Robotics, Ensembles, Learning and adaptation