Combining different algorithms to achieve stable learning from environment interaction

In this paper we investigate the results of developing an ensemble of different learners able to retrieve robot control policies from robot-environment interaction. The main motivation for combining learners is to improve their generalization ability and their robustness against the failure of individual components. The increased stability of the ensemble may be a gateway towards long-term learning and adaptation on robots from environment interaction. The major difficulty with combining expert opinions, however, is that these opinions tend to be correlated or dependent; diversity is therefore of great importance for a good learning process. This diversity can be obtained through variations in the learner design or by adding a penalty to the output that encourages diversity. Finally, the design of the ensemble and of the algorithms it combines must respect real-time restrictions, i.e., the computational burden due to the combination of different learners must not prevent learning real-time robot controllers from scratch and on the real robot.
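The combination scheme described above can be illustrated with a minimal sketch: an ensemble of tabular Q-learners whose diversity comes from varied hyperparameters and initializations, and whose action-value estimates are averaged to form the joint policy. The class names, hyperparameters, and averaging rule are illustrative assumptions, not the specific architecture evaluated in the paper.

```python
import random

class QLearner:
    """One ensemble member: a tabular Q-learner.

    Diversity across members comes from different learning rates
    (alpha) and different random initializations (seed) -- one of the
    design-variation mechanisms mentioned in the text.
    """
    def __init__(self, n_states, n_actions, alpha, gamma=0.95, seed=0):
        rng = random.Random(seed)
        # Small random initial values so members start from different points.
        self.q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
                  for _ in range(n_states)]
        self.alpha = alpha
        self.gamma = gamma

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

class Ensemble:
    """Combines member opinions by averaging their Q-values.

    The joint policy acts greedily on the mean estimate, so no single
    failing member dominates the decision.
    """
    def __init__(self, learners):
        self.learners = learners

    def update(self, s, a, r, s_next):
        # Every member observes the same interaction experience.
        for learner in self.learners:
            learner.update(s, a, r, s_next)

    def act(self, s):
        n_actions = len(self.learners[0].q[s])
        mean_q = [sum(l.q[s][a] for l in self.learners) / len(self.learners)
                  for a in range(n_actions)]
        return max(range(n_actions), key=lambda a: mean_q[a])
```

A usage example on a two-state toy problem: after a few dozen transitions in which action 1 from state 0 is rewarded, the greedy joint policy selects action 1 in state 0. Per-experience cost grows only linearly in the number of members, which is what keeps a scheme like this compatible with the real-time restriction discussed above.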

Keywords: reinforcement learning, learning from robot-environment interaction, continuous learning, ensembles