Recently reinforcement learning has been widely applied to robotic tasks. However, most of these tasks hide more than one objective. In these cases, the construction of a reward function is a key and difﬁcult issue. A typical solution is combining the multiple objectives into one single-objective reward function. However, quite often this formulation is far from being intuitive, and the learning process might converge to a behaviour far from what we need. Another alternative to face these multi-objective tasks is to use what is called transfer learning. In this case, the idea is to reuse the experience gained after the learning of an objective to learn a new one. Nevertheless, the transfer affects only to the learned policy, leaving out other gained information that might be relevant. In this paper, we propose a different approach to learn problems with more than one objective. In particular, we describe a two-stage approach. During the ﬁrst stage, our algorithm will learn a policy compatible with a main goal at the same time that it gathers relevant information for a subsequent search process. Once this is done, a second stage will start, which consists of a cyclical process of small perturbations and stabilizations, and which tries to avoid degrading the performance ofthesystemwhileitsearchesforanewvalidpolicybutthatalsooptimizesasub-objective. We have applied our proposal for the learning of the biped walking. We have tested it on a humanoid robot, both on simulation and on a real robot.
Keywords: Reinforcement learning,Multi-objective optimization,Robotic tasks,Policy search