Improving reinforcement learning through a better exploration strategy and an adjustable representation of the environment
Reinforcement learning is a promising strategy: all the robot needs to begin searching for the desired solution is a reinforcement function that specifies the main constraints on the behaviour. Nevertheless, the robot wastes too much time executing random, mostly wrong, actions, and the user is forced to set the balance between exploring new actions and exploiting those already tried. In this context we propose a methodology that achieves fast convergence towards good robot-control policies and determines on its own the required degree of exploration at every instant. The performance of our approach stems from the mutual and dynamic influence that three elements exert on each other: reinforcement learning, genetic algorithms, and a dynamic representation of the environment around the robot.
In this paper we describe the application of our approach to two common tasks in mobile robotics: wall following and door traversal. The experimental results show that the required learning time is significantly reduced and the stability of the process is increased. Moreover, the low user intervention required to solve both tasks (only the reinforcement function changes) confirms the contribution of this approach towards robot-learning techniques that are fast, user friendly, and demand little application-specific knowledge from the user, qualities that are increasingly in demand.
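As a point of reference for the exploration problem described above, here is a minimal, generic sketch of tabular Q-learning with a fixed ε-greedy exploration rate on a hypothetical one-dimensional corridor. The toy environment and the hand-set EPSILON constant are illustrative assumptions, not the authors' method, which instead adapts the degree of exploration automatically.

```python
import random

# Toy 1-D corridor (illustrative, not from the paper): states 0..4, goal at 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Apply an action; reward 1 on reaching the goal, 0 otherwise."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

# Tabular Q-values, initialised to zero.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor
EPSILON = 0.2             # exploration rate: the balance the user must hand-tune

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: try a random action with probability EPSILON,
        # otherwise exploit the best action found so far.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # Standard Q-learning update.
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

print("Greedy policy (0=left, 1=right):",
      [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The fixed EPSILON here is exactly the kind of hand-tuned exploration/exploitation parameter that the proposed methodology aims to eliminate by adjusting the degree of exploration on its own.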
Keywords: Reinforcement learning, robot control, autonomous agents, genetic algorithms
Publication: Congress
June 18, 2021
Authors: Roberto Iglesias, Miguel Rodríguez, Manuel Sánchez, Eva Pereira, Carlos V. Regueiro