Restricted gradient-descent algorithm for value-function approximation in reinforcement learning

Authors:

Highlights:

Abstract

This work presents the restricted gradient-descent (RGD) algorithm, a training method for local radial-basis function networks developed specifically for use in reinforcement learning. The RGD algorithm can be seen as a way to extract relevant features from the state space to feed a linear model that computes an approximation of the value function. Its basic idea is to restrict the way the standard gradient-descent algorithm changes the hidden units of the approximator, which results in conservative modifications that make the learning process less prone to divergence. The algorithm is also able to configure the topology of the network, an important characteristic in the context of reinforcement learning, where the changing policy may impose different requirements on the approximator structure. Computational experiments show that the RGD algorithm consistently generates better value-function approximations than the standard gradient-descent method, and that the latter is more susceptible to divergence. In the pole-balancing and Acrobot tasks, RGD combined with SARSA yields results competitive with other methods found in the literature, including evolutionary and recent reinforcement-learning algorithms.
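
To make the general idea concrete, below is a minimal, hypothetical sketch of a local Gaussian RBF value-function approximator whose linear output weights follow the usual gradient-descent TD update, while the hidden units receive only a deliberately restricted adjustment. This is not the paper's RGD algorithm: the specific restriction used here (adapting only the most active unit, with a small step size), the class name `RBFValueFunction`, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical illustration of "restricted" hidden-unit updates in an RBF
# value-function approximator; the restriction rule below is an assumption,
# not the RGD rule from the paper.
import numpy as np

class RBFValueFunction:
    def __init__(self, centers, widths, lr_weights=0.1, lr_hidden=0.01):
        self.centers = np.asarray(centers, dtype=float)  # (n_units, state_dim)
        self.widths = np.asarray(widths, dtype=float)    # (n_units,)
        self.weights = np.zeros(len(self.centers))       # linear output layer
        self.lr_w = lr_weights
        self.lr_h = lr_hidden

    def features(self, s):
        # Local Gaussian radial-basis activations for state s.
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.widths ** 2))

    def value(self, s):
        return self.features(s) @ self.weights

    def update(self, s, td_error):
        phi = self.features(s)
        # Standard gradient-descent step on the linear output weights.
        self.weights += self.lr_w * td_error * phi
        # Restricted step on the hidden layer: only the unit closest to the
        # state is moved, with a small learning rate (assumed restriction).
        k = int(np.argmax(phi))
        grad_c = (td_error * self.weights[k] * phi[k]
                  * (s - self.centers[k]) / self.widths[k] ** 2)
        self.centers[k] += self.lr_h * grad_c

# Example: one TD(0)-style backup on a 2-D state.
vf = RBFValueFunction(centers=np.random.rand(20, 2), widths=np.full(20, 0.3))
s, r, s_next, gamma = np.array([0.2, 0.5]), 1.0, np.array([0.25, 0.5]), 0.99
vf.update(s, r + gamma * vf.value(s_next) - vf.value(s))
```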

Keywords: Reinforcement learning, Neuro-dynamic programming, Value-function approximation, Radial-basis-function networks

Article history: Received 22 May 2006, Revised 22 August 2007, Accepted 23 August 2007, Available online 6 September 2007.

DOI: https://doi.org/10.1016/j.artint.2007.08.001