Training Reinforcement Neurocontrollers Using the Polytope Algorithm

Authors: Aristidis Likas, Isaac E. Lagaris

Abstract

A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorithm to adjust the weights of the action network so that a simple direct measure of the training performance is maximized. Experimental results from the application of the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
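As a rough illustration of the idea in the abstract, the sketch below uses a polytope search (commonly known as the Nelder-Mead simplex method) to adjust controller weights directly against an episode-level score, with no critic. Everything specific here is an assumption made for illustration and is not taken from the paper: the linearized pendulum task, the one-unit tanh "action network", the steps-survived performance measure, and the names polytope_maximize, episode_steps, and performance.

```python
import math


def polytope_maximize(f, x0, step=0.5, iters=200):
    """Derivative-free polytope (Nelder-Mead style) search that maximizes f."""
    n = len(x0)
    # Initial polytope: x0 plus one vertex offset along each coordinate axis.
    pts = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        # Sort vertices so the best (largest f) comes first, the worst last.
        order = sorted(range(n + 1), key=lambda i: vals[i], reverse=True)
        pts = [pts[i] for i in order]
        vals = [vals[i] for i in order]
        worst = pts[-1]
        # Centroid of every vertex except the worst one.
        c = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c[j] + t * (c[j] - worst[j]) for j in range(n)]

        xr = along(1.0)  # reflection
        fr = f(xr)
        if fr > vals[0]:
            xe = along(2.0)  # expansion
            fe = f(xe)
            pts[-1], vals[-1] = (xe, fe) if fe > fr else (xr, fr)
        elif fr > vals[-2]:
            pts[-1], vals[-1] = xr, fr
        else:
            xc = along(-0.5)  # contraction toward the centroid
            fc = f(xc)
            if fc > vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = pts[0]
                pts = [best] + [
                    [best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                    for p in pts[1:]
                ]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    i = max(range(n + 1), key=lambda k: vals[k])
    return pts[i], vals[i]


def episode_steps(w, theta0, max_steps=500, dt=0.02):
    """Hypothetical toy task: steps a linearized inverted pendulum stays up."""
    theta, omega = theta0, 0.0
    for t in range(max_steps):
        # One-unit "action network" (assumed): bounded force from the state.
        u = 10.0 * math.tanh(w[0] * theta + w[1] * omega)
        omega += dt * (9.8 * theta + u)
        theta += dt * omega
        if abs(theta) > 0.2:  # failure signal ends the episode
            return t
    return max_steps


def performance(w):
    """Assumed direct measure: mean steps survived over two start angles."""
    return sum(episode_steps(w, th) for th in (-0.05, 0.05)) / 2.0


if __name__ == "__main__":
    weights, score = polytope_maximize(performance, [0.0, 0.0], iters=40)
    print("weights:", weights, "mean steps survived:", score)
```

Because the search only compares performance values at the polytope vertices, it needs neither gradients nor a learned value estimate, which is the property the abstract relies on. The best vertex is never discarded, so the reported score cannot fall below that of the starting weights.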

Keywords: reinforcement learning, neurocontrol, optimization, polytope algorithm, pole balancing, genetic reinforcement

Paper URL: https://doi.org/10.1023/A:1018669223478
