Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner’s dilemma

Authors:

Highlights:

Abstract

We investigate the repeated prisoner’s dilemma game in which both players alternately use reinforcement learning to obtain their optimal memory-one strategies. We theoretically solve the simultaneous Bellman optimality equations of reinforcement learning. We find that, among all deterministic memory-one strategies, the Win-stay Lose-shift strategy, the Grim strategy, and the strategy which always defects can each form a symmetric equilibrium of the mutual reinforcement learning process.
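The symmetric-equilibrium condition can be illustrated numerically. The following is a minimal sketch, not the paper's analytical method: it replaces the theoretical solution of the Bellman optimality equations with plain value iteration, and it assumes illustrative payoffs (T, R, P, S) = (4, 3, 1, 0), a discount factor of 0.9, and a state defined as the previous round's outcome. None of these values are taken from the paper, and which strategies pass the check depends on them. A strategy is reported when it is a best response to an opponent playing that same strategy.

```python
from itertools import product

# Assumed payoffs (T > R > P > S) and discount factor; not taken from the paper.
T, R, P, S = 4.0, 3.0, 1.0, 0.0
GAMMA = 0.9

ACTIONS = ("C", "D")
# A state is the previous round's outcome: (own action, opponent's action).
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]
# Payoff to the learner for (own action, opponent's action).
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def q_values(opponent, sweeps=2000):
    """Value iteration against a fixed deterministic memory-one opponent."""
    v = {s: 0.0 for s in STATES}
    q = {}
    for _ in range(sweeps):
        for s in STATES:
            # The opponent reads the same outcome from its own perspective.
            b = opponent[(s[1], s[0])]
            for a in ACTIONS:
                # Bellman backup: immediate payoff plus discounted next-state value.
                q[(s, a)] = PAYOFF[(a, b)] + GAMMA * v[(a, b)]
        v = {s: max(q[(s, a)] for a in ACTIONS) for s in STATES}
    return q


def is_symmetric_equilibrium(strategy, tol=1e-9):
    """True if the strategy is a best response to itself in every state."""
    q = q_values(strategy)
    return all(
        q[(s, strategy[s])] >= max(q[(s, a)] for a in ACTIONS) - tol
        for s in STATES
    )


def name(strategy):
    """Actions after CC, CD, DC, DD, e.g. 'CDDC' for Win-stay Lose-shift."""
    return "".join(strategy[s] for s in STATES)


# Enumerate all 16 deterministic memory-one strategies.
equilibria = [
    name(dict(zip(STATES, acts)))
    for acts in product(ACTIONS, repeat=4)
    if is_symmetric_equilibrium(dict(zip(STATES, acts)))
]
print(equilibria)
```

In this encoding, Win-stay Lose-shift is `CDDC`, Grim is `CDDD`, and always-defect is `DDDD`. With other payoffs or discount factors the list may differ; for example, Win-stay Lose-shift fails the check when the one-round gain from defecting outweighs the discounted loss from the following round of mutual defection.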

Keywords: Repeated prisoner’s dilemma game, Reinforcement learning

Review history: Received 8 February 2021, Revised 3 May 2021, Accepted 12 May 2021, Available online 1 June 2021, Version of Record 1 June 2021.

Official paper URL: https://doi.org/10.1016/j.amc.2021.126370