RL-VAEGAN: Adversarial defense for reinforcement learning agents via style transfer

Abstract:

Reinforcement learning (RL) agents parameterized by deep neural networks have achieved great success in many domains. However, deep RL policies have been shown to be vulnerable to adversarial attacks: inputs with slight perturbations can cause substantial agent failure. Inspired by recent advances in deep generative networks, which have greatly facilitated the development of adversarial attacks, we investigate the adversarial robustness of RL agents and propose a novel defense framework for RL based on the idea of style transfer. More precisely, our framework, called RL-VAEGAN, combines variational autoencoders (VAEs) and generative adversarial networks (GANs) to learn the style distributions of original and adversarial states, respectively, and eliminates the threat of adversarial attacks on RL agents by transferring adversarial states to unperturbed legitimate ones under the shared-content latent-space assumption. We empirically show that our method is effective against state-of-the-art attacks in both white-box and black-box scenarios across diverse magnitudes of perturbation.
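
To make the purification idea concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: encode a (possibly perturbed) state into a shared content latent code, then decode it back in the legitimate style before the frozen policy acts on it. All class and function names (ContentEncoder, CleanStyleDecoder, defended_action) are hypothetical illustrations under the stated shared-content assumption, not the authors' released implementation; the adversarial (GAN) training of the encoders and decoders is omitted.

```python
# Hypothetical sketch of style-transfer purification in the spirit of RL-VAEGAN.
# Assumes image observations in [0, 1] and a discrete-action policy network.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps a (possibly adversarial) state into the shared content latent space."""
    def __init__(self, in_channels=3, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, latent_dim)      # VAE mean head
        self.logvar = nn.Linear(64, latent_dim)  # VAE log-variance head

    def forward(self, x):
        h = self.net(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z ~ N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class CleanStyleDecoder(nn.Module):
    """Decodes a content code back into the legitimate (unperturbed) style."""
    def __init__(self, out_channels=3, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 8, 8)
        return self.net(h)

@torch.no_grad()
def defended_action(policy, encoder, decoder, state):
    """Purify the observation before the frozen RL policy sees it."""
    z, _, _ = encoder(state)      # strip the (adversarial) style
    clean_state = decoder(z)      # re-render in the legitimate style
    return policy(clean_state).argmax(dim=-1)
```

At test time the defense is attack-agnostic: the policy never observes the raw input, so perturbations confined to the style component are discarded by construction, regardless of how they were crafted.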

Keywords: Reinforcement learning, Trusted artificial intelligence, Robust agents, Adversarial attack, Adversarial defense

Article history: Received 26 November 2020, Revised 14 March 2021, Accepted 15 March 2021, Available online 17 March 2021, Version of Record 23 March 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.106967