Defending against attacks tailored to transfer learning via feature distancing

Authors:

Highlights:

Abstract

Transfer learning is preferable for training a deep neural network on a small dataset because it leverages a pre-trained teacher model. However, transfer learning also opens the door to new attacks that generate adversarial examples using the pre-trained teacher model. In this paper, we propose a novel method, feature distancing, to defend against adversarial attacks tailored to transfer learning. The method trains a student model whose feature representation is distinct from that of the teacher model. We generate adversarial examples of the mimic attack with the teacher model and use them to train the student model. A triplet loss pulls each mimic-attack example close to its source image and pushes it far from its target image in the feature space of the student model. The proposed method is evaluated on three different transfer learning tasks with diverse attack configurations, and it is the only method that achieves both high robust accuracy and high test accuracy on every task we evaluate.
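
The triplet objective described in the abstract can be sketched as follows. This is a minimal PyTorch illustration assuming standard triplet-margin training; the function and variable names (student, x_mimic, x_source, x_target, margin) are our assumptions for exposition, not the authors' actual code.

```python
import torch
import torch.nn.functional as F

def feature_distancing_loss(student, x_mimic, x_source, x_target, margin=1.0):
    """Triplet loss over the student's feature space: the anchor is a
    mimic-attack adversarial example, the positive is its source image,
    and the negative is its target image (names are illustrative)."""
    f_anchor = student(x_mimic)    # features of the mimic-attack example
    f_pos = student(x_source)      # features of the source image (pulled close)
    f_neg = student(x_target)      # features of the target image (pushed away)
    return F.triplet_margin_loss(f_anchor, f_pos, f_neg, margin=margin)
```

Minimizing this loss alongside the usual classification objective would keep mimic-attack examples near their source images and far from their attack targets, which is the feature-distancing behavior the abstract describes.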

Keywords:

Article history: Received 22 December 2021, Revised 1 August 2022, Accepted 3 August 2022, Available online 9 August 2022, Version of Record 19 August 2022.

Paper link: https://doi.org/10.1016/j.cviu.2022.103533