Effective Targeted Attacks for Adversarial Self-Supervised Learning

Minseon Kim; Hyeonjeong Ha; Sooel Son; Sung Ju Hwang

Machine Learning 2023-10-27 v2

Abstract

Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often generates ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive-mining scheme for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks, on benchmark datasets.
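The two steps named in the abstract (select a target by entropy and similarity, then perturb the instance towards it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the score is taken to be a plain sum of softmax entropy and cosine similarity, the names `select_target` and `targeted_step` and the `weight`, `step_size`, and `epsilon` parameters are hypothetical, and the encoder is assumed to be the identity so that the gradient of the squared distance to the target can be written by hand.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_target(anchor_idx, embeddings, logits, weight=1.0):
    # Assumed score: entropy of the candidate's prediction ("confusing")
    # plus weighted cosine similarity to the anchor ("similar").
    best_idx, best_score = None, -math.inf
    for j, (emb, lg) in enumerate(zip(embeddings, logits)):
        if j == anchor_idx:
            continue  # an instance is never its own target
        score = entropy(softmax(lg)) + weight * cosine(embeddings[anchor_idx], emb)
        if score > best_score:
            best_idx, best_score = j, score
    return best_idx

def targeted_step(x, target, x_orig, step_size=0.1, epsilon=0.3):
    # One signed-gradient step that reduces ||x - target||^2 (identity
    # encoder assumed), then projects back into the L-infinity ball of
    # radius epsilon around the clean input x_orig.
    out = []
    for xi, ti, oi in zip(x, target, x_orig):
        diff = xi - ti
        sign = (diff > 0) - (diff < 0)
        moved = xi - step_size * sign
        out.append(min(max(moved, oi - epsilon), oi + epsilon))
    return out
```

With a real encoder, the hand-written gradient in `targeted_step` would be replaced by automatic differentiation through the network; the selection and projection logic would stay the same in this sketch.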

Keywords

adversarial training, adversarial examples, adversarial attack

Cite

@article{arxiv.2210.10482,
  title  = {Effective Targeted Attacks for Adversarial Self-Supervised Learning},
  author = {Minseon Kim and Hyeonjeong Ha and Sooel Son and Sung Ju Hwang},
  journal= {arXiv preprint arXiv:2210.10482},
  year   = {2023}
}

Comments

NeurIPS 2023
