Related papers: Adversarial Training for Large Neural Language Mod…

Improving Machine Reading Comprehension via Adversarial Training

Adversarial training (AT) as a regularization method has proved its effectiveness in various tasks, such as image classification and text classification. Though there are successful applications of AT in many tasks of natural language…

Computation and Language · Computer Science 2019-11-12 Ziqing Yang, Yiming Cui, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu

Improving Adversarial Robustness by Enforcing Local and Global Compactness

The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges…

Machine Learning · Computer Science 2020-07-13 Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung

Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss

Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial examples. In this paper, we improve the robustness of DNNs by utilizing techniques of Distance Metric Learning. Specifically, we incorporate…

Machine Learning · Computer Science 2019-05-29 Pengcheng Li, Jinfeng Yi, Bowen Zhou, Lijun Zhang

Improved Adversarial Training via Learned Optimizer

Adversarial attacks have recently become a tremendous threat to deep learning models. To improve the robustness of machine learning models, adversarial training, formulated as a minimax optimization problem, has been recognized as one of the…

Machine Learning · Computer Science 2020-04-28 Yuanhao Xiong, Cho-Jui Hsieh

Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper,…

Machine Learning · Computer Science 2020-09-08 Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli

A Self-supervised Approach for Adversarial Robustness

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems, e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock…

Computer Vision and Pattern Recognition · Computer Science 2020-06-11 Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Blind Adversarial Training: Balance Accuracy and Robustness

Adversarial training (AT) aims to improve the robustness of deep learning models by mixing clean data and adversarial examples (AEs). Most existing AT approaches can be grouped into restricted and unrestricted approaches. Restricted AT…

Machine Learning · Computer Science 2020-04-14 Haidong Xie, Xueshuang Xiang, Naijin Liu, Bin Dong

Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency

Adversarial training enhances neural network robustness but suffers from a tendency to overfit and increased generalization errors on clean data. This work introduces CLAT, an innovative approach that mitigates adversarial overfitting by…

Machine Learning · Computer Science 2024-12-25 Bhavna Gopal, Huanrui Yang, Jingyang Zhang, Mark Horton, Yiran Chen

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve the adversarial robustness, adversarial data augmentation (ADA) has been widely adopted to cover more search space of adversarial attacks by adding…

Computation and Language · Computer Science 2021-06-08 Chenglei Si, Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

LAS-AT: Adversarial Training with Learnable Attack Strategy

Adversarial training (AT) is always formulated as a minimax problem, whose performance depends on the inner optimization that involves the generation of adversarial examples (AEs). Most previous methods adopt Projected Gradient…

Computer Vision and Pattern Recognition · Computer Science 2022-03-15 Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao

Boosting Naturalness of Language in Task-oriented Dialogues via Adversarial Training

The natural language generation (NLG) module in a task-oriented dialogue system produces user-facing utterances conveying required information. Thus, it is critical for the generated response to be natural and fluent. We propose to…

Computation and Language · Computer Science 2020-05-07 Chenguang Zhu

Adversarial Neural Machine Translation

In this paper, we study a new learning paradigm for Neural Machine Translation (NMT). Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between human translation and the…

Computation and Language · Computer Science 2018-10-02 Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu

Adversarial Machine Learning at Scale

Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial…

Computer Vision and Pattern Recognition · Computer Science 2017-02-14 Alexey Kurakin, Ian Goodfellow, Samy Bengio

Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining them with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han, Bo Pang, Yingnian Wu

Non-Singular Adversarial Robustness of Neural Networks

Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations. While this issue is critical, we argue that solving it alone fails to provide a comprehensive…

Machine Learning · Computer Science 2021-03-02 Yu-Lin Tsai, Chia-Yi Hsu, Chia-Mu Yu, Pin-Yu Chen

Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice

Pre-trained contextualized language models (PrLMs) have led to strong performance gains in downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, which is one of the most…

Computation and Language · Computer Science 2021-06-01 Rongzhou Bao, Jiayi Wang, Hai Zhao

Robust Models are less Over-Confident

Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real world still faces fundamental challenges. One of these open problems is the inherent lack…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Julia Grabinski, Paul Gavrikov, Janis Keuper, Margret Keuper

Adversarial Self-Attention for Language Understanding

Deep neural models (e.g. Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, thus impairing the generalization and robustness. This paper advances the self-attention mechanism to its…

Computation and Language · Computer Science 2023-02-09 Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

Soft Adversarial Training Can Retain Natural Accuracy

Adversarial training for neural networks has been in the limelight in recent years. The advancement in neural network architectures over the last decade has led to significant improvement in their performance. It sparked an interest in…

Machine Learning · Computer Science 2022-06-07 Abhijith Sharma, Apurva Narayan

Are aligned neural networks adversarially aligned?

Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However,…

Computation and Language · Computer Science 2024-05-07 Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt