Related papers: Adversarial Training for Large Neural Language Mod…

200 papers

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we propose a novel adversarial training…

Computation and Language · Computer Science 2020-04-24 Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful…

Given the widespread use of deep learning models in safety-critical applications, ensuring that the decisions of such models are robust against adversarial exploitation is of fundamental importance. In this thesis, we discuss recent…

Machine Learning · Computer Science 2025-09-24 Alexander Robey

Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a…

Computation and Language · Computer Science 2023-12-12 Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla

The fine-tuning of pre-trained language models has a great success in many NLP fields. Yet, it is strikingly vulnerable to adversarial examples, e.g., word substitution attacks using only synonyms can easily fool a BERT-based sentiment…

Computation and Language · Computer Science 2021-12-23 Xinshuai Dong, Luu Anh Tuan, Min Lin, Shuicheng Yan, Hanwang Zhang

Adversarial attacks pose a significant threat to the reliability of pre-trained language models (PLMs) such as GPT, BERT, RoBERTa, and T5. This paper presents Adversarial Robustness through Dynamic Ensemble Learning (ARDEL), a novel scheme…

Cryptography and Security · Computer Science 2025-05-14 Hetvi Waghela, Jaydip Sen, Sneha Rakshit

Adversarial training, a method for learning robust deep neural networks, constructs adversarial examples during training. However, recent methods for generating NLP adversarial examples involve combinatorial search and expensive sentence…

Computation and Language · Computer Science 2021-09-14 Jin Yong Yoo, Yanjun Qi

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet…

Machine Learning · Computer Science 2019-09-10 Dilin Wang, Chengyue Gong, Qiang Liu

Adversarial training (AT) is one of the most reliable methods for defending against adversarial attacks in machine learning. Variants of this method have been used as regularization mechanisms to achieve SOTA results on NLP benchmarks, and…

Computation and Language · Computer Science 2021-09-30 Javid Ebrahimi, Hao Yang, Wei Zhang

Neural language models show vulnerability to adversarial examples which are semantically similar to their original counterparts with a few words replaced by their synonyms. A common way to improve model robustness is adversarial training…

Computation and Language · Computer Science 2022-03-25 Hanjie Chen, Yangfeng Ji

For years, adversarial training has been extensively studied in natural language processing (NLP) settings. The main goal is to make models robust so that similar inputs derive in semantically similar outcomes, which is not a trivial…

Computation and Language · Computer Science 2021-09-21 Daniela N. Rim, DongNyeong Heo, Heeyoul Choi

Adversarial training is a widely-applied approach to training deep neural networks to be robust against adversarial perturbation. However, although adversarial training has achieved empirical success in practice, it still remains unclear…

Machine Learning · Computer Science 2025-02-10 Binghui Li, Yuanzhi Li

Modern machine learning and deep learning models are shown to be vulnerable when testing data are slightly perturbed. Existing theoretical studies of adversarial training algorithms mostly focus on either adversarial training losses or…

Machine Learning · Statistics 2021-04-07 Yue Xing, Qifan Song, Guang Cheng

Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches in training robust models against…

Machine Learning · Computer Science 2022-07-20 Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie

Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is…

Machine Learning · Computer Science 2023-05-19 Xiaoling Zhou, Nan Yang, Ou Wu

Fine-tuning large-scale pre-trained language models has been demonstrated effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage…

Computation and Language · Computer Science 2023-06-29 Zhehua Zhong, Tianyi Chen, Zhen Wang

Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having remarkable generalization ability, are highly vulnerable to adversarial examples. This work studies the adversarial robustness of VLMs from the novel perspective of…

Computer Vision and Pattern Recognition · Computer Science 2024-03-05 Lin Li, Haoyan Guan, Jianing Qiu, Michael Spratling

Adversarial training, which is to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we…

Machine Learning · Statistics 2023-06-02 Dongyoon Yang, Insung Kong, Yongdai Kim

While deep learning in the form of recurrent neural networks (RNNs) has caused a significant improvement in neural language modeling, the fact that they are extremely prone to overfitting is still a mainly unresolved issue. In this paper we…

Computation and Language · Computer Science 2022-11-18 Sajad Movahedi, Azadeh Shakery

While deep neural networks have achieved remarkable success in various computer vision tasks, they often fail to generalize to new domains and subtle variations of input images. Several defenses have been proposed to improve the robustness…

Computer Vision and Pattern Recognition · Computer Science 2021-09-08 Omid Poursaeed, Tianxing Jiang, Harry Yang, Serge Belongie, SerNam Lim