Related papers: Adversarial Training for Large Neural Language Mod…

200 papers

Despite the rapid progress of neural networks, they remain highly vulnerable to adversarial examples, for which adversarial training (AT) is currently the most effective defense. While AT has been extensively studied, its practical…

Machine Learning · Computer Science 2025-10-16 Yisen Wang , Yichuan Mo , Hongjun Wang , Junyi Li , Zhouchen Lin

Recent improvements in deep learning models and their practical applications have raised concerns about the robustness of these models against adversarial examples. Adversarial training (AT) has been shown to be effective in reaching a robust model…

Machine Learning · Computer Science 2021-03-30 Mohammad Azizmalayeri , Mohammad Hossein Rohban

Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives…

Computation and Language · Computer Science 2023-12-08 Jaehyung Kim , Yuning Mao , Rui Hou , Hanchao Yu , Davis Liang , Pascale Fung , Qifan Wang , Fuli Feng , Lifu Huang , Madian Khabsa

Modern language models often rely on Reinforcement Learning from Human Feedback (RLHF) to encourage safe behaviors. However, they remain vulnerable to adversarial attacks due to three key limitations: (1) the inefficiency and high cost of…

Acronym identification focuses on finding the acronyms and the phrases that have been abbreviated, which is crucial for scientific document understanding tasks. However, the limited size of manually annotated datasets hinders further…

Computation and Language · Computer Science 2021-01-13 Danqing Zhu , Wangli Lin , Yang Zhang , Qiwei Zhong , Guanxiong Zeng , Weilin Wu , Jiayu Tang

In Natural Language Processing (NLP), pretrained language models (LMs) that are transferred to downstream tasks have been recently shown to achieve state-of-the-art results. However, standard fine-tuning can degrade the general-domain…

Machine Learning · Computer Science 2020-10-07 Giorgos Vernikos , Katerina Margatina , Alexandra Chronopoulou , Ion Androutsopoulos

Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it usually comes at the cost of a decline in generalization ability. Recent studies have attempted to use clean training to assist adversarial…

Machine Learning · Computer Science 2025-04-02 MingWei Zhou , Xiaobing Pei

Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training, a heuristic form of robust optimization that…

Machine Learning · Computer Science 2019-11-12 Ruiqi Gao , Tianle Cai , Haochuan Li , Liwei Wang , Cho-Jui Hsieh , Jason D. Lee

In recent years, there has been an explosion of research into developing more robust deep neural networks against adversarial examples. Adversarial training appears as one of the most successful methods. To deal with both the robustness…

Machine Learning · Computer Science 2023-03-21 Gaojie Jin , Xinping Yi , Dengyu Wu , Ronghui Mu , Xiaowei Huang

Adversarial training (AT) has been demonstrated to be one of the most promising defense methods against various adversarial attacks. To our knowledge, existing AT-based methods usually train with the locally most adversarially perturbed points…

Computer Vision and Pattern Recognition · Computer Science 2021-09-07 Chuanbiao Song , Yanbo Fan , Yichen Yang , Baoyuan Wu , Yiming Li , Zhifeng Li , Kun He

The design of better automated dialogue evaluation metrics offers the potential to accelerate evaluation research on conversational AI. However, existing trainable dialogue evaluation models are generally restricted to classifiers trained…

Computation and Language · Computer Science 2021-04-19 Xiang Gao , Yizhe Zhang , Michel Galley , Bill Dolan

Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback, both limited by the cost and scarcity of high-quality annotations. Recent self-play and…

Machine Learning · Computer Science 2026-02-03 Shiguang Wu , Yaqing Wang , Quanming Yao

Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising…

Artificial Intelligence · Computer Science 2024-06-03 Feiteng Fang , Yuelin Bai , Shiwen Ni , Min Yang , Xiaojun Chen , Ruifeng Xu

In the last few decades, deep neural networks have achieved remarkable success in machine learning, computer vision, and pattern recognition. Recent studies, however, show that neural networks (both shallow and deep) may be easily fooled by…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Zhuang Qian , Kaizhu Huang , Qiu-Feng Wang , Xu-Yao Zhang

While existing work in robust deep learning has focused on small pixel-level norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases, although test data might not be…

Computer Vision and Pattern Recognition · Computer Science 2021-04-09 Tejas Gokhale , Rushil Anirudh , Bhavya Kailkhura , Jayaraman J. Thiagarajan , Chitta Baral , Yezhou Yang

The release of large natural language inference (NLI) datasets like SNLI and MNLI has led to rapid development and improvement of completely neural systems for the task. Most recently, heavily pre-trained, Transformer-based models like…

Computation and Language · Computer Science 2019-12-10 Tiffany Chien , Jugal Kalita

Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data,…

Computation and Language · Computer Science 2025-09-23 Mukur Gupta , Nikhil Reddy Varimalla , Nicholas Deas , Melanie Subbiah , Kathleen McKeown

In recent years, large pre-trained Transformer-based language models have led to dramatic improvements in many natural language understanding tasks. To train these models with increasing sizes, many neural network practitioners attempt to…

Machine Learning · Computer Science 2022-02-01 Minjia Zhang , Niranjan Uma Naresh , Yuxiong He

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making…

Machine Learning · Statistics 2021-11-17 Takeru Miyato , Andrew M. Dai , Ian Goodfellow

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of…

Computation and Language · Computer Science 2020-05-07 Yixin Nie , Adina Williams , Emily Dinan , Mohit Bansal , Jason Weston , Douwe Kiela