Related papers: Efficient Combinatorial Optimization for Word-leve…

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attacking is challenging because text is discrete and a small perturbation can bring significant change to the original input.…

Computation and Language · Computer Science · 2020-12-10 · Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Adversarial Attacks on Large Language Models Using Regularized Relaxation

As powerful Large Language Models (LLMs) are now widely used for numerous practical applications, their safety is of critical importance. While alignment techniques have significantly improved overall safety, LLMs remain vulnerable to…

Machine Learning · Computer Science · 2024-10-28 · Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu

Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution

Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However,…

Computation and Language · Computer Science · 2021-10-07 · Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, Cho-Jui Hsieh

A Differentiable Language Model Adversarial Attack on Text Classifiers

Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial…

Computation and Language · Computer Science · 2021-07-26 · Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Adversarial attacking aims to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in the accessibility to the victim model.…

Computation and Language · Computer Science · 2020-09-22 · Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun

Don't Search for a Search Method -- Simple Heuristics Suffice for Adversarial Text Attacks

Recently more attention has been given to adversarial attacks on neural networks for natural language processing (NLP). A central research topic has been the investigation of search algorithms and search constraints, accompanied by…

Computation and Language · Computer Science · 2021-10-05 · Nathaniel Berger, Stefan Riezler, Artem Sokolov, Sebastian Ebert

Generating Textual Adversaries with Minimal Perturbation

Many word-level adversarial attack approaches for textual data have been proposed in recent studies. However, due to the massive search space consisting of combinations of candidate words, the existing approaches face the problem of…

Computation and Language · Computer Science · 2022-11-15 · Xingyi Zhao, Lu Zhang, Depeng Xu, Shuhan Yuan

Adversarial Evasion Attack Efficiency against Large Language Models

Large Language Models (LLMs) are valuable for text classification, but their vulnerabilities must not be disregarded. They lack robustness against adversarial examples, so it is pertinent to understand the impacts of different types of…

Computation and Language · Computer Science · 2024-06-13 · João Vitorino, Eva Maia, Isabel Praça

Generating Natural Language Adversarial Examples through An Improved Beam Search Algorithm

The research of adversarial attacks in the text domain attracts many interests in the last few years, and many methods with a high attack success rate have been proposed. However, these attack methods are inefficient as they require lots of…

Computation and Language · Computer Science · 2021-10-18 · Tengfei Zhao, Zhaocheng Ge, Hanping Hu, Dingmeng Shi

Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation

Word-level adversarial attacks have shown success in NLP models, drastically decreasing the performance of transformer-based models in recent years. As a countermeasure, adversarial defense has been explored, but relatively few efforts have…

Computation and Language · Computer Science · 2022-03-04 · KiYoon Yoo, Jangho Kim, Jiho Jang, Nojun Kwak

Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks

The language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body…

Cryptography and Security · Computer Science · 2024-06-12 · Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren

Generating Natural Language Attacks in a Hard Label Black Box Setting

We study an important and challenging task of attacking natural language processing models in a hard label black box setting. We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification…

Computation and Language · Computer Science · 2021-04-30 · Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi

Generating Valid and Natural Adversarial Examples with Large Language Models

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream…

Computation and Language · Computer Science · 2023-11-21 · Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen

Towards a Robust Deep Neural Network in Texts: A Survey

Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g., image classification, speech recognition, and natural language processing (NLP)). However, researchers have demonstrated that DNN-based models are…

Computation and Language · Computer Science · 2021-04-22 · Wenqi Wang, Run Wang, Lina Wang, Zhibo Wang, Aoshuang Ye

Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations

Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly…

Computation and Language · Computer Science · 2022-05-02 · Na Liu, Mark Dras, Wei Emma Zhang

Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks

We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between adversarial loss for…

Computation and Language · Computer Science · 2023-02-13 · Piotr Gaiński, Klaudia Bałazy

A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models

In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original…

Computation and Language · Computer Science · 2023-06-16 · Sahar Sadrizadeh, Clément Barbier, Ljiljana Dolamic, Pascal Frossard

LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt gradients or confidence scores to calculate word importance ranking and generate adversarial examples. However, this…

Computation and Language · Computer Science · 2024-01-11 · Hai Zhu, Zhaoqing Yang, Weiwei Shang, Yuren Wu

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

NLP researchers propose different word-substitute black-box attacks that can fool text classification models. In such attack, an adversary keeps sending crafted adversarial queries to the target model until it can successfully achieve the…

Computation and Language · Computer Science · 2022-06-13 · Tianle Li, Yi Yang

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in…

Artificial Intelligence · Computer Science · 2023-06-30 · Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh