Related papers: Fooling Explanations in Text Classifiers

200 papers

Interpretable time series deep learning systems are often assessed by checking temporal consistency on explanations, implicitly treating this as evidence of robustness. We show that this assumption can fail: Predictions and explanations can…

Machine Learning · Computer Science 2026-02-10 Bohan Wang, Zewen Liu, Lu Lin, Hui Liu, Li Xiong, Ming Jin, Wei Jin

Explanations are crucial parts of deep neural network (DNN) classifiers. In high stakes applications, faithful and robust explanations are important to understand and gain trust in DNN classifiers. However, recent work has shown that…

Machine Learning · Computer Science 2022-12-20 Adam Ivankay, Mattia Rigotti, Ivan Girardi, Chiara Marchiori, Pascal Frossard

Deep visual models are susceptible to adversarial perturbations to inputs. Although these signals are carefully crafted, they still appear noise-like patterns to humans. This observation has led to the argument that deep visual…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Naveed Akhtar, Muhammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal Mian

Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial…

Explanation methods have emerged as an important tool to highlight the features responsible for the predictions of neural networks. There is mounting evidence that many explanation methods are rather unreliable and susceptible to malicious…

Computation and Language · Computer Science 2022-06-27 Shriya Atmakuri, Tejas Chheda, Dinesh Kandula, Nishant Yadav, Taesung Lee, Hessel Tuinhof

Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of…

Cryptography and Security · Computer Science 2025-05-15 Yiling He, Jian Lou, Zhan Qin, Kui Ren

Transformer-based text classifiers such as BERT, RoBERTa, T5, and GPT have shown strong performance in natural language processing tasks but remain vulnerable to adversarial examples. These vulnerabilities raise significant security…

Computation and Language · Computer Science 2025-10-27 Bushra Sabir, Yansong Gao, Alsharif Abuadbba, M. Ali Babar

Text normalization is a ubiquitous process that appears as the first step of many Natural Language Processing problems. However, previous Deep Learning approaches have suffered from so-called silly errors, which are undetectable on…

Computation and Language · Computer Science 2019-03-08 Adrián Javaloy Bornás, Ginés García Mateos

We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies…

Computation and Language · Computer Science 2021-09-09 Dheeraj Rajagopal, Vidhisha Balachandran, Eduard Hovy, Yulia Tsvetkov

Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial…

Computation and Language · Computer Science 2024-06-11 Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan

Building explainable systems is a critical problem in the field of Natural Language Processing (NLP), since most machine learning models provide no explanations for the predictions. Existing approaches for explainable machine learning…

Computation and Language · Computer Science 2019-06-12 Hui Liu, Qingyu Yin, William Yang Wang

Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness…

Computation and Language · Computer Science 2020-04-10 Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits

State-of-the-art deep neural networks have achieved impressive results on many image classification tasks. However, these same architectures have been shown to be unstable to small, well sought, perturbations of the images. Despite the…

Machine Learning · Computer Science 2016-08-30 Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard

Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by…

Computation and Language · Computer Science 2024-01-01 Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

In this paper, we present an effective method to craft text adversarial samples, revealing one important yet underestimated fact that DNN-based text classifiers are also prone to adversarial sample attack. Specifically, confronted with…

Cryptography and Security · Computer Science 2019-01-08 Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while imperceptible…

Machine Learning · Computer Science 2017-09-28 Beilun Wang, Ji Gao, Yanjun Qi

Given a state-of-the-art deep neural network text classifier, we show the existence of a universal and very small perturbation vector (in the embedding space) that causes natural text to be misclassified with high probability. Unlike images…

Computation and Language · Computer Science 2019-10-11 Hang Gao, Tim Oates

Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we…

Machine Learning · Computer Science 2021-09-23 Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-wei Chang

Text classification is a very common task nowadays and there are many efficient methods and algorithms that we can employ to accomplish it. Transformers have revolutionized the field of deep learning, particularly in Natural Language…

Machine Learning · Computer Science 2024-12-31 Christos Petridis

The susceptibility of deep neural networks (DNNs) to adversarial attacks undermines their reliability across numerous applications, underscoring the necessity for an in-depth exploration of these vulnerabilities and the formulation of…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 S. M. Fazle Rabby Labib, Joyanta Jyoti Mondal, Meem Arafat Manab, Xi Xiao, Sarfaraz Newaz