Related papers: A Practical Method for Generating String Counterfa…

Abstract Counterfactuals for Language Model Agents

Counterfactual inference is a powerful tool for analysing and evaluating autonomous agents, but its application to language model (LM) agents remains challenging. Existing work on counterfactuals in LMs has primarily focused on token-level…

Machine Learning · Computer Science 2025-06-04 Edoardo Pona, Milad Kazemi, Yali Du, David Watson, Nicola Paoletti

Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?

Large language models (LLMs) are becoming useful in many domains due to their impressive abilities that arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual…

Computation and Language · Computer Science 2025-10-07 Nelvin Tan, James Asikin Cheung, Yu-Ching Shih, Dong Yang, Amol Salunkhe

Explaining Text Classifiers with Counterfactual Representations

One well-motivated explanation method for classifiers leverages counterfactuals, which are hypothetical events identical to real observations in all aspects except for one feature. Constructing such counterfactuals poses specific challenges…

Machine Learning · Computer Science 2024-09-12 Pirmin Lemberger, Antoine Saillenfest

A Survey on Natural Language Counterfactual Generation

Natural language counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class. The generated counterfactuals provide insight into the reasoning behind a model's…

Computation and Language · Computer Science 2024-10-08 Yongjie Wang, Xiaoqi Qiu, Yu Yue, Xu Guo, Zhiwei Zeng, Yuhong Feng, Zhiqi Shen

Optimal and efficient text counterfactuals using Graph Neural Networks

As NLP models become increasingly integral to decision-making processes, the need for explainability and interpretability has become paramount. In this work, we propose a framework that achieves the aforementioned by generating semantically…

Computation and Language · Computer Science 2025-08-04 Dimitris Lymperopoulos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou

Gumbel Counterfactual Generation From Language Models

Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or…

Computation and Language · Computer Science 2025-03-07 Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell

Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning

Counterfactual instances are a powerful tool to obtain valuable insights into automated decision processes, describing the necessary minimal changes in the input space to alter the prediction towards a desired target. Most previous…

Machine Learning · Computer Science 2021-06-07 Robert-Florian Samoilescu, Arnaud Van Looveren, Janis Klaise

A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers

Counterfactual explanations can be used to interpret and debug text classifiers by producing minimally altered text inputs that change a classifier's output. In this work, we evaluate five methods for generating counterfactual explanations…

Computation and Language · Computer Science 2024-11-06 Stephen McAleese, Mark Keane

What if This Modified That? Syntactic Interventions via Counterfactual Embeddings

Neural language models exhibit impressive performance on a variety of tasks, but their internal reasoning may be difficult to understand. Prior art aims to uncover meaningful properties within model representations via probes, but it is…

Computation and Language · Computer Science 2021-09-21 Mycal Tucker, Peng Qian, Roger Levy

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

When language models process syntactically complex sentences, do they use their representations of syntax in a manner that is consistent with the grammar of the language? We propose AlterRep, an intervention-based method to address this…

Computation and Language · Computer Science 2021-09-16 Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg

Counterfactual Augmentation for Multimodal Learning Under Presentation Bias

In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops…

Machine Learning · Computer Science 2023-11-01 Victoria Lin, Louis-Philippe Morency, Dimitrios Dimitriadis, Srinagesh Sharma

Large Language Models as Nondeterministic Causal Models

Recent work by Chatzi et al. and Ravfogel et al. has developed, for the first time, a method for generating counterfactuals of probabilistic Large Language Models. Such counterfactuals tell us what would - or might - have been the output of…

Artificial Intelligence · Computer Science 2026-04-21 Sander Beckers

Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs

Bias in large language models (LLMs) has many forms, from overt discrimination to implicit stereotypes. Counterfactual bias evaluation is a widely used approach to quantifying bias and often relies on template-based probes that explicitly…

Computation and Language · Computer Science 2026-01-15 Farnaz Kohankhaki, D. B. Emerson, Jacob-Junqi Tian, Laleh Seyyed-Kalantari, Faiza Khan Khattak

Counterfactual Language Model Adaptation for Suggesting Phrases

Mobile devices use language models to suggest words and phrases for use in text entry. Traditional language models are based on contextual word frequency in a static corpus of text. However, certain types of phrases, when offered to writers…

Computation and Language · Computer Science 2017-10-06 Kenneth C. Arnold, Kai-Wei Chang, Adam T. Kalai

Empowering Language Understanding with Counterfactual Reasoning

Present language understanding methods have demonstrated an extraordinary ability to recognize patterns in text via machine learning. However, existing methods indiscriminately use the recognized patterns in the testing phase that is…

Computation and Language · Computer Science 2021-06-08 Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

Flexible text generation for counterfactual fairness probing

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods…

Computation and Language · Computer Science 2022-06-29 Zee Fryer, Vera Axelrod, Ben Packer, Alex Beutel, Jilin Chen, Kellie Webster

Counterfactuals uncover the modular structure of deep generative models

Deep generative models can emulate the perceptual properties of complex image datasets, providing a latent representation of the data. However, manipulating such representation to perform meaningful and controllable transformations in the…

Machine Learning · Computer Science 2019-12-13 Michel Besserve, Arash Mehrjou, Rémy Sun, Bernhard Schölkopf

Text Counterfactuals via Latent Optimization and Shapley-Guided Search

We study the problem of generating counterfactual text for a classifier as a means for understanding and debugging classification. Given a textual input and a classification model, we aim to minimally alter the text to change the model's…

Computation and Language · Computer Science 2021-10-25 Quintin Pope, Xiaoli Z. Fern

On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

Counterfactual reasoning has emerged as a crucial technique for generalizing the reasoning capabilities of large language models (LLMs). By generating and analyzing counterfactual scenarios, researchers can assess the adaptability and…

Artificial Intelligence · Computer Science 2026-02-17 Shuai Yang, Qi Yang, Luoxi Tang, Yuqiao Meng, Nancy Guo, Jeremy Blackburn, Zhaohan Xi

Data Augmentations for Improved (Large) Language Model Generalization

The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare. In this work, we propose to use counterfactual data…

Machine Learning · Computer Science 2024-01-10 Amir Feder, Yoav Wald, Claudia Shi, Suchi Saria, David Blei