Related papers: Executable Counterfactuals: Improving LLMs' Causal…

On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

Counterfactual reasoning has emerged as a crucial technique for generalizing the reasoning capabilities of large language models (LLMs). By generating and analyzing counterfactual scenarios, researchers can assess the adaptability and…

Artificial Intelligence · Computer Science 2026-02-17 Shuai Yang, Qi Yang, Luoxi Tang, Yuqiao Meng, Nancy Guo, Jeremy Blackburn, Zhaohan Xi

Using LLMs for Explaining Sets of Counterfactual Examples to Final Users

Causality is vital for understanding true cause-and-effect relationships between variables within predictive models, rather than relying on mere correlations, making it highly relevant in the field of Explainable AI. In an automated…

Machine Learning · Computer Science 2024-08-28 Arturo Fredes, Jordi Vitria

CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models

Counterfactual reasoning is widely recognized as one of the most challenging and intricate aspects of causality in artificial intelligence. In this paper, we evaluate the performance of large language models (LLMs) in counterfactual…

Computation and Language · Computer Science 2026-04-14 Yuefei Chen, Vivek K. Singh, Jing Ma, Ruixiang Tang

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has…

Machine Learning · Computer Science 2026-02-20 Victoria Lin, Xinnuo Xu, Rachel Lawrence, Risa Ueno, Amit Sharma, Javier Gonzalez, Niranjani Prasad

Counterfactual Explanations for Continuous Action Reinforcement Learning

Reinforcement Learning (RL) has shown great promise in domains like healthcare and robotics but often struggles with adoption due to its lack of interpretability. Counterfactual explanations, which address "what if" scenarios, provide a…

Machine Learning · Computer Science 2025-05-20 Shuyang Dong, Shangtong Zhang, Lu Feng

Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes

We present a novel framework addressing a critical vulnerability in Large Language Models (LLMs): the prevalence of factual inaccuracies within intermediate reasoning steps despite correct final answers. This phenomenon poses substantial…

Computation and Language · Computer Science 2025-08-05 Rui Jiao, Yue Zhang, Jinku Li

Counterfactual Simulatability of LLM Explanations for Generation Tasks

LLMs can be unpredictable, as even slight alterations to the prompt can cause the output to change in unexpected ways. Thus, the ability of models to accurately explain their behavior is critical, especially in high-stakes settings. One…

Computation and Language · Computer Science 2025-11-26 Marvin Limpijankit, Yanda Chen, Melanie Subbiah, Nicholas Deas, Kathleen McKeown

Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

Large language models (LLMs) are increasingly used for causal and counterfactual reasoning, yet their reliability in real-world policy evaluation remains underexplored. We construct a benchmark of 40 empirical policy evaluation cases drawn…

Artificial Intelligence · Computer Science 2026-05-29 Yanjie He

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specialized to specific tasks seen during…

Computation and Language · Computer Science 2024-04-01 Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim

Reasoning Elicitation in Language Models via Counterfactual Feedback

Despite the increasing effectiveness of language models, their reasoning capabilities remain underdeveloped. In particular, causal reasoning through counterfactual question answering is lacking. This work aims to bridge this gap. We first…

Computation and Language · Computer Science 2025-03-18 Alihan Hüyük, Xinnuo Xu, Jacqueline Maasch, Aditya V. Nori, Javier González

Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment

The reasoning abilities are one of the most enigmatic and captivating aspects of large language models (LLMs). Numerous studies are dedicated to exploring and expanding the boundaries of this reasoning capability. However, tasks that embody…

Artificial Intelligence · Computer Science 2025-02-27 Yuze Zhao, Tianyun Ji, Wenjun Feng, Zhenya Huang, Qi Liu, Zhiding Liu, Yixiao Ma, Kai Zhang, Enhong Chen

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for…

Computation and Language · Computer Science 2024-04-04 Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

CofCA: A Step-Wise Counterfactual Multi-hop QA benchmark

While Large Language Models (LLMs) excel in question-answering (QA) tasks, their real reasoning abilities on multiple evidence retrieval and integration on Multi-hop QA tasks remain less explored. Firstly, LLMs sometimes generate answers…

Computation and Language · Computer Science 2024-10-16 Jian Wu, Linyi Yang, Zhen Wang, Manabu Okumura, Yue Zhang

CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where…

Machine Learning · Computer Science 2026-05-26 Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun

Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning

Large language models (LLMs) excel at complex tasks with advances in reasoning capabilities. However, existing reward mechanisms remain tightly coupled to final correctness and pay little attention to the underlying reasoning process:…

Machine Learning · Computer Science 2026-05-14 Jingyao Wang, Peizheng Guo, Wenwen Qiang, Jiahuan Zhou, Huijie Guo, Changwen Zheng, Hui Xiong

Counterfactual Data Augmentation using Locally Factored Dynamics

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the…

Machine Learning · Computer Science 2020-12-07 Silviu Pitis, Elliot Creager, Animesh Garg

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of…

Machine Learning · Computer Science 2022-11-17 Sahil Verma, Varich Boonsanong, Minh Hoang, Keegan E. Hines, John P. Dickerson, Chirag Shah

CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning

Large language models (LLMs) have been utilized in solving diverse reasoning tasks, encompassing common sense, arithmetic and deduction tasks. However, with difficulties of reversing thinking patterns and irrelevant premises, how to…

Artificial Intelligence · Computer Science 2024-09-10 Jinwei He, Feng Lu

Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation

Understanding a program's runtime reasoning behavior, meaning how intermediate states and control flows lead to final execution results, is essential for reliable code generation, debugging, and automated reasoning. Although large language…

Software Engineering · Computer Science 2025-12-02 Mohammad Abdollahi, Khandaker Rifah Tasnia, Soumit Kanti Saha, Jinqiu Yang, Song Wang, Hadi Hemmati

Counterfactual Collaborative Reasoning

Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how…

Information Retrieval · Computer Science 2023-07-06 Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, Yongfeng Zhang