Related papers: Enhancing the Rationale-Input Alignment for Self-e…

Learnable Game-theoretic Policy Optimization for Data-centric Self-explanation Rationalization

Rationalization, a data-centric framework, aims to build self-explanatory models to explain the prediction outcome by generating a subset of human-intelligible pieces of the input data. It involves a cooperative game model where a generator…

Artificial Intelligence · Computer Science 2025-10-16 Yunxiao Zhao, Zhiqiang Wang, Xingtong Yu, Xiaoli Li, Jiye Liang, Ru Li

Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint

A self-explaining rationalization model is generally constructed by a cooperative game where a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on…

Machine Learning · Computer Science 2023-06-27 Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Yang Qiu, YuanKai Zhang, Jie Han, Yixiong Zou

Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control

Selective rationalization has become a common mechanism to ensure that predictive models reveal how they use any available features. The selection may be soft or hard, and identifies a subset of input features relevant for prediction. The…

Computation and Language · Computer Science 2019-12-17 Mo Yu, Shiyu Chang, Yang Zhang, Tommi S. Jaakkola

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for…

Artificial Intelligence · Computer Science 2025-08-07 Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li

MGR: Multi-generator Based Rationalization

Rationalization is to employ a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text to the following predictor. However, rationalization…

Machine Learning · Computer Science 2023-07-25 Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Xinyang Li, Yuankai Zhang, Yang Qiu

Understanding Interlocking Dynamics of Cooperative Rationalization

Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output. The selection mechanism is commonly integrated into the model itself…

Machine Learning · Computer Science 2021-10-27 Mo Yu, Yang Zhang, Shiyu Chang, Tommi S. Jaakkola

Rationalization through Concepts

Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features such as relevant text snippets from which the model computes the outcome. However, a…

Computation and Language · Computer Science 2021-05-12 Diego Antognini, Boi Faltings

Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations

Chain-of-thought explanations are widely used to inspect the decision process of large language models (LLMs) and to evaluate the trustworthiness of model outputs, making them important for effective collaboration between LLMs and humans.…

Computation and Language · Computer Science 2025-07-16 Pedro Ferreira, Wilker Aziz, Ivan Titov

Self-Aligned Reward: Towards Effective and Efficient Reasoners

Reinforcement learning with verifiable rewards has significantly advanced reasoning in large language models (LLMs), but such signals remain coarse, offering only binary correctness feedback. This limitation often results in inefficiencies,…

Machine Learning · Computer Science 2026-04-20 Peixuan Han, Adit Krishnan, Gerald Friedland, Jiaxuan You, Chris Kong

Distribution Matching for Rationalization

The task of rationalization aims to extract pieces of input text as rationales to justify neural network predictions on text classification tasks. By definition, rationales represent key text pieces used for prediction and thus should have…

Computation and Language · Computer Science 2021-06-02 Yongfeng Huang, Yujun Chen, Yulun Du, Zhilin Yang

Unsupervised Selective Rationalization with Noise Injection

A major issue with using deep learning models in sensitive applications is that they provide no explanation for their output. To address this problem, unsupervised selective rationalization produces rationales alongside predictions by…

Computation and Language · Computer Science 2023-05-30 Adam Storek, Melanie Subbiah, Kathleen McKeown

FAIRER: Fairness as Decision Rationale Alignment

Deep neural networks (DNNs) have made significant progress, but often suffer from fairness issues, as deep models typically show distinct accuracy differences among certain subgroups (e.g., males and females). Existing research addresses…

Machine Learning · Computer Science 2023-06-28 Tianlin Li, Qing Guo, Aishan Liu, Mengnan Du, Zhiming Li, Yang Liu

Towards Trustworthy Explanation: On Causal Rationalization

With recent advances in natural language processing, rationalization becomes an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet,…

Machine Learning · Computer Science 2023-09-12 Wenbo Zhang, Tong Wu, Yunlong Wang, Yong Cai, Hengrui Cai

Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had performed the behavior. We describe a rationalization technique that uses neural machine translation to translate…

Artificial Intelligence · Computer Science 2017-12-20 Upol Ehsan, Brent Harrison, Larry Chan, Mark O. Riedl

Making Large Language Models Better Reasoners with Alignment

Reasoning is a cognitive process of using evidence to reach a sound conclusion. The reasoning capability is essential for large language models (LLMs) to serve as the brain of the artificial general intelligence agent. Recent studies reveal…

Computation and Language · Computer Science 2023-09-06 Peiyi Wang, Lei Li, Liang Chen, Feifan Song, Binghuai Lin, Yunbo Cao, Tianyu Liu, Zhifang Sui

Explainability's Gain is Optimality's Loss? -- How Explanations Bias Decision-making

Decisions in organizations are about evaluating alternatives and choosing the one that would best serve organizational goals. To the extent that the evaluation of alternatives could be formulated as a predictive task with appropriate…

Human-Computer Interaction · Computer Science 2022-06-30 Charles Wan, Rodrigo Belo, Leid Zejnilović

Aligning Deep Implicit Preferences by Learning to Reason Defensively

Personalized alignment is crucial for enabling Large Language Models (LLMs) to engage effectively in user-centric interactions. However, current methods face a dual challenge: they fail to infer users' deep implicit preferences (including…

Artificial Intelligence · Computer Science 2026-04-29 Peiming Li, Zhiyuan Hu, Yang Tang, Shiyu Li, Xi Chen

Self-rationalization improves LLM as a fine-grained judge

LLM-as-a-judge models have been used for evaluating both human and AI generated content, specifically by providing scores and rationales. Rationales, in addition to increasing transparency, help models learn to calibrate its judgments.…

Computation and Language · Computer Science 2024-10-10 Prapti Trivedi, Aditya Gulati, Oliver Molenschot, Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Keith Stevens, Tanveesh Singh Chaudhery, Jahnavi Jambholkar, James Zou, Nazneen Rajani

Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration

Large language models (LLMs) have shown remarkable reasoning capabilities, yet aligning such abilities to small language models (SLMs) remains a challenge due to distributional mismatches and limited model capacity. Existing reasoning…

Computation and Language · Computer Science 2025-05-28 Yong Wu, Weihang Pan, Ke Li, Chen Binhui, Ping Li, Binbin Lin

Boosting Explainability through Selective Rationalization in Pre-trained Language Models

The widespread application of pre-trained language models (PLMs) in natural language processing (NLP) has led to increasing concerns about their explainability. Selective rationalization is a self-explanatory framework that selects…

Computation and Language · Computer Science 2025-01-07 Libing Yuan, Shuaibo Hu, Kui Yu, Le Wu