Related papers: Large Language Models Cannot Explain Themselves

Evaluating the Reliability of Self-Explanations in Large Language Models

This paper investigates the reliability of explanations generated by large language models (LLMs) when prompted to explain their previous output. We evaluate two kinds of such self-explanations - extractive and counterfactual - using three…

Computation and Language · Computer Science 2025-02-03 Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren

Properties and Challenges of LLM-Generated Explanations

The self-rationalising capabilities of large language models (LLMs) have been explored in restricted settings, using task-specific data sets. However, current LLMs do not (only) rely on specifically annotated data; nonetheless, they…

Computation and Language · Computer Science 2024-12-18 Jenny Kunz, Marco Kuhlmann

Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations

Large language models have exhibited impressive performance across a broad range of downstream tasks in natural language processing. However, how a language model predicts the next token and generates content is not generally understandable…

Computation and Language · Computer Science 2025-10-21 Shahin Atakishiyev, Housam K. B. Babiker, Jiayi Dai, Nawshad Farruque, Teruaki Hayashi, Nafisa Sadaf Hriti, Md Abed Rahman, Iain Smith, Mi-Young Kim, Osmar R. Zaïane, Randy Goebel

Explainability for Large Language Models: A Survey

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications.…

Computation and Language · Computer Science 2023-11-30 Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du

Explanations from Large Language Models Make Small Reasoners Better

Integrating free-text explanations to in-context learning of large language models (LLM) is shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the…

Computation and Language · Computer Science 2022-10-14 Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Reasoning-Grounded Natural Language Explanations for Language Models

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning…

Machine Learning · Computer Science 2026-03-17 Vojtech Cahlik, Rodrigo Alves, Pavel Kordik

Can Large Language Models Act as Symbolic Reasoners?

The performance of large language models (LLMs) across a broad range of domains has been impressive, but they have been critiqued as not being able to reason about their process and conclusions derived. This is to explain the conclusions drawn,…

Computation and Language · Computer Science 2024-10-30 Rob Sullivan, Nelly Elsayed

Large Language Models Cannot Self-Correct Reasoning Yet

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their…

Computation and Language · Computer Science 2024-03-15 Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou

Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these…

Computation and Language · Computer Science 2023-10-18 Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin

Generating Search Explanations using Large Language Models

Aspect-oriented explanations in search results are typically concise text snippets placed alongside retrieved documents to serve as explanations that assist users in efficiently locating relevant information. While Large Language Models…

Information Retrieval · Computer Science 2025-07-23 Arif Laksito, Mark Stevenson

Getting More out of Large Language Models for Proofs

Large language models have the potential to simplify formal theorem proving and make it more accessible. But how to get the most out of these models is still an open question. To answer this question, we take a step back and explore the…

Formal Languages and Automata Theory · Computer Science 2023-06-02 Shizhuo Dylan Zhang, Talia Ringer, Emily First

Towards Uncovering How Large Language Model Works: An Explainability Perspective

Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque. This lack of transparency presents challenges such as…

Computation and Language · Computer Science 2024-04-17 Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose…

Computation and Language · Computer Science 2023-07-18 Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

Can LLMs Explain Themselves Counterfactually?

Explanations are an important tool for gaining insights into the behavior of ML models, calibrating user trust and ensuring regulatory compliance. The past few years have seen a flurry of post-hoc methods for generating model explanations, many…

Computation and Language · Computer Science 2025-09-24 Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar

Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations

Large Language Models (LLMs) can produce verbalized self-explanations, yet prior studies suggest that such rationales may not reliably reflect the model's true decision process. We ask whether these explanations nevertheless help users…

Computation and Language · Computer Science 2026-01-08 Pingjun Hong, Benjamin Roth

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies

Large language models (LLMs) can produce erroneous responses that sound fluent and convincing, raising the risk that users will rely on these responses as if they were correct. Mitigating such overreliance is a key challenge. Through a…

Human-Computer Interaction · Computer Science 2025-02-13 Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, Olga Russakovsky

Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning

Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities, yet their performance remains brittle to minor variations in problem description and prompting strategy. Furthermore, reasoning is vulnerable to…

Computation and Language · Computer Science 2025-06-23 Sam Silver, Jimin Sun, Ivan Zhang, Sara Hooker, Eddie Kim

Are Large Language Models Fit For Guided Reading?

This paper looks at the ability of large language models to participate in educational guided reading. We specifically evaluate their ability to generate meaningful questions from the input text, generate diverse questions both in terms of…

Computation and Language · Computer Science 2023-05-22 Peter Ochieng

Can Large Language Models Reason and Plan?

While humans sometimes do show the capability of correcting their own erroneous guesses with self-critiquing, there seems to be no basis for that assumption in the case of LLMs.

Artificial Intelligence · Computer Science 2024-03-12 Subbarao Kambhampati

Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis

Given the emergent reasoning abilities of large language models, information retrieval is becoming more complex. Rather than just retrieve a document, modern information retrieval systems advertise that they can synthesize an answer based…

Information Retrieval · Computer Science 2024-03-15 Gregory Coppola