Related papers: Pitfalls in Evaluating Interpretability Agents

Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability

Agentic systems have transformed how Large Language Models (LLMs) can be leveraged to create autonomous systems with goal-directed behaviors, consisting of multi-step planning and the ability to interact with different environments. These…

Artificial Intelligence · Computer Science 2026-01-27 Judy Zhu, Dhari Gandhi, Himanshu Joshi, Ahmad Rezaie Mianroodi, Sedef Akinli Kocak, Dhanesh Ramachandran

Fundamentals of Building Autonomous LLM Agents

This paper reviews the architecture and implementation methods of agents powered by large language models (LLMs). Motivated by the limitations of traditional LLMs in real-world tasks, the research aims to explore patterns to develop…

Artificial Intelligence · Computer Science 2025-10-13 Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

Because we have LLMs, we Can and Should Pursue Agentic Interpretability

The era of Large Language Models (LLMs) presents a new opportunity for interpretability--agentic interpretability: a multi-turn conversation with an LLM wherein the LLM proactively assists human understanding by developing and leveraging a…

Artificial Intelligence · Computer Science 2025-06-17 Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord

Rethinking Interpretability in the Era of Large Language Models

Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable…

Computation and Language · Computer Science 2024-02-06 Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao

Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications

Large Language Models (LLMs) have demonstrated impressive performance across diverse domains, yet they still encounter challenges such as insufficient domain-specific knowledge, biases, and hallucinations. This underscores the need for…

Computation and Language · Computer Science 2025-04-07 Hongliu Cao, Ilias Driouich, Robin Singh, Eoin Thomas

An Expert Schema for Evaluating Large Language Model Errors in Scholarly Question-Answering Systems

Large Language Models (LLMs) are transforming scholarly tasks like search and summarization, but their reliability remains uncertain. Current evaluation metrics for testing LLM reliability are primarily automated approaches that prioritize…

Human-Computer Interaction · Computer Science 2026-02-25 Anna Martin-Boyle, William Humphreys, Martha Brown, Cara Leckey, Harmanpreet Kaur

Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions

Large language models (LLMs) have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque. Mechanistic interpretability (i.e., the systematic study of how neural networks…

Computation and Language · Computer Science 2026-02-13 Usman Naseem

Interpret-able feedback for AutoML systems

Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts. A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve…

Machine Learning · Computer Science 2021-02-23 Behnaz Arzani, Kevin Hsieh, Haoxian Chen

Automated Interpretability and Feature Discovery in Language Models with Agents

We introduce an autonomous multiagent framework for mechanistic interpretability that automates both explaining and finding internal features in large language models. The system runs two coupled loops: (1) explanation refinement, where an…

Computation and Language · Computer Science 2026-05-05 Arnau Marin-Llobet, Javier Ferrando

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them…

Computation and Language · Computer Science 2024-10-04 Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

Interpretable to Whom? A Role-based Model for Analyzing Interpretable Machine Learning Systems

Several researchers have argued that a machine learning system's interpretability should be defined in relation to a specific agent or task: we should not ask if the system is interpretable, but to whom is it interpretable. We describe a…

Artificial Intelligence · Computer Science 2018-06-21 Richard Tomsett, Dave Braines, Dan Harborne, Alun Preece, Supriyo Chakraborty

Pitfalls in Evaluating Language Model Forecasters

Large language models (LLMs) have recently been applied to forecasting tasks, with some works claiming these systems match or exceed human performance. In this paper, we argue that, as a community, we should be careful about such…

Machine Learning · Computer Science 2025-06-03 Daniel Paleka, Shashwat Goel, Jonas Geiping, Florian Tramèr

Challenging common interpretability assumptions in feature attribution explanations

As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to…

Machine Learning · Computer Science 2020-12-07 Jonathan Dinu, Jeffrey Bigham, J. Zico Kolter

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like problem-solving strategies, such as…

Computation and Language · Computer Science 2023-12-19 Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu

Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models

An essential aspect of evaluating Large Language Models (LLMs) is identifying potential biases. This is especially relevant considering the substantial evidence that LLMs can replicate human social biases in their text outputs and further…

Human-Computer Interaction · Computer Science 2024-05-21 Paula Akemi Aoyagui, Sharon Ferguson, Anastasia Kuzminykh

Exploring Autonomous Agents through the Lens of Large Language Models: A Review

Large Language Models (LLMs) are transforming artificial intelligence, enabling autonomous agents to perform diverse tasks across various domains. These agents, proficient in human-like text comprehension and generation, have the potential…

Artificial Intelligence · Computer Science 2024-04-10 Saikat Barua

Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations are significantly preferred to help people better understand how…

Machine Learning · Computer Science 2019-08-19 Fan Yang, Mengnan Du, Xia Hu

Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset…

Computation and Language · Computer Science 2025-06-02 Nic Dobbins, Christelle Xiong, Kristine Lan, Meliha Yetisgen

Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

Large language models can consult information that fixed static analyzers cannot, such as documentation, current security advisories, version-specific metadata, and informal API contracts. This makes LLMs a compelling option for program…

Software Engineering · Computer Science 2026-05-14 Jacqueline L. Mitchell, Chao Wang