Related papers: Pitfalls in Evaluating Interpretability Agents

200 papers

Agentic systems have transformed how Large Language Models (LLMs) can be leveraged to create autonomous systems with goal-directed behaviors, consisting of multi-step planning and the ability to interact with different environments. These…

Artificial Intelligence · Computer Science 2026-01-27 Judy Zhu, Dhari Gandhi, Himanshu Joshi, Ahmad Rezaie Mianroodi, Sedef Akinli Kocak, Dhanesh Ramachandran

This paper reviews the architecture and implementation methods of agents powered by large language models (LLMs). Motivated by the limitations of traditional LLMs in real-world tasks, the research aims to explore patterns to develop…

Artificial Intelligence · Computer Science 2025-10-13 Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

The era of Large Language Models (LLMs) presents a new opportunity for interpretability--agentic interpretability: a multi-turn conversation with an LLM wherein the LLM proactively assists human understanding by developing and leveraging a…

Artificial Intelligence · Computer Science 2025-06-17 Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord

Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable…

Computation and Language · Computer Science 2024-02-06 Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao

Large Language Models (LLMs) have demonstrated impressive performance across diverse domains, yet they still encounter challenges such as insufficient domain-specific knowledge, biases, and hallucinations. This underscores the need for…

Computation and Language · Computer Science 2025-04-07 Hongliu Cao, Ilias Driouich, Robin Singh, Eoin Thomas

Large Language Models (LLMs) are transforming scholarly tasks like search and summarization, but their reliability remains uncertain. Current evaluation metrics for testing LLM reliability are primarily automated approaches that prioritize…

Human-Computer Interaction · Computer Science 2026-02-25 Anna Martin-Boyle, William Humphreys, Martha Brown, Cara Leckey, Harmanpreet Kaur

Large language models (LLMs) have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque. Mechanistic interpretability (i.e., the systematic study of how neural networks…

Computation and Language · Computer Science 2026-02-13 Usman Naseem

Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts. A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve…

Machine Learning · Computer Science 2021-02-23 Behnaz Arzani, Kevin Hsieh, Haoxian Chen

We introduce an autonomous multiagent framework for mechanistic interpretability that automates both explaining and finding internal features in large language models. The system runs two coupled loops: (1) explanation refinement, where an…

Computation and Language · Computer Science 2026-05-05 Arnau Marin-Llobet, Javier Ferrando

Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them…

Several researchers have argued that a machine learning system's interpretability should be defined in relation to a specific agent or task: we should not ask if the system is interpretable, but to whom is it interpretable. We describe a…

Artificial Intelligence · Computer Science 2018-06-21 Richard Tomsett, Dave Braines, Dan Harborne, Alun Preece, Supriyo Chakraborty

Large language models (LLMs) have recently been applied to forecasting tasks, with some works claiming these systems match or exceed human performance. In this paper, we argue that, as a community, we should be careful about such…

Machine Learning · Computer Science 2025-06-03 Daniel Paleka, Shashwat Goel, Jonas Geiping, Florian Tramèr

As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to…

Machine Learning · Computer Science 2020-12-07 Jonathan Dinu, Jeffrey Bigham, J. Zico Kolter

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu

Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like problem-solving strategies, such as…

Computation and Language · Computer Science 2023-12-19 Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu

An essential aspect of evaluating Large Language Models (LLMs) is identifying potential biases. This is especially relevant considering the substantial evidence that LLMs can replicate human social biases in their text outputs and further…

Human-Computer Interaction · Computer Science 2024-05-21 Paula Akemi Aoyagui, Sharon Ferguson, Anastasia Kuzminykh

Large Language Models (LLMs) are transforming artificial intelligence, enabling autonomous agents to perform diverse tasks across various domains. These agents, proficient in human-like text comprehension and generation, have the potential…

Artificial Intelligence · Computer Science 2024-04-10 Saikat Barua

Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations are significantly preferred to help people better understand how…

Machine Learning · Computer Science 2019-08-19 Fan Yang, Mengnan Du, Xia Hu

Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset…

Computation and Language · Computer Science 2025-06-02 Nic Dobbins, Christelle Xiong, Kristine Lan, Meliha Yetisgen

Large language models can consult information that fixed static analyzers cannot, such as documentation, current security advisories, version-specific metadata, and informal API contracts. This makes LLMs a compelling option for program…

Software Engineering · Computer Science 2026-05-14 Jacqueline L. Mitchell, Chao Wang