Related papers: DeepFix: Debugging and Fixing Machine Learning Wor…

DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs

Deep Neural Networks (DNNs) are used in a wide variety of applications. However, as in any software application, DNN-based apps are afflicted with bugs. Previous work observed that DNN bug fix patterns are different from traditional bug fix…

Software Engineering · Computer Science · 2021-12-09 · Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

ToolFuzz -- Automated Agent Tool Testing

Large Language Model (LLM) Agents leverage the advanced reasoning capabilities of LLMs in real-world applications. To interface with an environment, these agents often rely on tools, such as web search or database APIs. As the agent…

Artificial Intelligence · Computer Science · 2025-03-12 · Ivan Milev, Mislav Balunović, Maximilian Baader, Martin Vechev

DebugHarness: Emulating Human Dynamic Debugging for Autonomous Program Repair

Patching severe security flaws in complex software remains a major challenge. While automated tools like fuzzers efficiently discover bugs, fixing deep-rooted low-level faults (e.g., use-after-free and memory corruption) still requires…

Software Engineering · Computer Science · 2026-04-07 · Maolin Sun, Yibiao Yang, Xuanlin Liu, Yuming Zhou, Baowen Xu

Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and investigation of runtime…

Software Engineering · Computer Science · 2026-04-22 · Spandan Garg, Yufan Huang

DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models

The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven…

Cryptography and Security · Computer Science · 2024-02-26 · Berkay Berabi, Alexey Gronskiy, Veselin Raychev, Gishor Sivanrupan, Victor Chibotaru, Martin Vechev

XAgen: An Explainability Tool for Identifying and Correcting Failures in Multi-Agent Workflows

As multi-agent systems powered by Large Language Models (LLMs) are increasingly adopted in real-world workflows, users with diverse technical backgrounds are now building and refining their own agentic processes. However, these systems can…

Human-Computer Interaction · Computer Science · 2026-03-05 · Xinru Wang, Ming Yin, Eunyee Koh, Mustafa Doga Dogan

TFCheck : A TensorFlow Library for Detecting Training Issues in Neural Network Programs

The increasing inclusion of Machine Learning (ML) models in safety-critical systems like autonomous cars has led to the development of multiple model-based ML testing techniques. One common denominator of these testing techniques is their…

Machine Learning · Computer Science · 2019-09-09 · Houssem Ben Braiek, Foutse Khomh

AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Fault Localization (FL) is an essential step during the debugging process. With their strong code-comprehension capabilities, recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code.…

Software Engineering · Computer Science · 2025-02-25 · Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao

AgentFixer: From Failure Detection to Fix Recommendations in LLM Agentic Systems

We introduce a comprehensive validation framework for LLM-based agentic systems that provides systematic diagnosis and improvement of reliability failures. The framework includes fifteen failure-detection tools and two root-cause analysis…

Artificial Intelligence · Computer Science · 2026-04-01 · Hadar Mulian, Sergey Zeltyn, Ido Levy, Liane Galanti, Avi Yaeli, Segev Shlomov

SelfHeal: Empirical Fix Pattern Analysis and Bug Repair in LLM Agents

Large Language Models (LLMs) have transformed software development and AI applications. While LLMs are designed for text processing, LLM agents extend this capability by enabling autonomous actions, tool use, and multi-step task completion.…

Software Engineering · Computer Science · 2026-04-21 · Niful Islam, Muhammad Anas Raza, Mohammad Wardat

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their…

Artificial Intelligence · Computer Science · 2026-04-28 · Basel Shbita, Anna Lisa Gentile, Bing Zhang, Sungeun An, Shailja Thakur, Shubhi Asthana, Yi Zhou, Saptha Surendran, Farhan Ahmed, Rohan Kulkarni, Yuya Jeremy Ong, Chad DeLuca, Hima Patel

DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information

Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limitations. Most rely solely on static analysis,…

Software Engineering · Computer Science · 2026-04-21 · Zhili Huang, Ling Xu, Chao Liu, Weifeng Sun, Xu Zhang, Yan Lei, Meng Yan, Hongyu Zhang

Where LLM Agents Fail and How They can Learn From Failures

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Artificial Intelligence · Computer Science · 2025-10-01 · Kunlun Zhu, Zijia Liu, Bingxuan Li, Muxin Tian, Yingxuan Yang, Jiaxun Zhang, Pengrui Han, Qipeng Xie, Fuyang Cui, Weijia Zhang, Xiaoteng Ma, Xiaodong Yu, Gowtham Ramesh, Jialian Wu, Zicheng Liu, Pan Lu, James Zou, Jiaxuan You

DREAM: Debugging and Repairing AutoML Pipelines

Deep Learning models have become an integral component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model…

Software Engineering · Computer Science · 2024-01-02 · Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen

An Empirical Study on LLM-based Agents for Automated Bug Fixing

Large language models (LLMs) and LLM-based Agents have been applied to fix bugs automatically, demonstrating their capability to address software defects by engaging in development environment interaction, iterative validation and code…

Software Engineering · Computer Science · 2025-10-21 · Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng

The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance

Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of…

Software Engineering · Computer Science · 2026-01-07 · Saba Naqvi, Mohammad Baqar, Nawaz Ali Mohammad

MirrorFuzz: Leveraging LLM and Shared Bugs for Deep Learning Framework APIs Fuzzing

Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and…

Software Engineering · Computer Science · 2025-10-20 · Shiwen Ou, Yuwei Li, Lu Yu, Chengkun Wei, Tingke Wen, Qiangpu Chen, Yu Chen, Haizhi Tang, Zulie Pan

Dissecting Bug Triggers and Failure Modes in Modern Agentic Frameworks: An Empirical Study

Modern agentic frameworks (e.g., CrewAI and AutoGen) have evolved into complex, autonomous multi-agent systems, introducing unique reliability challenges beyond earlier pipeline-based LLM libraries. However, existing empirical studies focus…

Software Engineering · Computer Science · 2026-04-13 · Xiaowen Zhang, Hannuo Zhang, Shin Hwei Tan

UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging

Software debugging is a time-consuming endeavor involving a series of steps, such as fault localization and patch generation, each requiring thorough analysis and a deep understanding of the underlying logic. While large language models…

Software Engineering · Computer Science · 2025-11-19 · Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces; however, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack…

Software Engineering · Computer Science · 2026-02-27 · Yukang Feng, Jianwen Sun, Zelai Yang, Jiaxin Ai, Chuanhao Li, Zizhen Li, Fanrui Zhang, Kang He, Rui Ma, Jifan Lin, Jie Sun, Yang Xiao, Sizhuo Zhou, Wenxiao Wu, Yiming Liu, Pengfei Liu, Yu Qiao, Shenglin Zhang, Kaipeng Zhang