Related papers: DeepFix: Debugging and Fixing Machine Learning Wor…

200 papers

Deep Neural Networks (DNNs) are used in a wide variety of applications. However, as in any software application, DNN-based apps are afflicted with bugs. Previous work observed that DNN bug fix patterns are different from traditional bug fix…

Software Engineering · Computer Science 2021-12-09 Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

Large Language Model (LLM) Agents leverage the advanced reasoning capabilities of LLMs in real-world applications. To interface with an environment, these agents often rely on tools, such as web search or database APIs. As the agent…

Artificial Intelligence · Computer Science 2025-03-12 Ivan Milev, Mislav Balunović, Maximilian Baader, Martin Vechev

Patching severe security flaws in complex software remains a major challenge. While automated tools like fuzzers efficiently discover bugs, fixing deep-rooted low-level faults (e.g., use-after-free and memory corruption) still requires…

Software Engineering · Computer Science 2026-04-07 Maolin Sun, Yibiao Yang, Xuanlin Liu, Yuming Zhou, Baowen Xu

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and investigation of runtime…

Software Engineering · Computer Science 2026-04-22 Spandan Garg, Yufan Huang

The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven…

Cryptography and Security · Computer Science 2024-02-26 Berkay Berabi, Alexey Gronskiy, Veselin Raychev, Gishor Sivanrupan, Victor Chibotaru, Martin Vechev

As multi-agent systems powered by Large Language Models (LLMs) are increasingly adopted in real-world workflows, users with diverse technical backgrounds are now building and refining their own agentic processes. However, these systems can…

Human-Computer Interaction · Computer Science 2026-03-05 Xinru Wang, Ming Yin, Eunyee Koh, Mustafa Doga Dogan

The increasing inclusion of Machine Learning (ML) models in safety-critical systems like autonomous cars has led to the development of multiple model-based ML testing techniques. One common denominator of these testing techniques is their…

Machine Learning · Computer Science 2019-09-09 Houssem Ben Braiek, Foutse Khomh

Fault Localization (FL) is an essential step during the debugging process. With the strong capabilities of code comprehension, the recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code.…

Software Engineering · Computer Science 2025-02-25 Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao

We introduce a comprehensive validation framework for LLM-based agentic systems that provides systematic diagnosis and improvement of reliability failures. The framework includes fifteen failure-detection tools and two root-cause analysis…

Artificial Intelligence · Computer Science 2026-04-01 Hadar Mulian, Sergey Zeltyn, Ido Levy, Liane Galanti, Avi Yaeli, Segev Shlomov

Large Language Models (LLMs) have transformed software development and AI applications. While LLMs are designed for text processing, LLM agents extend this capability by enabling autonomous actions, tool use, and multi-step task completion.…

Software Engineering · Computer Science 2026-04-21 Niful Islam, Muhammad Anas Raza, Mohammad Wardat

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their…

Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limitations. Most rely solely on static analysis,…

Software Engineering · Computer Science 2026-04-21 Zhili Huang, Ling Xu, Chao Liu, Weifeng Sun, Xu Zhang, Yan Lei, Meng Yan, Hongyu Zhang

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Deep Learning models have become an integrated component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model…

Software Engineering · Computer Science 2024-01-02 Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen

Large language models (LLMs) and LLM-based Agents have been applied to fix bugs automatically, demonstrating the capability in addressing software defects by engaging in development environment interaction, iterative validation and code…

Software Engineering · Computer Science 2025-10-21 Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng

Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of…

Software Engineering · Computer Science 2026-01-07 Saba Naqvi, Mohammad Baqar, Nawaz Ali Mohammad

Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and…

Software Engineering · Computer Science 2025-10-20 Shiwen Ou, Yuwei Li, Lu Yu, Chengkun Wei, Tingke Wen, Qiangpu Chen, Yu Chen, Haizhi Tang, Zulie Pan

Modern agentic frameworks (e.g., CrewAI and AutoGen) have evolved into complex, autonomous multi-agent systems, introducing unique reliability challenges beyond earlier pipeline-based LLM libraries. However, existing empirical studies focus…

Software Engineering · Computer Science 2026-04-13 Xiaowen Zhang, Hannuo Zhang, Shin Hwei Tan

Software debugging is a time-consuming endeavor involving a series of steps, such as fault localization and patch generation, each requiring thorough analysis and a deep understanding of the underlying logic. While large language models…

Software Engineering · Computer Science 2025-11-19 Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu

Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces; however, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack…
