Related papers: ATTest: Agent-Driven Tensor Testing for Deep Learn…

Improving Deep Learning Library Testing with Machine Learning

Deep Learning (DL) libraries like TensorFlow and PyTorch simplify machine learning (ML) model development but are prone to bugs due to their complex design. Bug-finding techniques exist, but without precise API specifications, they produce…

Software Engineering · Computer Science 2026-02-04 Facundo Molina, M M Abid Naziri, Feiran Qin, Alessandra Gorla, Marcelo d'Amorim

ACETest: Automated Constraint Extraction for Testing Deep Learning Operators

Deep learning (DL) applications are prevalent nowadays as they can help with multiple tasks. DL libraries are essential for building DL applications. Furthermore, DL operators are the important building blocks of the DL libraries, that…

Software Engineering · Computer Science 2023-06-06 Jingyi Shi, Yang Xiao, Yuekang Li, Yeting Li, Dongsong Yu, Chendong Yu, Hui Su, Yufeng Chen, Wei Huo

DUET: Agentic Design Understanding via Experimentation and Testing

AI agents powered by large language models (LLMs) are being used to solve increasingly complex software engineering challenges, but struggle with hardware design tasks. Register Transfer Level (RTL) code presents a unique challenge for…

Software Engineering · Computer Science 2026-01-23 Gus Henry Smith, Sandesh Adhikary, Vineet Thumuluri, Karthik Suresh, Vivek Pandit, Kartik Hegde, Hamid Shojaei, Chandra Bhagavatula

Constraint-Guided Unit Test Generation for Machine Learning Libraries

Machine learning (ML) libraries such as PyTorch and TensorFlow are essential for a wide range of modern applications. Ensuring the correctness of ML libraries through testing is crucial. However, ML APIs often impose strict input…

Software Engineering · Computer Science 2025-10-13 Lukas Krodinger, Altin Hajdari, Stephan Lukasczyk, Gordon Fraser

ATGen: Adversarial Reinforcement Learning for Test Case Generation

Large Language Models (LLMs) excel at code generation, yet their outputs often contain subtle bugs, for which effective test cases are a critical bottleneck. Existing test generation methods, whether based on prompting or supervised…

Software Engineering · Computer Science 2025-10-17 Qingyao Li, Xinyi Dai, Weiwen Liu, Xiangyang Li, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Testing Deep Learning Libraries via Neurosymbolic Constraint Learning

Deep Learning (DL) libraries (e.g., PyTorch) are popular in AI development. These libraries are complex and contain bugs. Researchers have proposed various bug-finding techniques for such libraries. Yet, there is much room for improvement.…

Software Engineering · Computer Science 2026-01-23 M M Abid Naziri, Shinhae Kim, Feiran Qin, Marcelo d'Amorim, Saikat Dutta

APT-Agent: Automated Penetration Testing using Large Language Models

Penetration testing is essential to securing modern web infrastructures, yet traditional manual methods struggle to keep pace with their scale and complexity. Large Language Models (LLMs) offer new opportunities for automating these tasks,…

Cryptography and Security · Computer Science 2026-05-26 William Guanting Li, Alsharif Abuadbba, Kristen Moore, Dan Dongseong Kim

Where LLM Agents Fail and How They can Learn From Failures

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Artificial Intelligence · Computer Science 2025-10-01 Kunlun Zhu, Zijia Liu, Bingxuan Li, Muxin Tian, Yingxuan Yang, Jiaxun Zhang, Pengrui Han, Qipeng Xie, Fuyang Cui, Weijia Zhang, Xiaoteng Ma, Xiaodong Yu, Gowtham Ramesh, Jialian Wu, Zicheng Liu, Pan Lu, James Zou, Jiaxuan You

AgentStepper: Interactive Debugging of Software Development Agents

Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and program repair. Unfortunately, understanding and debugging such agents remain…

Software Engineering · Computer Science 2026-02-09 Robert Hutter, Michael Pradel

Test vs Mutant: Adversarial LLM Agents for Robust Unit Test Generation

Software testing is a critical, yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been developed to aid in this process. Search-based approaches typically achieve high coverage…

Software Engineering · Computer Science 2026-02-11 Pengyu Chang, Yixiong Fang, Silin Chen, Yuling Shi, Beijun Shen, Xiaodong Gu

KTester: Leveraging Domain and Testing Knowledge for More Effective LLM-based Test Generation

Automated unit test generation using large language models (LLMs) holds great promise but often struggles with generating tests that are both correct and maintainable in real-world projects. This paper presents KTester, a novel framework…

Software Engineering · Computer Science 2026-02-09 Anji Li, Mingwei Liu, Zhenxi Chen, Zheng Pei, Zike Li, Dekun Dai, Yanlin Wang, Zibin Zheng

DeepTutor: Towards Agentic Personalized Tutoring

Education is one of the most promising real-world applications for Large Language Models (LLMs). However, current LLMs rely on static pre-training knowledge and lack adaptation to individual learners, while existing RAG systems fall short…

Computers and Society · Computer Science 2026-05-12 Bingxi Zhao, Jiahao Zhang, Xubin Ren, Zirui Guo, Tianzhe Chu, Yi Ma, Chao Huang

From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning

The evaluation of large language models (LLMs) has predominantly relied on static datasets, which offer limited scalability and fail to capture the evolving reasoning capabilities of recent models. To overcome these limitations, we propose…

Computation and Language · Computer Science 2026-03-02 Seungdong Yoa, Sanghyu Yoon, Suhee Yoon, Dongmin Kim, Ye Seul Sim, Junhyun Lee, Woohyung Lim

Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents

LLM agents are increasingly deployed to plan, retrieve, and write with tools, yet evaluation still leans on static benchmarks and small human studies. We present the Agent-Testing Agent (ATA), a meta-agent that combines static code…

Computation and Language · Computer Science 2025-08-26 Sameer Komoravolu, Khalil Mrini

SUT: Active Defects Probing for Transcompiler Models

Automatic program translation has enormous application value and hence has been attracting significant interest from AI researchers. However, we observe that current program translation models still make elementary syntax errors,…

Software Engineering · Computer Science 2023-10-24 Mengnan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin Clement, Neel Sundaresan

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks; while many real-world…

Artificial Intelligence · Computer Science 2025-10-23 Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang

ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new…

Computation and Language · Computer Science 2024-05-14 Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, Xiaodan Liang

AutoPenBench: Benchmarking Generative Agents for Penetration Testing

Generative AI agents, software systems powered by Large Language Models (LLMs), are emerging as a promising approach to automate cybersecurity tasks. Among the others, penetration testing is a challenging field due to the task complexity…

Cryptography and Security · Computer Science 2024-10-29 Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, Giuseppe Siracusano, Roberto Bifulco

AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents

Large Language Models (LLMs) have been increasingly integrated into computer-use agents, which can autonomously operate tools on a user's computer to accomplish complex tasks. However, due to the inherently unstable and unpredictable nature…

Cryptography and Security · Computer Science 2025-09-10 Haitao Hu, Peng Chen, Yanpeng Zhao, Yuqi Chen

ASTER: Natural and Multi-language Unit Test Generation with LLMs

Implementing automated unit tests is an important but time-consuming activity in software development. To assist developers in this task, many techniques for automating unit test generation have been developed. However, despite this effort,…

Software Engineering · Computer Science 2025-01-16 Rangeet Pan, Myeongsoo Kim, Rahul Krishna, Raju Pavuluri, Saurabh Sinha