Related papers: ATTest: Agent-Driven Tensor Testing for Deep Learn…

200 papers

Deep Learning (DL) libraries like TensorFlow and PyTorch simplify machine learning (ML) model development but are prone to bugs due to their complex design. Bug-finding techniques exist, but without precise API specifications, they produce…

Software Engineering · Computer Science 2026-02-04 Facundo Molina, M M Abid Naziri, Feiran Qin, Alessandra Gorla, Marcelo d'Amorim

Deep learning (DL) applications are prevalent nowadays as they can help with multiple tasks. DL libraries are essential for building DL applications. Furthermore, DL operators are the important building blocks of the DL libraries, that…

Software Engineering · Computer Science 2023-06-06 Jingyi Shi, Yang Xiao, Yuekang Li, Yeting Li, Dongsong Yu, Chendong Yu, Hui Su, Yufeng Chen, Wei Huo

AI agents powered by large language models (LLMs) are being used to solve increasingly complex software engineering challenges, but struggle with hardware design tasks. Register Transfer Level (RTL) code presents a unique challenge for…

Machine learning (ML) libraries such as PyTorch and TensorFlow are essential for a wide range of modern applications. Ensuring the correctness of ML libraries through testing is crucial. However, ML APIs often impose strict input…

Software Engineering · Computer Science 2025-10-13 Lukas Krodinger, Altin Hajdari, Stephan Lukasczyk, Gordon Fraser

Large Language Models (LLMs) excel at code generation, yet their outputs often contain subtle bugs, for which effective test cases are a critical bottleneck. Existing test generation methods, whether based on prompting or supervised…

Software Engineering · Computer Science 2025-10-17 Qingyao Li, Xinyi Dai, Weiwen Liu, Xiangyang Li, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Deep Learning (DL) libraries (e.g., PyTorch) are popular in AI development. These libraries are complex and contain bugs. Researchers have proposed various bug-finding techniques for such libraries. Yet, there is much room for improvement.…

Software Engineering · Computer Science 2026-01-23 M M Abid Naziri, Shinhae Kim, Feiran Qin, Marcelo d'Amorim, Saikat Dutta

Penetration testing is essential to securing modern web infrastructures, yet traditional manual methods struggle to keep pace with their scale and complexity. Large Language Models (LLMs) offer new opportunities for automating these tasks,…

Cryptography and Security · Computer Science 2026-05-26 William Guanting Li, Alsharif Abuadbba, Kristen Moore, Dan Dongseong Kim

Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex, multi-step tasks. Yet their sophisticated architectures amplify vulnerability to cascading…

Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and program repair. Unfortunately, understanding and debugging such agents remain…

Software Engineering · Computer Science 2026-02-09 Robert Hutter, Michael Pradel

Software testing is a critical, yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been developed to aid in this process. Search-based approaches typically achieve high coverage…

Software Engineering · Computer Science 2026-02-11 Pengyu Chang, Yixiong Fang, Silin Chen, Yuling Shi, Beijun Shen, Xiaodong Gu

Automated unit test generation using large language models (LLMs) holds great promise but often struggles with generating tests that are both correct and maintainable in real-world projects. This paper presents KTester, a novel framework…

Software Engineering · Computer Science 2026-02-09 Anji Li, Mingwei Liu, Zhenxi Chen, Zheng Pei, Zike Li, Dekun Dai, Yanlin Wang, Zibin Zheng

Education is one of the most promising real-world applications for Large Language Models (LLMs). However, current LLMs rely on static pre-training knowledge and lack adaptation to individual learners, while existing RAG systems fall short…

Computers and Society · Computer Science 2026-05-12 Bingxi Zhao, Jiahao Zhang, Xubin Ren, Zirui Guo, Tianzhe Chu, Yi Ma, Chao Huang

The evaluation of large language models (LLMs) has predominantly relied on static datasets, which offer limited scalability and fail to capture the evolving reasoning capabilities of recent models. To overcome these limitations, we propose…

Computation and Language · Computer Science 2026-03-02 Seungdong Yoa, Sanghyu Yoon, Suhee Yoon, Dongmin Kim, Ye Seul Sim, Junhyun Lee, Woohyung Lim

LLM agents are increasingly deployed to plan, retrieve, and write with tools, yet evaluation still leans on static benchmarks and small human studies. We present the Agent-Testing Agent (ATA), a meta-agent that combines static code…

Computation and Language · Computer Science 2025-08-26 Sameer Komoravolu, Khalil Mrini

Automatic program translation has enormous application value and hence has been attracting significant interest from AI researchers. However, we observe that current program translation models still make elementary syntax errors,…

Software Engineering · Computer Science 2023-10-24 Mengnan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin Clement, Neel Sundaresan

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks; while many real-world…

Artificial Intelligence · Computer Science 2025-10-23 Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang

Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new…

Computation and Language · Computer Science 2024-05-14 Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, Xiaodan Liang

Generative AI agents, software systems powered by Large Language Models (LLMs), are emerging as a promising approach to automate cybersecurity tasks. Among the others, penetration testing is a challenging field due to the task complexity…

Cryptography and Security · Computer Science 2024-10-29 Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, Giuseppe Siracusano, Roberto Bifulco

Large Language Models (LLMs) have been increasingly integrated into computer-use agents, which can autonomously operate tools on a user's computer to accomplish complex tasks. However, due to the inherently unstable and unpredictable nature…

Cryptography and Security · Computer Science 2025-09-10 Haitao Hu, Peng Chen, Yanpeng Zhao, Yuqi Chen

Implementing automated unit tests is an important but time-consuming activity in software development. To assist developers in this task, many techniques for automating unit test generation have been developed. However, despite this effort,…

Software Engineering · Computer Science 2025-01-16 Rangeet Pan, Myeongsoo Kim, Rahul Krishna, Raju Pavuluri, Saurabh Sinha