
Related papers: FixEval: Execution-based Evaluation of Program Fix…

200 papers

The rapid development of large language model (LLM) evaluation methodologies and datasets has led to a profound challenge: integrating state-of-the-art evaluation techniques cost-effectively while ensuring reliability, reproducibility, and…

Computation and Language · Computer Science · 2024-04-10 · Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Zhengran Zeng, Wei Ye, Jindong Wang, Yue Zhang, Shikun Zhang

With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus more on code completion in function and class level by…

Software Engineering · Computer Science · 2025-11-03 · Qinyun Wu, Chao Peng, Pengfei Gao, Ruida Hu, Haoyu Gan, Bo Jiang, Jinhe Tang, Zhiwen Deng, Zhanming Guan, Cuiyun Gao, Xia Liu, Ping Yang

Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in…

Computation and Language · Computer Science · 2022-09-28 · Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, Maria Lomeli, Patrick Lewis, Gautier Izacard, Edouard Grave, Sebastian Riedel, Fabio Petroni

Current Large Language Models (LLMs) have advanced automated unit test generation but face a critical limitation: they often neglect to construct the necessary test fixtures, which are the environmental setups required for a test to run. To…

Software Engineering · Computer Science · 2026-03-26 · Chengyi Wang, Pengyu Xue, Zhen Yang, Xiapu Luo, Yuxuan Zhang, Xiran Lyu, Yifei Pei, Zonghan Jia, Yichen Sun, Linhao Wu, Kunwu Zheng

Code repair is a fundamental task in software development, facilitating efficient bug resolution and software maintenance. Although large language models (LLMs) have demonstrated considerable potential in automated code repair, their…

Software Engineering · Computer Science · 2026-02-27 · Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

Automated release note generation addresses the challenge of documenting frequent software updates, where manual efforts are time-consuming and prone to human error. Although recent advances in language models further enhance this process,…

Software Engineering · Computer Science · 2025-11-05 · Qianru Meng, Zhaochun Ren, Joost Visser

LLMs have achieved strong performance on text-based programming tasks, yet they remain unreliable for block-based languages such as Scratch. Scratch programs exhibit deeply nested, non-linear structures, event-driven concurrency across…

Software Engineering · Computer Science · 2026-02-03 · Yuan Si, Simeng Han, Daming Li, Hanyuan Shi, Jialu Zhang

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating code from natural language descriptions, repairing buggy code, translating code between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science · 2023-11-07 · Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

Recent advancements in large language models (LLMs) have automated various software engineering tasks, with benchmarks emerging to evaluate their capabilities. However, for adaptation, a critical activity during code reuse, there is no…

Software Engineering · Computer Science · 2026-01-09 · Tanghaoran Zhang, Xinjun Mao, Shangwen Wang, Yuxin Zhao, Yao Lu, Jin Zhang, Zhang Zhang, Kang Yang, Yue Yu

The rapid advancement of large language models (LLMs) and the development of increasingly large and diverse evaluation benchmarks have introduced substantial computational challenges for model assessment. In this paper, we present EffiEval,…

Computation and Language · Computer Science · 2025-08-14 · Yaoning Wang, Jiahao Ying, Yixin Cao, Yubo Ma, Yugang Jiang

Automatically resolving software issues is crucial for software development in practice, impacting the software quality and user experience. The process of resolving real-world issues encompasses tasks such as question-answering (QA), fault…

Software Engineering · Computer Science · 2024-11-28 · Ruida Hu, Chao Peng, Jingyi Ren, Bo Jiang, Xiangxin Meng, Qinyun Wu, Pengfei Gao, Xinchen Wang, Cuiyun Gao

Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program…

Software Engineering · Computer Science · 2025-02-04 · Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma

Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there…

Software Engineering · Computer Science · 2025-03-20 · Kush Jain, Gabriel Synnaeve, Baptiste Rozière

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory…

Current code generation evaluation measures functional correctness on well-formed inputs that satisfy all input preconditions. This paradigm has a critical limitation: task descriptions often leave these preconditions implicit, while…

Artificial Intelligence · Computer Science · 2026-04-21 · Soohan Lim, Joonghyuk Hahn, Hyunwoo Park, Sang-Ki Ko, Yo-Sub Han

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippets and their…

Software bugs significantly contribute to software cost and increase the risk of system malfunctioning. In recent years, many automated program-repair approaches have been proposed to automatically fix undesired program behavior. Despite…

Software Engineering · Computer Science · 2021-07-19 · Dirk Beyer, Lars Grunske, Thomas Lemberger, Minxing Tang

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science · 2026-01-08 · Danny Brahman, Mohammad Mahoor

Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious…

Software Engineering · Computer Science · 2023-09-18 · Chaozheng Wang, Zongjie Li, Yun Peng, Shuzheng Gao, Sirong Chen, Shuai Wang, Cuiyun Gao, Michael R. Lyu

Recently, LLM agents have made rapid progress in improving their programming capabilities. However, existing benchmarks lack the ability to automatically evaluate from users' perspective, and also lack the explainability of the results of…

Software Engineering · Computer Science · 2025-06-03 · Kaiyuan Liu, Youcheng Pan, Yang Xiang, Daojing He, Jing Li, Yexing Du, Tianrun Gao