English
Related papers

Related papers: ReCode: Robustness Evaluation of Code Generation M…

200 papers

In practice, rigorous reasoning is often a key driver of correct code, while Reinforcement Learning (RL) for code generation often neglects optimizing reasoning quality. Bringing process-level supervision into RL is appealing, but it faces…

Software Engineering · Computer Science 2026-05-06 Lishui Fan , Yu Zhang , Mouxiang Chen , Zhongxin Liu

Large language models have gained significant traction and popularity in recent times, extending their usage to code-generation tasks. While this field has garnered considerable attention, the exploration of testing and evaluating the…

Software Engineering · Computer Science 2026-05-05 Fazle Rabbi , Zishuo Ding , Jinqiu Yang

Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language…

Software Engineering · Computer Science 2026-04-28 Amal AKLI , Mike PAPADAKIS , Maxime CORDY , Yves Le TRAON

Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to…

Code generation systems have been extensively developed in recent years to generate source code based on natural language instructions. However, despite their advancements, these systems still face robustness issues where even slightly…

Software Engineering · Computer Science 2023-08-28 Ming Yan , Junjie Chen , Jie M. Zhang , Xuejie Cao , Chen Yang , Mark Harman

In the era of large language models (LLMs), code benchmarks have become an important research area in software engineering and are widely used by practitioners. These benchmarks evaluate the performance of LLMs on specific code-related…

Software Engineering · Computer Science 2025-06-24 Zhiyuan Pan , Xing Hu , Xin Xia , Xiaohu Yang

Robustness is a critical factor for reliable code generation by large language models, yet most evaluations focus on correctness and overlook key issues such as missing input validation and inadequate error handling. In this work, we…

Software Engineering · Computer Science 2025-09-24 Zike Li , Mingwei Liu , Anji Li , Kaifeng He , Yanlin Wang , Xin Peng , Zibin Zheng

Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have demonstrated superior performance in…

Computation and Language · Computer Science 2023-03-10 Terry Yue Zhuo , Zhuang Li , Yujin Huang , Fatemeh Shiri , Weiqing Wang , Gholamreza Haffari , Yuan-Fang Li

The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address…

Software Engineering · Computer Science 2025-01-08 Tianyu Zheng , Ge Zhang , Tianhao Shen , Xueling Liu , Bill Yuchen Lin , Jie Fu , Wenhu Chen , Xiang Yue

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their…

Computation and Language · Computer Science 2025-11-25 Haoze Wu , Yunzhi Yao , Wenhao Yu , Ningyu Zhang

With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus more on code completion in function and class level by…

Software Engineering · Computer Science 2025-11-03 Qinyun Wu , Chao Peng , Pengfei Gao , Ruida Hu , Haoyu Gan , Bo Jiang , Jinhe Tang , Zhiwen Deng , Zhanming Guan , Cuiyun Gao , Xia Liu , Ping Yang

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang , Sirong Chen , Cuiyun Gao , Zhenhao Li , Xing Hu , Kui Liu , Xin Xia

Software engineering research has always being concerned with the improvement of code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step…

Recent studies have adopted pre-trained language models, such as CodeT5 and CodeGPT, for automated program generation tasks like code generation, repair, and translation. Numerous language model-based approaches have been proposed and…

Software Engineering · Computer Science 2024-01-09 Yue Liu , Chakkrit Tantithamthavorn , Yonghui Liu , Li Li

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science 2022-11-24 Bei Chen , Fengji Zhang , Anh Nguyen , Daoguang Zan , Zeqi Lin , Jian-Guang Lou , Weizhu Chen

We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They…

Software Engineering · Computer Science 2025-01-03 Zhaojian Yu , Yilun Zhao , Arman Cohan , Xiao-Ping Zhang

Despite the increase in popularity of language models for code generation, it is still unknown how training on bimodal coding forums affects a model's code generation performance and reliability. We, therefore, collect a dataset of over…

Machine Learning · Computer Science 2022-11-16 Gabriel Orlanski , Seonhye Yang , Michael Healy

Large language models (LLMs) achieve promising results in code generation based on a given natural language description. They have been integrated into open-source projects and commercial products to facilitate daily coding activities. The…

Software Engineering · Computer Science 2024-07-01 Junkai Chen , Zhenhao Li , Xing Hu , Xin Xia

Large Language Models (LLMs) have demonstrated impressive performance in code generation tasks under idealized conditions, where task descriptions are clear and precise. However, in practice, task descriptions frequently exhibit ambiguity,…

Software Engineering · Computer Science 2025-07-29 Maya Larbi , Amal Akli , Mike Papadakis , Rihab Bouyousfi , Maxime Cordy , Federica Sarro , Yves Le Traon

Recently, a number of repository-level code generation benchmarks-such as CoderEval, DevEval, RepoEval, RepoBench, and LongCodeArena-have emerged to evaluate the capabilities of large language models (LLMs) beyond standalone benchmarks like…

Software Engineering · Computer Science 2025-06-26 Shanchao Liang , Yiran Hu , Nan Jiang , Lin Tan
‹ Prev 1 2 3 10 Next ›