Related papers: CodeUpdateArena: Benchmarking Knowledge Editing on…

200 papers

Large Language Models (LLMs) have reshaped code generation by synergizing their exceptional comprehension of natural language and programming syntax, thereby substantially boosting developer productivity. These advancements have prompted…

Software Engineering · Computer Science 2025-03-04 Mingzhe Du, Anh Tuan Luu, Bin Ji, Xiaobao Wu, Dong Huang, Terry Yue Zhuo, Qian Liu, See-Kiong Ng

Code large language models (codeLLMs) have made significant strides in code generation. Most previous code-related benchmarks, which consist of various programming exercises along with the corresponding test cases, are used as a common…

Computation and Language · Computer Science 2024-12-09 Jian Yang, Jiaxi Yang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Binyuan Hui, Junyang Lin

Large Language Models (LLMs) have exhibited exceptional performance in software engineering yet face challenges in adapting to continually evolving code knowledge, particularly regarding the frequent updates of third-party library APIs.…

Computation and Language · Computer Science 2025-06-19 Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their…

Computation and Language · Computer Science 2025-11-25 Haoze Wu, Yunzhi Yao, Wenhao Yu, Ningyu Zhang

Automating the decision of whether a code change requires manual review is vital for maintaining software quality in modern development workflows. However, the emergence of new programming languages and frameworks creates a critical…

Software Engineering · Computer Science 2025-09-08 Yogev Cohen, Dudi Ohayon, Romy Somkin, Yehudit Aperstein, Alexander Apartsin

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing…

Large language models (LLMs) for code are increasingly used in software development, but they remain static after pretraining while APIs and software libraries continue to evolve. Model editing offers a lightweight alternative to retraining…

Software Engineering · Computer Science 2026-05-11 Vinaik Chhetri, Moghis Fereidouni, A. B Siddique, Umar Farooq

The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale post-training. While retrieval-augmented generation (RAG) is commonly used to…

Software Engineering · Computer Science 2026-04-13 Ahmed Nusayer Ashik, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman, Yuan Tian

Knowledge editing aims to update the embedded knowledge within Large Language Models (LLMs). However, existing approaches, whether through parameter modification or external memory integration, often suffer from inconsistent evaluation…

Computation and Language · Computer Science 2025-05-27 Guoxiu He, Xin Song, Futing Wang, Aixin Sun

Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations that integrate multiple methods,…

Software Engineering · Computer Science 2025-11-06 Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in…

Code review is a critical practice in modern software engineering, helping developers detect defects early, improve code quality, and facilitate knowledge sharing. With the rapid advancement of large language models (LLMs), a growing body…

Software Engineering · Computer Science 2026-02-17 Taufiqul Islam Khan, Shaowei Wang, Haoxiang Zhang, Tse-Hsun Chen

Implementing new features in repository-level codebases is a crucial application of code generation models. However, current benchmarks lack a dedicated evaluation framework for this capability. To fill this gap, we introduce FEA-Bench, a…

Software Engineering · Computer Science 2025-06-23 Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao, Wen Luo, Guangyue Peng, Yangyu Huang, Houfeng Wang, Scarlett Li

Large Language Models (LLMs) are increasingly used for software development, yet existing benchmarks for LLM-based coding assistance do not reflect the constraints of High Energy Physics (HEP) and High Performance Computing (HPC) software.…

Large language models (LLMs) excel across many natural language processing tasks but face challenges in domain-specific, analytical tasks such as conducting research surveys. This study introduces ResearchArena, a benchmark designed to…

Artificial Intelligence · Computer Science 2025-09-09 Hao Kang, Chenyan Xiong

Large language models (LLMs) have achieved impressive performance across various natural language benchmarks, prompting a continual need to curate more difficult datasets for larger LLMs, which is costly and time-consuming. In this paper,…

Computation and Language · Computer Science 2024-06-07 Jiahao Ying, Yixin Cao, Yushi Bai, Qianru Sun, Bo Wang, Wei Tang, Zhaojun Ding, Yizhe Yang, Xuanjing Huang, Shuicheng Yan

Large Language Models (LLMs) excel in natural language processing by encoding extensive human knowledge, but their utility relies on timely updates as knowledge evolves. Updating LLMs involves two key tasks simultaneously: unlearning to…

Computation and Language · Computer Science 2025-02-04 Binchi Zhang, Zhengzhang Chen, Zaiyi Zheng, Jundong Li, Haifeng Chen

Task automation has been greatly empowered by recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Application Programming Interfaces (APIs) facilitate the integration of third-party dependencies within the code of client applications. However, changes to an API, such as deprecation, modification of parameter names or types, or complete…

Software Engineering · Computer Science 2026-04-14 Frank Reyes, May Mahmoud, Federico Bono, Sarah Nadi, Benoit Baudry, Martin Monperrus