Related papers: When LLMs Lag Behind: Knowledge Conflicts from Evo…

When LLMs Meet API Documentation: Can Retrieval Augmentation Aid Code Generation Just as It Helps Developers?

Retrieval-augmented generation (RAG) has increasingly shown its power in extending large language models' (LLMs') capability beyond their pre-trained knowledge. Existing works have shown that RAG can help with software development tasks…

Software Engineering · Computer Science 2025-03-20 Jingyi Chen, Songqiang Chen, Jialun Cao, Jiasi Shen, Shing-Chi Cheung

Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents

Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in…

Machine Learning · Computer Science 2024-10-10 Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh

Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation

Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations that integrate multiple methods,…

Software Engineering · Computer Science 2025-11-06 Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab

Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to…

Neural and Evolutionary Computing · Computer Science 2025-03-24 Niki van Stein, Anna V. Kononova, Lars Kotthoff, Thomas Bäck

LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion

Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application…

Software Engineering · Computer Science 2025-02-14 Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Large Language Models (LLMs) have exhibited exceptional performance in software engineering yet face challenges in adapting to continually evolving code knowledge, particularly regarding the frequent updates of third-party library APIs.…

Computation and Language · Computer Science 2025-06-19 Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen

RAG or Learning? Understanding the Limits of LLM Adaptation under Continuous Knowledge Drift in the Real World

Large language models (LLMs) acquire most of their knowledge during pretraining, which ties them to a fixed snapshot of the world and makes adaptation to continuously evolving knowledge challenging. As facts, entities, and events change…

Computation and Language · Computer Science 2026-04-16 Hanbing Liu, Lang Cao, Yang Li

The Instruction Gap: LLMs get lost in Following Instruction

Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation, yet their deployment in enterprise environments reveals a critical limitation: inconsistent adherence to custom instructions.…

Computation and Language · Computer Science 2026-01-08 Vishesh Tripathi, Uday Allu, Biddwan Ahmed

Task Matters: Knowledge Requirements Shape LLM Responses to Context-Memory Conflict

Large language models (LLMs) draw on both contextual information and parametric memory, yet these sources can conflict. Prior studies have largely examined this issue in contextual question answering, implicitly assuming that tasks should…

Computation and Language · Computer Science 2026-04-21 Kaiser Sun, Fan Bai, Mark Dredze

ReCode: Updating Code API Knowledge with Reinforcement Learning

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their…

Computation and Language · Computer Science 2025-11-25 Haoze Wu, Yunzhi Yao, Wenhao Yu, Ningyu Zhang

MEMCoder: Multi-dimensional Evolving Memory for Private-Library-Oriented Code Generation

Large Language Models (LLMs) excel at general code generation, but their performance drops sharply in enterprise settings that rely on internal private libraries absent from public pre-training corpora. While Retrieval-Augmented Generation…

Software Engineering · Computer Science 2026-04-28 Mofei Li, Taozhi Chen, Guowei Yang, Jia Li

Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency

Retrieval Augmented Generation (RAG) systems have emerged as a powerful method for enhancing large language models (LLMs) with up-to-date information. However, the retrieval step in RAG can sometimes surface documents containing…

Computation and Language · Computer Science 2025-04-02 Vignesh Gokul, Srikanth Tenneti, Alwarappan Nakkiran

RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation

Large Language Models (LLMs) have become pivotal tools for automating code generation in software development. However, these models face significant challenges in producing version-aware code for rapidly evolving languages like Rust, where…

Software Engineering · Computer Science 2025-03-24 Linxi Liang, Jing Gong, Mingwei Liu, Chong Wang, Guangsheng Ou, Yanlin Wang, Xin Peng, Zibin Zheng

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

Software Engineering · Computer Science 2025-10-20 Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Muling Wu, Yunbo Tao, Ming Zhang, Mingxu Chai, Jessica Fan, Zhiheng Xi, Rui Zheng, Yueming Wu, Ming Wen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang

Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks

Large Language Models (LLMs) have achieved remarkable success in code generation, and the race to improve their performance has become a central focus of AI research. Benchmarks and leaderboards are increasingly popular, offering…

Software Engineering · Computer Science 2025-11-07 Amir Molzam Sharifloo, Maedeh Heydari, Parsa Kazerooni, Daniel Maninger, Mira Mezini

Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software

Large Language Models (LLMs) can generate code, but can they generate fast code for complex, real-world software systems? In this study, we investigate this question using a dataset of 65 tasks mined from performance-critical open-source…

Software Engineering · Computer Science 2026-04-10 Lirong Yi, Gregory Gay, Philipp Leitner

How Large Language Models Balance Internal Knowledge with User and Document Assertions

Large language models (LLMs) often need to balance their internal parametric knowledge with external information, such as user beliefs and content from retrieved documents, in real-world scenarios like RAG or chat-based systems. A model's…

Computation and Language · Computer Science 2026-04-27 Shuowei Li, Haoxin Li, Wenda Chu, Yi Fang

Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style

Retrieval-augmented generation (RAG) improves Large Language Models (LLMs) by incorporating external information into the response generation process. However, how context-faithful LLMs are and what factors influence LLMs' context…

Computation and Language · Computer Science 2025-07-11 Yuepei Li, Kang Zhou, Qiao Qiao, Bach Nguyen, Qing Wang, Qi Li

CodeRAG-Bench: Can Retrieval Augment Code Generation?

While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate…

Software Engineering · Computer Science 2025-02-28 Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried

Rethinking LLM Parametric Knowledge as Post-retrieval Confidence for Dynamic Retrieval and Reranking

Large Language Models (LLMs) often generate inaccurate responses (hallucinations) when faced with questions beyond their knowledge scope. Retrieval-Augmented Generation (RAG) addresses this by leveraging external knowledge, but a critical…

Information Retrieval · Computer Science 2025-09-10 Haoxiang Jin, Ronghan Li, Zixiang Lu, Qiguang Miao