Related papers: PERC: Plan-As-Query Example Retrieval for Underrep…

CodeRAG-Bench: Can Retrieval Augment Code Generation?

While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate…

Software Engineering · Computer Science 2025-02-28 Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried

Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation

The advent of large language models (LLMs) has significantly advanced the field of code translation, enabling automated translation between programming languages. However, these models often struggle with complex translation tasks due to…

Artificial Intelligence · Computer Science 2024-07-30 Manish Bhattarai, Javier E. Santos, Shawn Jones, Ayan Biswas, Boian Alexandrov, Daniel O'Malley

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence…

Software Engineering · Computer Science 2024-05-14 Hanzhuo Tan, Qi Luo, Ling Jiang, Zizheng Zhan, Jing Li, Haotian Zhang, Yuqun Zhang

RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning

Retrieval-augmented generation (RAG) systems rely on retrieval models for identifying relevant contexts and answer generation models for utilizing those contexts. However, retrievers exhibit imperfect recall and precision, limiting…

Computation and Language · Computer Science 2026-04-29 Jerry Huang, Siddarth Madala, Risham Sidhu, Cheng Niu, Hao Peng, Julia Hockenmaier, Tong Zhang

Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control

Few-shot prompting and step-by-step reasoning have enhanced the capabilities of Large Language Models (LLMs) in tackling complex tasks including code generation. In this paper, we introduce a prompt selection and augmentation algorithm…

Robotics · Computer Science 2024-03-21 On Tai Wu, Frodo Kin Sun Chan, Zunhao Zhang, Yan Nei Law, Benny Drescher, Edmond Shiao Bun Lai

Generation-Augmented Query Expansion For Code Retrieval

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing…

Software Engineering · Computer Science 2022-12-22 Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

Context-Augmented Code Generation Using Programming Knowledge Graphs

Large Language Models (LLMs) excel at code generation but struggle with complex problems. Retrieval-Augmented Generation (RAG) mitigates this issue by integrating external knowledge, yet retrieval models often miss relevant context, and…

Software Engineering · Computer Science 2026-01-29 Shahd Seddik, Fahd Seddik, Iman Saberi, Fatemeh Fard, Minh Hieu Huynh, Patanamon Thongtanunam

LLM Agents Improve Semantic Code Search

Code Search is a key task that many programmers often have to perform while developing solutions to problems. Current methodologies suffer from an inability to perform accurately on prompts that contain some ambiguity or ones that require…

Software Engineering · Computer Science 2024-08-22 Sarthak Jain, Aditya Dora, Ka Seng Sam, Prabhat Singh

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Recent advances in large language models (LLMs) have significantly improved automated code generation. While existing approaches have achieved strong performance at the function and file levels, real-world software engineering requires…

Software Engineering · Computer Science 2026-05-21 Yicheng Tao, Yuante Li, Yao Qin, Yepang Liu

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to…

Computation and Language · Computer Science 2023-07-12 Ercong Nie, Sheng Liang, Helmut Schmid, Hinrich Schütze

Does Few-Shot Learning Help LLM Performance in Code Synthesis?

Large language models (LLMs) have made significant strides at code generation through improved model design, training, and chain-of-thought. However, prompt-level optimizations remain an important yet under-explored aspect of LLMs for…

Software Engineering · Computer Science 2024-12-05 Derek Xu, Tong Xie, Botao Xia, Haoyu Li, Yunsheng Bai, Yizhou Sun, Wei Wang

Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem

Retrieval-Augmented Generation (RAG) has demonstrated strong effectiveness in knowledge-intensive tasks by grounding language generation in external evidence. Despite its success, many existing RAG systems are built based on a…

Computation and Language · Computer Science 2026-04-27 Lichang Song, Ting Long, Yi Chang

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation

While Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge, they still face persistent challenges in retrieval inefficiency and the inability of LLMs to filter out irrelevant…

Computation and Language · Computer Science 2025-02-13 Ruobing Yao, Yifei Zhang, Shuang Song, Yuhua Liu, Neng Gao, Chenyang Tu

Assessing the Answerability of Queries in Retrieval-Augmented Code Generation

Thanks to unprecedented language understanding and generation capabilities of large language model (LLM), Retrieval-augmented Code Generation (RaCG) has recently been widely utilized among software developers. While this has increased…

Computation and Language · Computer Science 2024-11-26 Geonmin Kim, Jaeyeon Kim, Hancheol Park, Wooksu Shin, Tae-Ho Kim

Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation

Retrieval-augmented code generation utilizes Large Language Models as the generator and significantly expands their code generation capabilities by providing relevant code, documentation, and more via the retriever. The current approach…

Software Engineering · Computer Science 2024-09-25 Xinyu Gao, Yun Xiong, Deze Wang, Zhenhan Guan, Zejian Shi, Haofen Wang, Shanshan Li

Context-Augmented Code Generation Using Programming Knowledge Graphs

Large Language Models (LLMs) and Code-LLMs (CLLMs) have significantly improved code generation, but, they frequently face difficulties when dealing with challenging and complex problems. Retrieval-Augmented Generation (RAG) addresses this…

Software Engineering · Computer Science 2025-06-17 Iman Saberi, Fatemeh Fard

CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs

Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts. However, lengthy prompts, often exceeding tens of thousands of tokens, introduce challenges related to limited context windows…

Software Engineering · Computer Science 2026-04-13 Pengfei He, Shaowei Wang, Tse-Hsun Chen

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling…

Computation and Language · Computer Science 2026-04-21 Bo Li, Mingda Wang, Gexiang Fang, Shikun Zhang, Wei Ye

Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA

Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent works suggest that Knowledge Graphs (KGs) contain valuable external knowledge for LLMs. Retrieving…

Computation and Language · Computer Science 2024-10-10 Wenyu Huang, Guancheng Zhou, Hongru Wang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan