Related papers: Relative Positioning Based Code Chunking Method Fo…

ContextModule: Improving Code Completion via Repository-level Contextual Information

Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real-time. However, existing LLM-based code completion systems primarily…

Software Engineering · Computer Science · 2024-12-12 · Zhanming Guan, Junlin Liu, Jierui Liu, Chao Peng, Dexin Liu, Ningyuan Sun, Bo Jiang, Wenchao Li, Jie Liu, Hang Zhu

Beyond More Context: How Granularity and Order Drive Code Completion Quality

Context plays an important role in the quality of code completion, as Large Language Models (LLMs) require sufficient and relevant information to assist developers in code generation tasks. However, composing a relevant context for code…

Software Engineering · Computer Science · 2025-10-09 · Uswat Yusuf, Genevieve Caumartin, Diego Elias Costa

On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing

Recent advancements in code-fluent Large Language Models (LLMs) have enabled research on repository-level code editing. In such tasks, the model navigates and modifies the entire codebase of a project according to a request. Hence, such tasks…

Software Engineering · Computer Science · 2024-06-10 · Alexander Kovrigin, Aleksandra Eliseeva, Yaroslav Zharov, Timofey Bryksin

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit…

Software Engineering · Computer Science · 2024-09-16 · Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code…

Computation and Language · Computer Science · 2024-06-28 · Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly…

Software Engineering · Computer Science · 2022-03-16 · Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be over-compressed in the embeddings. Consequently,…

Computation and Language · Computer Science · 2025-07-08 · Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang, Han Xiao

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation…

Software Engineering · Computer Science · 2026-01-28 · Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering capable of performing various code-related tasks such as code generation, completion, and translation.…

Software Engineering · Computer Science · 2026-02-26 · Madhusudan Ghosh, Rishabh Gupta

Retrieval-augmented code completion for local projects using large language models

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on…

Software Engineering · Computer Science · 2025-06-17 · Marko Hostnik, Marko Robnik-Šikonja

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite…

Software Engineering · Computer Science · 2024-02-20 · Yichen Li, Yun Peng, Yintong Huo, Michael R. Lyu

Beyond Chunk-Then-Embed: A Comprehensive Taxonomy and Evaluation of Document Chunking Strategies for Information Retrieval

Document chunking is a critical preprocessing step in dense retrieval systems, yet the design space of chunking strategies remains poorly understood. Recent research has proposed several concurrent approaches, including LLM-guided methods…

Information Retrieval · Computer Science · 2026-02-20 · Yongjie Zhou, Shuai Wang, Bevan Koopman, Guido Zuccon

CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion

Repository-level code completion automatically predicts the unfinished code based on the broader information from the repository. Recent strides in Code Large Language Models (code LLMs) have spurred the development of repository-level code…

Computation and Language · Computer Science · 2025-09-22 · Sheng Zhang, Yifan Ding, Shuquan Lian, Shun Song, Hui Li

Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) has become a transformative approach for enhancing large language models (LLMs) by grounding their outputs in external knowledge sources. Yet, a critical question persists: how can vast volumes of…

Information Retrieval · Computer Science · 2025-04-29 · Carlo Merola, Jaspinder Singh

Sequence Model Design for Code Completion in the Modern IDE

Code completion plays a prominent role in modern integrated development environments (IDEs). Machine learning has become ubiquitous in analogous natural language writing and search software, surfacing more relevant autocompletions and…

Software Engineering · Computer Science · 2020-04-14 · Gareth Ari Aye, Gail E. Kaiser

HyQE: Ranking Contexts with Hypothetical Query Embeddings

In retrieval-augmented systems, context ranking techniques are commonly employed to reorder the retrieved contexts based on their relevance to a user query. A standard approach is to measure this relevance through the similarity between…

Information Retrieval · Computer Science · 2024-10-22 · Weichao Zhou, Jiaxin Zhang, Hilaf Hasson, Anu Singh, Wenchao Li

REPOFUSE: Repository-Level Code Completion with Fused Dual Context

The success of language models in code assistance has spurred the proposal of repository-level code completion as a means to enhance prediction accuracy, utilizing the context from the entire codebase. However, this amplified context can…

Software Engineering · Computer Science · 2024-02-26 · Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, Wei Jiang, Hongwei Chen, Chengpeng Wang, Gang Fan

Towards Full-line Code Completion with Neural Language Models

A code completion system suggests future code elements to developers given a partially-complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion…

Software Engineering · Computer Science · 2020-09-21 · Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin

Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception

While Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm for boosting large language models (LLMs) in knowledge-intensive tasks, it often overlooks the crucial aspect of text chunking within its workflow. This paper…

Computation and Language · Computer Science · 2025-05-22 · Jihao Zhao, Zhiyuan Ji, Yuchen Feng, Pengnian Qi, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li

Hierarchical Re-ranker Retriever (HRR)

Retrieving the right level of context for a given query is a perennial challenge in information retrieval: too large a chunk dilutes semantic specificity, while chunks that are too small lack broader context. This paper introduces the…

Information Retrieval · Computer Science · 2025-03-05 · Ashish Singh, Priti Mohapatra