Related papers: Improving Code Localization with Repository Memory

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

A prerequisite for coding agents to perform tasks on large repositories is code localization - the identification of relevant files, classes, and functions to work on. While repository-level code localization has been performed using…

Software Engineering · Computer Science 2026-03-19 Lintang Sutawika, Aditya Bharat Soni, Bharath Sriraam R R, Apurva Gandhi, Taha Yassine, Sanidhya Vijayvargiya, Yuchen Li, Xuhui Zhou, Yilin Zhang, Leander Melroy Maben, Graham Neubig

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real maintainers reject. The root cause is not functional incorrectness but a lack of organicity:…

Software Engineering · Computer Science 2026-03-30 Mo Li, L. H. Xu, Qitai Tan, Ting Cao, Yunxin Liu

LocAgent: Graph-Guided LLM Agents for Code Localization

Code localization, identifying precisely where in a codebase changes need to be made, is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying…

Software Engineering · Computer Science 2025-04-30 Zhaoling Chen, Xiangru Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, Xingyao Wang

Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization

Bug localization remains a critical yet time-consuming challenge in large-scale software repositories. Traditional information retrieval-based bug localization (IRBL) methods rely on unchanged bug descriptions, which often contain noisy…

Software Engineering · Computer Science 2025-12-09 Genevieve Caumartin, Glaucia Melo

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

Optimizing the performance of large-scale software repositories demands expertise in code reasoning and software engineering (SWE) to reduce runtime while preserving program correctness. However, most benchmarks emphasize what to fix rather…

Software Engineering · Computer Science 2025-11-12 Jeffrey Jian Ma, Milad Hashemi, Amir Yazdanbakhsh, Kevin Swersky, Ofir Press, Enhui Li, Vijay Janapa Reddi, Parthasarathy Ranganathan

LLM Agents Improve Semantic Code Search

Code Search is a key task that many programmers often have to perform while developing solutions to problems. Current methodologies suffer from an inability to perform accurately on prompts that contain some ambiguity or ones that require…

Software Engineering · Computer Science 2024-08-22 Sarthak Jain, Aditya Dora, Ka Seng Sam, Prabhat Singh

SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution

Large language models (LLMs) exhibit strong performance on self-contained programming tasks. However, they still struggle with repository-level software engineering (SWE), which demands (1) deep codebase navigation with effective context…

Software Engineering · Computer Science 2026-05-27 Kang He, Kaushik Roy

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than…

Software Engineering · Computer Science 2025-08-26 Huacan Wang, Ziyi Ni, Shuo Zhang, Shuo Lu, Sen Hu, Ziyang He, Chen Hu, Jiaye Lin, Yifu Guo, Ronghao Chen, Xin Li, Daxin Jiang, Yuntao Du, Pin Lyu

CodePlan: Repository-level Coding using LLMs and Planning

Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase involve pervasively editing the entire repository of code.…

Software Engineering · Computer Science 2023-09-25 Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, B. Ashok, Shashank Shet

Meta-RAG on Large Codebases Using Code Summarization

Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains. One such domain is software development, where researchers have pushed the automation of a number of…

Software Engineering · Computer Science 2025-08-08 Vali Tawosi, Salwa Alamir, Xiaomo Liu, Manuela Veloso

HAFixAgent: History-Aware Program Repair Agent

Automated program repair (APR) has recently shifted toward large language models and agent-based systems, yet most systems rely on local snapshot context, overlooking repository history. Prior work shows that repository history helps repair…

Software Engineering · Computer Science 2026-04-03 Yu Shi, Hao Li, Bram Adams, Ahmed E. Hassan

LARGER: Lexically Anchored Repository Graph Exploration and Retrieval

Repository-level coding agents must first localize the files and symbols relevant to a task; failures at this stage can cascade across downstream objectives ranging from patch generation to test writing and codebase question answering.…

Information Retrieval · Computer Science 2026-05-19 Yuntong Hu, Tongli Su, Liang Zhao, Bowen Zhu, Hasibul Haque

Extracting Conceptual Knowledge to Locate Software Issues

Issue localization, which identifies faulty code elements such as files or functions, is critical for effective bug fixing. While recent LLM-based and LLM-agent-based approaches improve accuracy, they struggle in large-scale repositories…

Software Engineering · Computer Science 2025-10-07 Ying Wang, Wenjun Mao, Chong Wang, Zhenhao Zhou, Yicheng Zhou, Wenyun Zhao, Yiling Lou, Xin Peng

Code Researcher: Deep Research Agent for Large Systems Code and Commit History

Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexities of systems code, making changes to a systems…

Software Engineering · Computer Science 2026-05-21 Ramneet Singh, Sathvik Joel, Abhav Mehrotra, Nalin Wadhwa, Ramakrishna B Bairi, Aditya Kanade, Nagarajan Natarajan

SweRank: Software Issue Localization with Code Ranking

Software issue localization, the task of identifying the precise code locations (files, classes, or functions) relevant to a natural language issue description (e.g., bug report, feature request), is a critical yet time-consuming aspect of…

Software Engineering · Computer Science 2026-04-23 Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, Shafiq Joty

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

Code agents increasingly help developers work with unfamiliar repositories, but every such task depends on a costly prerequisite: bootstrapping the repository into a usable development state. This process requires substantial…

Software Engineering · Computer Science 2026-05-18 Sihan Fu, Oucheng Liu, Shiyuan Wang, Jin Shi, Chengkun Wei

RepoTransAgent: Multi-Agent LLM Framework for Repository-Aware Code Translation

Repository-aware code translation is critical for modernizing legacy systems, enhancing maintainability, and enabling interoperability across diverse programming languages. While recent advances in large language models (LLMs) have improved…

Software Engineering · Computer Science 2025-08-26 Ziqi Guan, Xin Yin, Zhiyuan Peng, Chao Ni

RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains…

Computation and Language · Computer Science 2024-02-27 Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

SGAgent: Suggestion-Guided LLM-Based Multi-Agent Framework for Repository-Level Software Repair

Large Language Models (LLMs) have enabled intelligent agents that autonomously interact with environments and invoke external tools. Recently, agent-based software repair has drawn wide attention, as repair agents can localize bugs,…

Software Engineering · Computer Science 2026-05-26 Quanjun Zhang, Chengyu Gao, Yu Han, Ye Shang, Chunrong Fang, Zhenyu Chen, Liang Xiao

Repository-level Code Search with Neural Retrieval Methods

This paper presents a multi-stage reranking system for repository-level code search, which leverages the vastly available commit histories of large open-source repositories to aid in bug fixing. We define the task of repository-level code…

Information Retrieval · Computer Science 2025-02-12 Siddharth Gandhi, Luyu Gao, Jamie Callan