Related papers: RepoDoc: A Knowledge Graph-Based Framework to Auto…

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency…

Software Engineering · Computer Science 2025-03-20 Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu

RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains…

Computation and Language · Computer Science 2024-02-27 Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

gDoc: Automatic Generation of Structured API Documentation

Generating and maintaining API documentation with integrity and consistency can be time-consuming and expensive for evolving APIs. To solve this problem, several approaches have been proposed to automatically generate high-quality API…

Software Engineering · Computer Science 2023-03-24 Shujun Wang, Yongqiang Tian, Dengcheng He

RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of…

Software Engineering · Computer Science 2024-08-15 Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui

RepoSummary: Feature-Oriented Summarization and Documentation Generation for Code Repositories

Repository summarization is a crucial research question in development and maintenance for software engineering. Existing repository summarization techniques primarily focus on summarizing code according to the directory tree, which is…

Software Engineering · Computer Science 2025-10-14 Yifeng Zhu, Xianlin Zhao, Xutian Li, Yanzhen Zou, Haizhuo Yuan, Yue Wang, Bing Xie

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science 2024-09-25 Jicheng Wang, Yifeng He, Hao Chen

KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation

Code evolution is inevitable in modern software development. Changes to third-party APIs frequently break existing code and complicate maintenance, posing practical challenges for developers. While large language models (LLMs) have shown…

Software Engineering · Computer Science 2026-03-10 Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi

Knowledge Graph Based Repository-Level Code Generation

Recent advancements in Large Language Models (LLMs) have transformed code generation from natural language queries. However, despite their extensive knowledge and ability to produce high-quality code, LLMs often struggle with contextual…

Artificial Intelligence · Computer Science 2025-07-17 Mihir Athale, Vishal Vaddina

Code2Doc: A Quality-First Curated Dataset for Code Documentation

The performance of automatic code documentation generation models depends critically on the quality of the training data used for supervision. However, most existing code documentation datasets are constructed through large scale scraping…

Software Engineering · Computer Science 2025-12-25 Recep Kaan Karaman, Meftun Akarsu

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Large language models excel at generating individual functions or single files of code, yet generating complete repositories from scratch remains a fundamental challenge. This capability is key to building coherent software systems from…

Computation and Language · Computer Science 2026-02-16 Jane Luo, Xin Zhang, Steven Liu, Jie Wu, Jianfeng Liu, Yiming Huang, Yangyu Huang, Chengyu Yin, Ying Xin, Yuefeng Zhan, Hao Sun, Qi Chen, Scarlett Li, Mao Yang

Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

Automated code documentation is essential for modern software development, providing the contextual grounding that both human developers and coding agents rely on to navigate large codebases. Existing repository-level approaches process…

Software Engineering · Computer Science 2026-05-15 Suyoung Bae, Jaehoon Lee, Changkyu Choi, YunSeok Choi, Jee-Hyong Lee

Automating API Documentation from Crowdsourced Knowledge

API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API…

Software Engineering · Computer Science 2026-01-14 Bonan Kou, Zijie Zhou, Muhao Chen, Tianyi Zhang

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than…

Software Engineering · Computer Science 2025-08-26 Huacan Wang, Ziyi Ni, Shuo Zhang, Shuo Lu, Sen Hu, Ziyang He, Chen Hu, Jiaye Lin, Yifu Guo, Ronghao Chen, Xin Li, Daxin Jiang, Yuntao Du, Pin Lyu

Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

As code completion tasks shift from the function level to the repository level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science 2025-12-05 Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models

Developing document understanding models at enterprise scale requires large, diverse, and well-annotated datasets spanning a wide range of document types. However, collecting such data is prohibitively expensive due to privacy constraints,…

Artificial Intelligence · Computer Science 2025-10-03 Karan Dua, Hitesh Laxmichand Patel, Puneet Mittal, Ranjeet Gupta, Amit Agarwal, Praneet Pabolu, Srikant Panda, Hansa Meghwani, Graham Horwood, Fahad Shah

Revolutionizing Retrieval-Augmented Generation with Enhanced PDF Structure Recognition

With the rapid development of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) has become a predominant method in the field of professional knowledge-based question answering. Presently, major foundation model companies…

Artificial Intelligence · Computer Science 2024-01-24 Demiao Lin

DBAutoDoc: Automated Discovery and Documentation of Undocumented Database Schemas via Statistical Analysis and Iterative LLM Refinement

A tremendous number of critical database systems lack adequate documentation. Declared primary keys are absent, foreign key constraints have been dropped for performance, column names are cryptic abbreviations, and no entity-relationship…

Databases · Computer Science 2026-03-25 Amith Nagarajan, Thomas Altman

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Large Language Models (LLMs) have greatly advanced code auto-completion systems, with a potential for substantial productivity enhancements for developers. However, current benchmarks mainly focus on single-file tasks, leaving an assessment…

Computation and Language · Computer Science 2023-10-05 Tianyang Liu, Canwen Xu, Julian McAuley

RepoScope: Leveraging Call Chain-Aware Multi-View Context for Repository-Level Code Generation

Repository-level code generation aims to generate code within the context of a specified repository. Existing approaches typically employ retrieval-augmented generation (RAG) techniques to provide LLMs with relevant contextual information…

Software Engineering · Computer Science 2025-11-04 Yang Liu, Li Zhang, Fang Liu, Zhuohang Wang, Donglin Wei, Zhishuo Yang, Kechi Zhang, Jia Li, Lin Shi