
Related papers: RepoDoc: A Knowledge Graph-Based Framework to Auto…

200 papers

The task of repository-level code completion is to continue writing unfinished code based on the broader context of the repository. For automated code completion tools, however, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science · 2023-10-23 · Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency…

Software Engineering · Computer Science · 2025-03-20 · Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu

Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains…

Computation and Language · Computer Science · 2024-02-27 · Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

Generating and maintaining API documentation with integrity and consistency can be time-consuming and expensive for evolving APIs. To solve this problem, several approaches have been proposed to automatically generate high-quality API…

Software Engineering · Computer Science · 2023-03-24 · Shujun Wang, Yongqiang Tian, Dengcheng He

Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of…

Software Engineering · Computer Science · 2024-08-15 · Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui

Repository summarization is a crucial research question in software development and maintenance. Existing repository summarization techniques primarily focus on summarizing code according to the directory tree, which is…

Software Engineering · Computer Science · 2025-10-14 · Yifeng Zhu, Xianlin Zhao, Xutian Li, Yanzhen Zou, Haizhuo Yuan, Yue Wang, Bing Xie

In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository level, it is crucial to extract the relevant…

Software Engineering · Computer Science · 2024-09-25 · Jicheng Wang, Yifeng He, Hao Chen

Code evolution is inevitable in modern software development. Changes to third-party APIs frequently break existing code and complicate maintenance, posing practical challenges for developers. While large language models (LLMs) have shown…

Software Engineering · Computer Science · 2026-03-10 · Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi

Recent advancements in Large Language Models (LLMs) have transformed code generation from natural language queries. However, despite their extensive knowledge and ability to produce high-quality code, LLMs often struggle with contextual…

Artificial Intelligence · Computer Science · 2025-07-17 · Mihir Athale, Vishal Vaddina

The performance of automatic code documentation generation models depends critically on the quality of the training data used for supervision. However, most existing code documentation datasets are constructed through large scale scraping…

Software Engineering · Computer Science · 2025-12-25 · Recep Kaan Karaman, Meftun Akarsu

Large language models excel at generating individual functions or single files of code, yet generating complete repositories from scratch remains a fundamental challenge. This capability is key to building coherent software systems from…

Computation and Language · Computer Science · 2026-02-16 · Jane Luo, Xin Zhang, Steven Liu, Jie Wu, Jianfeng Liu, Yiming Huang, Yangyu Huang, Chengyu Yin, Ying Xin, Yuefeng Zhan, Hao Sun, Qi Chen, Scarlett Li, Mao Yang

Automated code documentation is essential for modern software development, providing the contextual grounding that both human developers and coding agents rely on to navigate large codebases. Existing repository-level approaches process…

Software Engineering · Computer Science · 2026-05-15 · Suyoung Bae, Jaehoon Lee, Changkyu Choi, YunSeok Choi, Jee-Hyong Lee

API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API…

Software Engineering · Computer Science · 2026-01-14 · Bonan Kou, Zijie Zhou, Muhao Chen, Tianyi Zhang

The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than…

Software Engineering · Computer Science · 2025-08-26 · Huacan Wang, Ziyi Ni, Shuo Zhang, Shuo Lu, Sen Hu, Ziyang He, Chen Hu, Jiaye Lin, Yifu Guo, Ronghao Chen, Xin Li, Daxin Jiang, Yuntao Du, Pin Lyu

As code completion extends from the function level to the repository level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science · 2025-12-05 · Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

Developing document understanding models at enterprise scale requires large, diverse, and well-annotated datasets spanning a wide range of document types. However, collecting such data is prohibitively expensive due to privacy constraints,…

With the rapid development of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) has become a predominant method in the field of professional knowledge-based question answering. Presently, major foundation model companies…

Artificial Intelligence · Computer Science · 2024-01-24 · Demiao Lin

A tremendous number of critical database systems lack adequate documentation. Declared primary keys are absent, foreign key constraints have been dropped for performance, column names are cryptic abbreviations, and no entity-relationship…

Databases · Computer Science · 2026-03-25 · Amith Nagarajan, Thomas Altman

Large Language Models (LLMs) have greatly advanced code auto-completion systems, with a potential for substantial productivity enhancements for developers. However, current benchmarks mainly focus on single-file tasks, leaving an assessment…

Computation and Language · Computer Science · 2023-10-05 · Tianyang Liu, Canwen Xu, Julian McAuley

Repository-level code generation aims to generate code within the context of a specified repository. Existing approaches typically employ retrieval-augmented generation (RAG) techniques to provide LLMs with relevant contextual information…

Software Engineering · Computer Science · 2025-11-04 · Yang Liu, Li Zhang, Fang Liu, Zhuohang Wang, Donglin Wei, Zhishuo Yang, Kechi Zhang, Jia Li, Lin Shi