Related papers: Code Needs Comments: Enhancing Code LLMs with Comm…

200 papers

Pre-trained code models rely heavily on high-quality pre-training data, particularly human-written reference comments that bridge code and natural language. However, these comments often become outdated as software evolves, degrading model…

Software Engineering · Computer Science · 2025-04-29 · Kang Yang, Xinjun Mao, Shangwen Wang, Yanlin Wang, Tanghaoran Zhang, Bo Lin, Yihao Qin, Zhang Zhang, Yao Lu, Kamal Al-Sabahi

Generating accurate code review comments remains a significant challenge due to the inherently diverse and non-unique nature of the task output. Large language models pretrained on both programming and natural language data tend to perform…

Software Engineering · Computer Science · 2024-11-18 · Md. Asif Haider, Ayesha Binte Mostofa, Sk. Sabit Bin Mosaddek, Anindya Iqbal, Toufique Ahmed

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science · 2025-02-03 · Yaping Chai, Haoran Xie, Joe S. Qin

Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been anecdotal consensus among practitioners that code data plays a…

Computation and Language · Computer Science · 2024-08-21 · Viraat Aryabumi, Yixuan Su, Raymond Ma, Adrien Morisot, Ivan Zhang, Acyr Locatelli, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is…

Software Engineering · Computer Science · 2026-01-26 · Monika Gupta, Ajay Meena, Anamitra Roy Choudhury, Vijay Arya, Srikanta Bedathur

The advent of Large Language Models (LLMs) has revolutionized various domains of artificial intelligence, including the realm of software engineering. In this research, we evaluate the efficacy of pre-trained LLMs in replicating the tasks…

Software Engineering · Computer Science · 2024-06-10 · Tajmilur Rahman, Rahul Singh, Mir Yousuf Sultan

Comments are very useful to the flow of code development. With the increasing commonality of code, novice coders have been creating a significant amount of codebases. Due to lack of commenting standards, their comments are often useless,…

Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that…

Software Engineering · Computer Science · 2023-06-16 · Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao

Developers deal with code-change-related tasks daily, e.g., reviewing code. Pre-trained code and code-change-oriented models have been adapted to help developers with such tasks. Recently, large language models (LLMs) have shown their…

Software Engineering · Computer Science · 2024-07-04 · Lishui Fan, Jiakun Liu, Zhongxin Liu, David Lo, Xin Xia, Shanping Li

Code review is a crucial practice in software development. As code review nowadays is lightweight, various issues can be identified, and sometimes, they can be trivial. Research has investigated automated approaches to classify review…

Software Engineering · Computer Science · 2025-08-14 · Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam

Code review is an important practice in software development, yet it is time-consuming and requires substantial effort. While open-source datasets have been used to train neural models for automating code review tasks, including review…

Software Engineering · Computer Science · 2025-02-07 · Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam

Recent advancements in code large language models (Code-LLMs) have demonstrated remarkable capabilities in resolving programming related tasks. Meanwhile, researchers have recognized that the quality of pre-training data is crucial for…

Software Engineering · Computer Science · 2026-04-10 · Chengli Xing, Zhengran Zeng, Gexiang Fang, Rui Xie, Wei Ye, Shikun Zhang

Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks (e.g., code completion and code generation). By leveraging huge existing code corpora (e.g., GitHub),…

Software Engineering · Computer Science · 2025-01-16 · Xin Yin, Chao Ni, Xiaodan Xu, Xinrui Li, Xiaohu Yang

This paper presents the system submitted by the team from IIT(ISM) Dhanbad in FIRE IRSE 2023 shared task 1 on the automatic usefulness prediction of code-comment pairs as well as the impact of Large Language Model(LLM) generated data on…

Software Engineering · Computer Science · 2023-10-24 · Tripti Kumari, Chakali Sai Charan, Ayan Das

Code comments serve a crucial role in software development for documenting functionality, clarifying design choices, and assisting with issue tracking. They capture developers' insights about the surrounding source code, serving as an…

Software Engineering · Computer Science · 2026-01-28 · Thomas Borsani, Andrea Rosani, Giuseppe Di Fatta

Pre-trained language models for code (PLMCs) have gained attention in recent research. These models are pre-trained on large-scale datasets using multi-modal objectives. However, fine-tuning them requires extensive supervision and is…

Computation and Language · Computer Science · 2023-05-11 · Hung Quoc To, Nghi D. Q. Bui, Jin Guo, Tien N. Nguyen

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource…

Software Engineering · Computer Science · 2025-02-03 · Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

Pre-trained code models have emerged as crucial tools in various code intelligence tasks. However, their effectiveness depends on the quality of the pre-training dataset, particularly the human reference comments, which serve as a bridge…

Software Engineering · Computer Science · 2023-12-27 · Kang Yang, Xinjun Mao, Shangwen Wang, Tanghaoran Zhang, Bo Lin, Yanlin Wang, Yihao Qin, Zhang Zhang, Xiaoguang Mao

In code review, generating structured and relevant comments is crucial for identifying code issues and facilitating accurate code changes that ensure an efficient code review process. Well-crafted comments not only streamline the code…

Software Engineering · Computer Science · 2025-02-06 · Oussama Ben Sghaier, Martin Weyssow, Houari Sahraoui

With the capabilities of understanding and executing natural language instructions, Large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science · 2024-04-30 · Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee