Related papers: Evaluating Non-English Developer Support in Machin…

A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics

Large Language Models are essential coding assistants, yet their training is predominantly English-centric. In this study, we evaluate the performance of code language models in non-English contexts, identifying challenges in their adoption…

Software Engineering · Computer Science 2025-05-22 Jonathan Katzy, Yongcheng Huang, Gopal-Raj Panchu, Maksym Ziemlewski, Paris Loizides, Sander Vermeulen, Arie van Deursen, Maliheh Izadi

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul, Hong Zhu, Ian Bayley

On the Quality of AI-Generated Source Code Comments: A Comprehensive Evaluation

This paper investigates the quality of source code comments automatically generated by Large Language Models (LLMs). While AI-based comment generation has emerged as a promising solution to reduce developers' documentation effort, prior…

Software Engineering · Computer Science 2025-12-02 Ian Guelman, Arthur Gregório Leal, Laerte Xavier, Marco Tulio Valente

A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?

Recent advancements in Large Language Models (LLMs) have led to their widespread application in automated code generation. However, these models can still generate defective code that deviates from the specification. Previous research has…

Software Engineering · Computer Science 2025-03-21 QiHong Chen, Jiachen Yu, Jiawei Li, Jiecheng Deng, Justin Tian Jin Chen, Iftekhar Ahmed

Identifying Inaccurate Descriptions in LLM-generated Code Comments via Test Execution

Software comments are critical for human understanding of software, and as such many comment generation techniques have been proposed. However, we find that a systematic evaluation of the factual accuracy of generated comments is rare; only…

Software Engineering · Computer Science 2024-06-24 Sungmin Kang, Louis Milliken, Shin Yoo

Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks

Pre-trained code models rely heavily on high-quality pre-training data, particularly human-written reference comments that bridge code and natural language. However, these comments often become outdated as software evolves, degrading model…

Software Engineering · Computer Science 2025-04-29 Kang Yang, Xinjun Mao, Shangwen Wang, Yanlin Wang, Tanghaoran Zhang, Bo Lin, Yihao Qin, Zhang Zhang, Yao Lu, Kamal Al-Sabahi

LLM-as-a-qualitative-judge: automating error analysis in natural language generation

Prompting large language models (LLMs) to evaluate generated text, known as LLM-as-a-judge, has become a standard evaluation approach in natural language generation (NLG), but is primarily used as a quantitative tool, i.e. with numerical…

Computation and Language · Computer Science 2025-12-22 Nadezhda Chirkova, Tunde Oluwaseyi Ajayi, Seth Aycock, Zain Muhammad Mujahid, Vladana Perlić, Ekaterina Borisova, Markarit Vartampetian

Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification

Code review is a crucial practice in software development. As code review nowadays is lightweight, various issues can be identified, and sometimes, they can be trivial. Research has investigated automated approaches to classify review…

Software Engineering · Computer Science 2025-08-14 Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam

Revisiting the Role of Natural Language Code Comments in Code Translation

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is…

Software Engineering · Computer Science 2026-01-26 Monika Gupta, Ajay Meena, Anamitra Roy Choudhury, Vijay Arya, Srikanta Bedathur

What Types of Code Review Comments Do Developers Most Frequently Resolve?

Large language model (LLM)-powered code review automation tools have been introduced to generate code review comments. However, not all generated comments will drive code changes. Understanding what types of generated review comments are…

Software Engineering · Computer Science 2025-10-08 Saul Goldman, Hong Yi Lin, Jirat Pasuksmit, Patanamon Thongtanunam, Kla Tantithamthavorn, Zhe Wang, Ray Zhang, Ali Behnaz, Fan Jiang, Michael Siers, Ryan Jiang, Mike Buller, Minwoo Jeong, Ming Wu

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed, Muhammad Waseem, Kai Kristian Kemell, Aakash Ahmad, Malik Abdul Sami, Jussi Rasku, Kari Systä, Pekka Abrahamsson

Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding

Large Language Models (LLMs) have demonstrated unprecedented capability in code generation. However, LLM-generated code is still plagued with a wide range of functional errors, especially for complex programming tasks that LLMs have not…

Software Engineering · Computer Science 2025-05-13 Yifeng Di, Tianyi Zhang

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models…

Software Engineering · Computer Science 2024-05-01 Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation

Large Language Models (LLMs) have demonstrated promising capabilities for code generation. While existing benchmarks evaluate the correctness and efficiency of LLM-generated code, the potential linguistic bias - where code quality varies…

Software Engineering · Computer Science 2025-05-02 Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Ziyan Lei, Chao Shen

DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation

Code review is a vital but demanding aspect of software development, generating significant interest in automating review comments. Traditional evaluation methods for these comments, primarily based on text similarity, face two major…

Software Engineering · Computer Science 2025-01-28 Junyi Lu, Xiaojia Li, Zihan Hua, Lei Yu, Shiqi Cheng, Li Yang, Fengjun Zhang, Chun Zuo

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top 20, remains inadequate due to existing benchmarks and metrics limitations. Employing…

Computation and Language · Computer Science 2024-02-14 Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

CodeJudge: Evaluating Code Generation with Large Language Models

Large Language Models (LLMs) have shown promising performance in code generation. However, how to reliably evaluate code generated by LLMs remains an unresolved problem. This paper presents CodeJudge, a code evaluation framework that…

Machine Learning · Computer Science 2024-10-04 Weixi Tong, Tianyi Zhang

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

A Survey on Evaluating Large Language Models in Code Generation Tasks

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang