Related papers: Evaluating LLM-Generated Code: A Benchmark and Dev…

A Survey on Evaluating Large Language Models in Code Generation Tasks

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen , Qi Guo , Hongrui Jia , Zhengran Zeng , Xin Wang , Yijiang Xu , Jian Wu , Yidong Wang , Qing Gao , Jindong Wang , Wei Ye , Shikun Zhang

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul , Hong Zhu , Ian Bayley

Holistic Evaluation of State-of-the-Art LLMs for Code Generation

This study presents a comprehensive empirical evaluation of six state-of-the-art large language models (LLMs) for code generation, including both general-purpose and code-specialized models. Using a dataset of 944 real-world LeetCode…

Software Engineering · Computer Science 2025-12-23 Le Zhang , Suresh Kothari

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works…

Software Engineering · Computer Science 2025-02-27 Tong Ye , Weigang Huang , Xuhong Zhang , Tengfei Ma , Peiyu Liu , Jianwei Yin , Wenhai Wang

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh , Beiyu Lin

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

Software Engineering · Computer Science 2025-10-20 Shihan Dou , Haoxiang Jia , Shenxi Wu , Huiyuan Zheng , Muling Wu , Yunbo Tao , Ming Zhang , Mingxu Chai , Jessica Fan , Zhiheng Xi , Rui Zheng , Yueming Wu , Ming Wen , Tao Gui , Qi Zhang , Xipeng Qiu , Xuanjing Huang

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed , Muhammad Waseem , Kai Kristian Kemell , Aakash Ahmad , Malik Abdul Sami , Jussi Rasku , Kari Systä , Pekka Abrahamsson

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model…

Software Engineering · Computer Science 2026-04-15 Sabiya Banu Masthan Ali , Oussema Kirmani , Aroosa Hameed , Syed Muhammad Danish , Gautam Srivastava

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani , Nafiseh Kahani , Samuel A. Ajila

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair…

Cryptography and Security · Computer Science 2023-10-24 Hossein Hajipour , Keno Hassler , Thorsten Holz , Lea Schönherr , Mario Fritz

A Survey on Large Language Models for Code Generation

Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This…

Computation and Language · Computer Science 2025-10-28 Juyong Jiang , Fan Wang , Jiasi Shen , Sungju Kim , Sunghun Kim

VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation

With the unprecedented advancements in Large Language Models (LLMs), their application domains have expanded to include code generation tasks across various programming languages. While significant progress has been made in enhancing LLMs…

Software Engineering · Computer Science 2024-06-10 Prashanth Vijayaraghavan , Luyao Shi , Stefano Ambrogio , Charles Mackin , Apoorva Nitsure , David Beymer , Ehsan Degan

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems, thus generating…

Programming Languages · Computer Science 2022-12-22 Shailja Thakur , Baleegh Ahmad , Zhenxing Fan , Hammond Pearce , Benjamin Tan , Ramesh Karri , Brendan Dolan-Gavitt , Siddharth Garg

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of…

Machine Learning · Computer Science 2023-12-12 Mingjie Liu , Nathaniel Pinckney , Brucek Khailany , Haoxing Ren

Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes

Language models (LMs) have exhibited impressive abilities in generating codes from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation…

Software Engineering · Computer Science 2024-08-28 Heejae Chon , Seonghyeon Lee , Jinyoung Yeo , Dongha Lee

Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test…

Computation and Language · Computer Science 2026-01-15 Yuhan Cao , Zian Chen , Kun Quan , Ziliang Zhang , Yu Wang , Xiaoning Dong , Yeqi Feng , Guanzhong He , Jingcheng Huang , Jianhao Li , Yixuan Tan , Jiafu Tang , Yilin Tang , Junlei Wu , Qianyu Xiao , Can Zheng , Shouchen Zhou , Yuxiang Zhu , Yiming Huang , Tianxing He

Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to…

Neural and Evolutionary Computing · Computer Science 2025-03-24 Niki van Stein , Anna V. Kononova , Lars Kotthoff , Thomas Bäck

Designing Empirical Studies on LLM-Based Code Generation: Towards a Reference Framework

The rise of large language models (LLMs) has introduced transformative potential in automated code generation, addressing a wide range of software engineering challenges. However, empirical evaluation of LLM-based code generation lacks…

Software Engineering · Computer Science 2025-10-07 Nathalia Nascimento , Everton Guimaraes , Paulo Alencar

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

The code generation capabilities of large language models(LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code.…

Cryptography and Security · Computer Science 2025-06-23 Xinghang Li , Jingzhe Ding , Chao Peng , Bing Zhao , Xiang Gao , Hongwan Gao , Xinchen Gu