Related papers: Mercury: A Code Efficiency Benchmark for Code Larg…

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science · 2026-05-18 · Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

Mercury: Ultra-Fast Language Models Based on Diffusion

We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict multiple tokens in parallel. In this report, we…

Computation and Language · Computer Science · 2025-06-24 · Inception Labs, Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover, Volodymyr Kuleshov

TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of execution efficiency remains overlooked. We present TRACE, the first…

Software Engineering · Computer Science · 2026-04-15 · Zhihao Gong, Zeyu Sun, Dong Huang, Qingyuan Liang, Jie M. Zhang, Dan Hao

TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of execution efficiency remains overlooked. We present TRACE, the first…

Software Engineering · Computer Science · 2026-03-20 · Zhihao Gong, Zeyu Sun, Dong Huang, Qingyuan Liang, Jie M. Zhang, Dan Hao

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works…

Software Engineering · Computer Science · 2025-02-27 · Tong Ye, Weigang Huang, Xuhong Zhang, Tengfei Ma, Peiyu Liu, Jianwei Yin, Wenhai Wang

Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models

In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, current benchmarks primarily assess the accuracy of LLM-generated code, while neglecting…

Software Engineering · Computer Science · 2024-10-10 · Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

Pluto: A Benchmark for Evaluating Efficiency of LLM-generated Hardware Code

Large Language Models (LLMs) are increasingly used to automate hardware design tasks, including the generation of Verilog code. While early benchmarks focus primarily on functional correctness, efficient hardware design demands additional…

Computation and Language · Computer Science · 2025-10-17 · Manar Abdelatty, Maryam Nouh, Jacob K. Rosenstein, Sherief Reda

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in…

Computation and Language · Computer Science · 2024-10-11 · Siddhant Waghjale, Vishruth Veerendranath, Zora Zhiruo Wang, Daniel Fried

FasterPy: An LLM-based Code Execution Efficiency Optimization Framework

Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…

Software Engineering · Computer Science · 2025-12-30 · Yue Wu, Minghao Han, Ruiyin Li, Peng Liang, Amjed Tahir, Zengyang Li, Qiong Feng, Mojtaba Shahin

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

Existing code generation benchmarks primarily evaluate functional correctness, with limited focus on code efficiency and often restricted to a single language like Python. To address this gap, we introduce EffiBench-X, the first…

Computation and Language · Computer Science · 2025-05-20 · Yuhao Qing, Boyu Zhu, Mingzhe Du, Zhijiang Guo, Terry Yue Zhuo, Qianru Zhang, Jie M. Zhang, Heming Cui, Siu-Ming Yiu, Dong Huang, See-Kiong Ng, Luu Anh Tuan

On Evaluating the Efficiency of Source Code Generated by LLMs

Recent years have seen the remarkable capabilities of large language models (LLMs) for code generation. Different from existing work that evaluates the correctness of the code generated by LLMs, we propose to further evaluate its efficiency.…

Software Engineering · Computer Science · 2024-04-10 · Changan Niu, Ting Zhang, Chuanyi Li, Bin Luo, Vincent Ng

Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation

Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations that integrate multiple methods,…

Software Engineering · Computer Science · 2025-11-06 · Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab

COFFE: A Code Efficiency Benchmark for Code Generation

Code generation has largely improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions given detailed descriptions in natural…

Software Engineering · Computer Science · 2025-02-06 · Yun Peng, Jun Wan, Yichen Li, Xiaoxue Ren

Testing LLMs on Code Generation with Varying Levels of Prompt Specificity

Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The…

Software Engineering · Computer Science · 2023-11-15 · Lincoln Murr, Morgan Grainger, David Gao

A Performance Study of LLM-Generated Code on Leetcode

This study evaluates the efficiency of code generation by Large Language Models (LLMs) and measures their performance against human-crafted solutions using a dataset from Leetcode. We compare 18 LLMs, considering factors such as model…

Software Engineering · Computer Science · 2024-08-01 · Tristan Coignion, Clément Quinton, Romain Rouvoy

Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice

Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-based code review face three major limitations. Lack…

Software Engineering · Computer Science · 2026-01-01 · Ruida Hu, Xinchen Wang, Xin-Cheng Wen, Zhao Zhang, Bo Jiang, Pengfei Gao, Chao Peng, Cuiyun Gao

PEACE: Towards Efficient Project-Level Efficiency Optimization via Hybrid Code Editing

Large Language Models (LLMs) have demonstrated significant capability in code generation, but their potential in code efficiency optimization remains underexplored. Previous LLM-based code efficiency optimization approaches exclusively…

Software Engineering · Computer Science · 2025-10-22 · Xiaoxue Ren, Jun Wan, Yun Peng, Zhongxin Liu, Ming Liang, Dajun Chen, Wei Jiang, Yong Li

A Survey on Evaluating Large Language Models in Code Generation Tasks

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science · 2025-03-05 · Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark

The emergence of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Advancement of LLM-based program synthesis calls for a thorough evaluation of LLM-generated code. Most evaluation frameworks focus on…

Software Engineering · Computer Science · 2025-02-20 · Ruizhong Qiu, Weiliang Will Zeng, James Ezick, Christopher Lott, Hanghang Tong

Predicting Code Coverage without Execution

Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution…

Software Engineering · Computer Science · 2023-07-26 · Michele Tufano, Shubham Chandel, Anisha Agarwal, Neel Sundaresan, Colin Clement