English
Related papers

Related papers: Mercury: A Code Efficiency Benchmark for Code Larg…

200 papers

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict multiple tokens in parallel. In this report, we…

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of \textit{execution efficiency} remains overlooked. We present \textbf{\textsc{trace}}, the first…

Software Engineering · Computer Science 2026-04-15 Zhihao Gong , Zeyu Sun , Dong Huang , Qingyuan Liang , Jie M. Zhang , Dan Hao

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of \textit{execution efficiency} remains overlooked. We present \textbf{\textsc{trace}}, the first…

Software Engineering · Computer Science 2026-03-20 Zhihao Gong , Zeyu Sun , Dong Huang , Qingyuan Liang , Jie M. Zhang , Dan Hao

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works…

Software Engineering · Computer Science 2025-02-27 Tong Ye , Weigang Huang , Xuhong Zhang , Tengfei Ma , Peiyu Liu , Jianwei Yin , Wenhai Wang

In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, current benchmarks primarily assess the accuracy of LLM-generated code, while neglecting…

Software Engineering · Computer Science 2024-10-10 Jiasheng Zheng , Boxi Cao , Zhengzhao Ma , Ruotong Pan , Hongyu Lin , Yaojie Lu , Xianpei Han , Le Sun

Large Language Models (LLMs) are increasingly used to automate hardware design tasks, including the generation of Verilog code. While early benchmarks focus primarily on functional correctness, efficient hardware design demands additional…

Computation and Language · Computer Science 2025-10-17 Manar Abdelatty , Maryam Nouh , Jacob K. Rosenstein , Sherief Reda

Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in…

Computation and Language · Computer Science 2024-10-11 Siddhant Waghjale , Vishruth Veerendranath , Zora Zhiruo Wang , Daniel Fried

Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…

Software Engineering · Computer Science 2025-12-30 Yue Wu , Minghao Han , Ruiyin Li , Peng Liang , Amjed Tahir , Zengyang Li , Qiong Feng , Mojtaba Shahin

Existing code generation benchmarks primarily evaluate functional correctness, with limited focus on code efficiency and often restricted to a single language like Python. To address this gap, we introduce EffiBench-X, the first…

Computation and Language · Computer Science 2025-05-20 Yuhao Qing , Boyu Zhu , Mingzhe Du , Zhijiang Guo , Terry Yue Zhuo , Qianru Zhang , Jie M. Zhang , Heming Cui , Siu-Ming Yiu , Dong Huang , See-Kiong Ng , Luu Anh Tuan

Recent years have seen the remarkable capabilities of large language models (LLMs) for code generation. Different from existing work that evaluate the correctness of the code generated by LLMs, we propose to further evaluate its efficiency.…

Software Engineering · Computer Science 2024-04-10 Changan Niu , Ting Zhang , Chuanyi Li , Bin Luo , Vincent Ng

Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations that integrate multiple methods,…

Software Engineering · Computer Science 2025-11-06 Musfiqur Rahman , SayedHassan Khatoonabadi , Emad Shihab

Code generation has largely improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions given detailed descriptions in natural…

Software Engineering · Computer Science 2025-02-06 Yun Peng , Jun Wan , Yichen Li , Xiaoxue Ren

Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The…

Software Engineering · Computer Science 2023-11-15 Lincoln Murr , Morgan Grainger , David Gao

This study evaluates the efficiency of code generation by Large Language Models (LLMs) and measures their performance against human-crafted solutions using a dataset from Leetcode. We compare 18 LLMs, considering factors such as model…

Software Engineering · Computer Science 2024-08-01 Tristan Coignion , Clément Quinton , Romain Rouvoy

Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-based code review face three major limitations. Lack…

Software Engineering · Computer Science 2026-01-01 Ruida Hu , Xinchen Wang , Xin-Cheng Wen , Zhao Zhang , Bo Jiang , Pengfei Gao , Chao Peng , Cuiyun Gao

Large Language Models (LLMs) have demonstrated significant capability in code generation, but their potential in code efficiency optimization remains underexplored. Previous LLM-based code efficiency optimization approaches exclusively…

Software Engineering · Computer Science 2025-10-22 Xiaoxue Ren , Jun Wan , Yun Peng , Zhongxin Liu , Ming Liang , Dajun Chen , Wei Jiang , Yong Li

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen , Qi Guo , Hongrui Jia , Zhengran Zeng , Xin Wang , Yijiang Xu , Jian Wu , Yidong Wang , Qing Gao , Jindong Wang , Wei Ye , Shikun Zhang

The emergence of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Advancement of LLM-based program synthesis calls for a thorough evaluation of LLM-generated code. Most evaluation frameworks focus on…

Software Engineering · Computer Science 2025-02-20 Ruizhong Qiu , Weiliang Will Zeng , James Ezick , Christopher Lott , Hanghang Tong

Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution…

Software Engineering · Computer Science 2023-07-26 Michele Tufano , Shubham Chandel , Anisha Agarwal , Neel Sundaresan , Colin Clement
‹ Prev 1 2 3 10 Next ›