English
Related papers

Related papers: CONCUR: Benchmarking LLMs for Concurrent Code Gene…

200 papers

Code Large Language Models (CLLMs) have exhibited outstanding performance in program synthesis, attracting the focus of the research community. The evaluation of CLLM's program synthesis capability has generally relied on manually curated…

Software Engineering · Computer Science 2025-05-13 Longtian Wang , Tianlin Li , Xiaofei Xie , Yuhan Zhi , Jian Wang , Chao Shen

Context: Due to the demand for strong algorithmic reasoning, complex logic implementation, and strict adherence to input/output formats and resource constraints, competitive programming generation by large language models (LLMs) is…

Social and Information Networks · Computer Science 2025-07-01 Minnan Wei , Ziming Li , Xiang Chen , Menglin Zheng , Ziyan Qu , Cheng Yu , Siyu Chen , Xiaolin Ju

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test…

Resolving conflicts from merging different software versions is a challenging task. To reduce the overhead of manual merging, researchers develop various program analysis-based tools which only solve specific types of conflicts and have a…

Software Engineering · Computer Science 2024-09-24 Qingyu Zhang , Liangcai Su , Kai Ye , Chenxiong Qian

Large Language Models (LLMs) have achieved remarkable success in code generation, and the race to improve their performance has become a central focus of AI research. Benchmarks and leaderboards are increasingly popular, offering…

Software Engineering · Computer Science 2025-11-07 Amir Molzam Sharifloo , Maedeh Heydari , Parsa Kazerooni , Daniel Maninger , Mira Mezini

This paper presents insights from evaluating 16 frontier large language models (LLMs) on the WebApp1K benchmark, a test suite designed to assess the ability of LLMs to generate web application code. The results reveal that while all models…

Software Engineering · Computer Science 2024-09-10 Yi Cui

With the growing ubiquity of multi-core architectures, concurrent systems have become essential but increasingly prone to complex issues such as data races and deadlocks. While modern issue-tracking systems facilitate the reporting of such…

Software Engineering · Computer Science 2026-01-26 Shuai Shao , Lu Xiao , Tingting Yu

The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and…

Current benchmarks for coding evaluate language models (LMs) on concrete, well-specified tasks such as fixing specific bugs or writing targeted tests. However, human programmers do not spend all day incessantly addressing isolated tasks.…

Software Engineering · Computer Science 2026-05-14 John Yang , Kilian Lieret , Joyce Yang , Carlos E. Jimenez , Muhtasham Oblokulov , Aryan Siddiqui , Ofir Press , Ludwig Schmidt , Diyi Yang

Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can…

Software Engineering · Computer Science 2024-09-05 Mohammed Latif Siddiq , Simantika Dristi , Joy Saha , Joanna C. S. Santos

Large language models (LLMs) have been widely adopted across diverse domains of software engineering, such as code generation, program repair, and vulnerability detection. These applications require understanding beyond surface-level code…

Software Engineering · Computer Science 2026-01-21 Danning Xie , Mingwei Zheng , Xuwei Liu , Jiannan Wang , Chengpeng Wang , Lin Tan , Xiangyu Zhang

Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a…

Software Engineering · Computer Science 2025-04-10 Dung Nguyen Manh , Thang Phan Chau , Nam Le Hai , Thong T. Doan , Nam V. Nguyen , Quang Pham , Nghi D. Q. Bui

Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end…

Software Engineering · Computer Science 2025-06-19 Hongda Zhu , Yiwen Zhang , Bing Zhao , Jingzhe Ding , Siyao Liu , Tong Liu , Dandan Wang , Yanan Liu , Zhaojian Li

Recent advancements in large language models (LLMs) have demonstrated significant progress in math and code reasoning capabilities. However, existing code benchmark are limited in their ability to evaluate the full spectrum of these…

Computation and Language · Computer Science 2026-03-03 Zhexu Wang , Yiping Liu , Yejie Wang , Wenyang He , Bofei Gao , Muxi Diao , Yanxu Chen , Kelin Fu , Flood Sung , Zhilin Yang , Tianyu Liu , Weiran Xu

While code generation has been widely used in various software development scenarios, the quality of the generated code is not guaranteed. This has been a particular concern in the era of large language models (LLMs)- based code generation,…

Software Engineering · Computer Science 2023-10-11 Zhenlan Ji , Pingchuan Ma , Zongjie Li , Shuai Wang

With the significant progress of large reasoning models in complex coding and reasoning tasks, existing benchmarks, like LiveCodeBench and CodeElo, are insufficient to evaluate the coding capabilities of large language models (LLMs) in real…

Computation and Language · Computer Science 2025-06-06 Shiyi Xu , Yiwen Hu , Yingqian Min , Zhipeng Chen , Wayne Xin Zhao , Ji-Rong Wen

Background: The rise of Large Language Models (LLMs) in software development has opened new possibilities for code generation. Despite the widespread use of this technology, it remains unclear how well LLMs generate code solutions in terms…

Software Engineering · Computer Science 2025-08-04 Alfred Santa Molison , Marcia Moraes , Glaucia Melo , Fabio Santos , Wesley K. G. Assuncao

Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-based code review face three major limitations. Lack…

Software Engineering · Computer Science 2026-01-01 Ruida Hu , Xinchen Wang , Xin-Cheng Wen , Zhao Zhang , Bo Jiang , Pengfei Gao , Chao Peng , Cuiyun Gao

Existing code generation benchmarks for Large Language Models (LLMs) such as HumanEval and MBPP are designed to study LLMs' end-to-end performance, where the benchmarks feed a problem description in natural language as input and examine the…

Software Engineering · Computer Science 2025-02-27 Jiarong Wu , Songqiang Chen , Jialun Cao , Hau Ching Lo , Shing-Chi Cheung

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul , Hong Zhu , Ian Bayley
‹ Prev 1 2 3 10 Next ›