Related papers: Can Large Language Models Write Parallel Code?

McEval: Massively Multilingual Code Evaluation

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard…

Programming Languages · Computer Science 2024-06-12 Linzheng Chai , Shukai Liu , Jian Yang , Yuwei Yin , Ke Jin , Jiaheng Liu , Tao Sun , Ge Zhang , Changyu Ren , Hongcheng Guo , Zekun Wang , Boyang Wang , Xianjie Wu , Bing Wang , Tongliang Li , Liqun Yang , Sufeng Duan , Zhoujun Li

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising…

Computation and Language · Computer Science 2023-10-03 Ansong Ni , Pengcheng Yin , Yilun Zhao , Martin Riddell , Troy Feng , Rui Shen , Stephen Yin , Ye Liu , Semih Yavuz , Caiming Xiong , Shafiq Joty , Yingbo Zhou , Dragomir Radev , Arman Cohan

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating codes from natural language descriptions, repairing buggy codes, translating codes between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science 2023-11-07 Mohammad Abdullah Matin Khan , M Saiful Bari , Xuan Long Do , Weishi Wang , Md Rizwan Parvez , Shafiq Joty

Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed their struggle for code translation. This is because they haven't been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science 2024-10-15 Qingxiao Tao , Tingrui Yu , Xiaodong Gu , Beijun Shen

Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval

Existing code generation benchmarks for Large Language Models (LLMs) such as HumanEval and MBPP are designed to study LLMs' end-to-end performance, where the benchmarks feed a problem description in natural language as input and examine the…

Software Engineering · Computer Science 2025-02-27 Jiarong Wu , Songqiang Chen , Jialun Cao , Hau Ching Lo , Shing-Chi Cheung

Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'

Recently, a number of repository-level code generation benchmarks-such as CoderEval, DevEval, RepoEval, RepoBench, and LongCodeArena-have emerged to evaluate the capabilities of large language models (LLMs) beyond standalone benchmarks like…

Software Engineering · Computer Science 2025-06-26 Shanchao Liang , Yiran Hu , Nan Jiang , Lin Tan

From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation

Large Language Models (LLM) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts:…

Programming Languages · Computer Science 2026-02-27 Linus Bantel , Moritz Strack , Alexander Strack , Dirk Pflüger

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge…

Machine Learning · Computer Science 2022-12-20 Federico Cassano , John Gouwar , Daniel Nguyen , Sydney Nguyen , Luna Phipps-Costin , Donald Pinckney , Ming-Ho Yee , Yangtian Zi , Carolyn Jane Anderson , Molly Q Feldman , Arjun Guha , Michael Greenberg , Abhinav Jangda

CodeEval: A pedagogical approach for targeted evaluation of code-trained Large Language Models

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman , Mohammad Mahoor

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang , Wei Zhang , Shukai Liu , Linzheng Chai , Yingshui Tan , Jiaheng Liu , Ge Zhang , Wangchunshu Zhou , Guanglin Niu , Zhoujun Li , Binyuan Hui , Junyang Lin

TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models

Large Language Models (LLMs) excel in code-related tasks like code generation, but benchmark evaluations often overlook task characteristics, such as difficulty. Moreover, benchmarks are usually built using tasks described with a single…

Software Engineering · Computer Science 2025-10-27 Florian Tambon , Amin Nikanjam , Cyrine Zid , Foutse Khomh , Giuliano Antoniol

TESTEVAL: Benchmarking Large Language Models for Test Case Generation

Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program…

Software Engineering · Computer Science 2025-02-04 Wenhan Wang , Chenyuan Yang , Zhijie Wang , Yuheng Huang , Zhaoyang Chu , Da Song , Lingming Zhang , An Ran Chen , Lei Ma

Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focused on simplified or isolated aspects of coding, such as single-file code generation…

Computation and Language · Computer Science 2024-12-17 Bowen Li , Wenhan Wu , Ziwei Tang , Lin Shi , John Yang , Jinyang Li , Shunyu Yao , Chen Qian , Binyuan Hui , Qicheng Zhang , Zhiyin Yu , He Du , Ping Yang , Dahua Lin , Chao Peng , Kai Chen

HPC-Coder: Modeling Parallel Programs using Large Language Models

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols , Aniruddha Marathe , Harshitha Menon , Todd Gamblin , Abhinav Bhatele

CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the…

Software Engineering · Computer Science 2024-02-26 Hao Yu , Bo Shen , Dezhi Ran , Jiaxin Zhang , Qi Zhang , Yuchi Ma , Guangtai Liang , Ying Li , Qianxiang Wang , Tao Xie

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed , Muhammad Waseem , Kai Kristian Kemell , Aakash Ahmad , Malik Abdul Sami , Jussi Rasku , Kari Systä , Pekka Abrahamsson

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages

Large Language Model (LLM) based coding tools have been tremendously successful as software development assistants, yet they are often designed for general purpose programming tasks and perform poorly for more specialized domains such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-20 Aman Chaturvedi , Daniel Nichols , Siddharth Singh , Abhinav Bhatele

mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Recent advancements in large language models (LLMs) have significantly enhanced code generation from natural language prompts. The HumanEval Benchmark, developed by OpenAI, remains the most widely used code generation benchmark. However,…

Computation and Language · Computer Science 2025-05-19 Nishat Raihan , Antonios Anastasopoulos , Marcos Zampieri

On Evaluating the Efficiency of Source Code Generated by LLMs

Recent years have seen the remarkable capabilities of large language models (LLMs) for code generation. Different from existing work that evaluate the correctness of the code generated by LLMs, we propose to further evaluate its efficiency.…

Software Engineering · Computer Science 2024-04-10 Changan Niu , Ting Zhang , Chuanyi Li , Bin Luo , Vincent Ng

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh , Beiyu Lin