Related papers: A Preliminary Study of Multilingual Code Language …

Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed that they struggle with code translation. This is because they haven't been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science 2024-10-15 Qingxiao Tao, Tingrui Yu, Xiaodong Gu, Beijun Shen

A Survey on Evaluating Large Language Models in Code Generation Tasks

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments

As large language models become increasingly capable of generating code, evaluating their performance remains a complex and evolving challenge. Existing benchmarks primarily focus on functional correctness, overlooking the diversity of…

Software Engineering · Computer Science 2025-11-03 Forough Mehralian, Ryan Shar, James R. Rae, Alireza Hashemi

McEval: Massively Multilingual Code Evaluation

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard…

Programming Languages · Computer Science 2024-06-12 Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation

Recent advancements in large language models (LLMs) have significantly enhanced code generation from natural language prompts. The HumanEval Benchmark, developed by OpenAI, remains the most widely used code generation benchmark. However,…

Computation and Language · Computer Science 2025-05-19 Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating code from natural language descriptions, repairing buggy code, translating code between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science 2023-11-07 Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

Assessing Small Language Models for Code Generation: An Empirical Study with Benchmarks

The recent advancements of Small Language Models (SLMs) have opened new possibilities for efficient code generation. SLMs offer lightweight and cost-effective alternatives to Large Language Models (LLMs), making them attractive for use in…

Software Engineering · Computer Science 2026-01-21 Md Mahade Hasan, Muhammad Waseem, Kai-Kristian Kemell, Jussi Rasku, Juha Ala-Rantala, Pekka Abrahamsson

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Computation and Language · Computer Science 2024-03-12 Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh, Beiyu Lin

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models…

Software Engineering · Computer Science 2024-05-01 Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

Large Language Models (LLMs) have demonstrated remarkable performance in assisting humans with programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities…

Computation and Language · Computer Science 2024-06-10 Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Hari Sundaram, Shuiguang Deng

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation

As large language models (LLMs) become increasingly embedded in software engineering workflows, a critical capability remains underexplored: generating correct code that enables cross-programming-language (CPL) interoperability. This skill…

Software Engineering · Computer Science 2025-07-29 Zhanhang Xiong, Dongxia Wang, Yuekang Li, Xinyuan An, Wenhai Wang

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul, Hong Zhu, Ian Bayley

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts

Large Language Models (LLMs) have achieved remarkable success in code generation tasks, powering various applications like code completion, debugging, and programming assistance. However, existing benchmarks such as HumanEval, MBPP, and…

Machine Learning · Computer Science 2025-05-09 Manik Sheokand, Parth Sawant

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai, Yingshui Tan, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, Guanglin Niu, Zhoujun Li, Binyuan Hui, Junyang Lin

Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'

Recently, a number of repository-level code generation benchmarks, such as CoderEval, DevEval, RepoEval, RepoBench, and LongCodeArena, have emerged to evaluate the capabilities of large language models (LLMs) beyond standalone benchmarks like…

Software Engineering · Computer Science 2025-06-26 Shanchao Liang, Yiran Hu, Nan Jiang, Lin Tan

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

Large Language Models (LLMs) have achieved remarkable success in automated code translation. While prior work has focused on improving translation accuracy through advanced prompting and iterative repair, the reliability of the underlying…

Software Engineering · Computer Science 2026-05-11 Fazle Rabbi, Soumit Kanti Saha, Jinqiu Yang

ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation

In recent years, Large Language Models (LLMs) have dramatically advanced the performance of automated code translation, pushing their computational accuracy above 80% on many previous benchmarks. However, most code samples…

Software Engineering · Computer Science 2025-04-15 Pengyu Xue, Linhao Wu, Zhen Yang, Chengyi Wang, Xiang Li, Yuxiang Zhang, Jia Li, Ruikai Jin, Yifei Pei, Zhaoyan Shen, Xiran Lyu, Jacky Wai Keung