Related papers: LLM Benchmarking with LLaMA2: Evaluating Code Deve…

A Survey on Evaluating Large Language Models in Code Generation Tasks

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen , Qi Guo , Hongrui Jia , Zhengran Zeng , Xin Wang , Yijiang Xu , Jian Wu , Yidong Wang , Qing Gao , Jindong Wang , Wei Ye , Shikun Zhang

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh , Beiyu Lin

Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models

Large Language Models (LLMs) have revolutionized both general natural language processing and domain-specific applications such as code synthesis, legal reasoning, and finance. However, while prior studies have explored individual model…

Software Engineering · Computer Science 2025-12-05 Gunjan Das , Paheli Bhattacharya , Rishabh Gupta

Code Generation and Algorithmic Problem Solving Using Llama 3.1 405B

Code generation by Llama 3.1 models, such as Meta's Llama 3.1 405B, represents a significant advancement in the field of artificial intelligence, particularly in natural language processing and programming automation. This paper explores…

Computation and Language · Computer Science 2025-04-03 Aniket Deroy , Subhankar Maity

Performance Evaluation of Large Language Models in Statistical Programming

The programming capabilities of large language models (LLMs) have revolutionized automatic code generation and opened new avenues for automatic statistical analysis. However, the validity and quality of these generated codes need to be…

Applications · Statistics 2025-02-19 Xinyi Song , Kexin Xie , Lina Lee , Ruizhe Chen , Jared M. Clark , Hao He , Haoran He , Jie Min , Xinlei Zhang , Simin Zheng , Zhiyang Zhang , Xinwei Deng , Yili Hong

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even if they are often trained with the only objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical…

Computation and Language · Computer Science 2024-06-12 Flavio Petruzzellis , Alberto Testolin , Alessandro Sperduti

Analyzing the Performance of Large Language Models on Code Summarization

Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the…

Software Engineering · Computer Science 2024-04-15 Rajarshi Haldar , Julia Hockenmaier

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

Software Engineering · Computer Science 2025-04-02 Terry Yue Zhuo , Minh Chien Vu , Jenny Chim , Han Hu , Wenhao Yu , Ratnadira Widyasari , Imam Nur Bani Yusuf , Haolan Zhan , Junda He , Indraneil Paul , Simon Brunner , Chen Gong , Thong Hoang , Armel Randy Zebaze , Xiaoheng Hong , Wen-Ding Li , Jean Kaddour , Ming Xu , Zhihan Zhang , Prateek Yadav , Naman Jain , Alex Gu , Zhoujun Cheng , Jiawei Liu , Qian Liu , Zijian Wang , Binyuan Hui , Niklas Muennighoff , David Lo , Daniel Fried , Xiaoning Du , Harm de Vries , Leandro Von Werra

Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software

Large Language Models (LLMs) can generate code, but can they generate fast code for complex, real-world software systems? In this study, we investigate this question using a dataset of 65 tasks mined from performance-critical open-source…

Software Engineering · Computer Science 2026-04-10 Lirong Yi , Gregory Gay , Philipp Leitner

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in tasks such as code generation in software engineering. This has led to the development of specialized LLMs for software engineering,…

Software Engineering · Computer Science 2024-01-09 Zibin Zheng , Kaiwen Ning , Yanlin Wang , Jingwen Zhang , Dewu Zheng , Mingxi Ye , Jiachi Chen

HPC-Coder: Modeling Parallel Programs using Large Language Models

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols , Aniruddha Marathe , Harshitha Menon , Todd Gamblin , Abhinav Bhatele

Towards an Understanding of Large Language Models in Software Engineering Tasks

Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in text generation and reasoning tasks. Derivative products, like ChatGPT, have been extensively deployed and highly sought after.…

Software Engineering · Computer Science 2024-12-11 Zibin Zheng , Kaiwen Ning , Qingyuan Zhong , Jiachi Chen , Wenqing Chen , Lianghong Guo , Weicheng Wang , Yanlin Wang

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into…

Software Engineering · Computer Science 2024-03-07 Chongzhou Fang , Ning Miao , Shaurya Srivastav , Jialin Liu , Ruoyu Zhang , Ruijie Fang , Asmita , Ryan Tsang , Najmeh Nazari , Han Wang , Houman Homayoun

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages

Large Language Model (LLM) based coding tools have been tremendously successful as software development assistants, yet they are often designed for general purpose programming tasks and perform poorly for more specialized domains such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-20 Aman Chaturvedi , Daniel Nichols , Siddharth Singh , Abhinav Bhatele

Comprehensive Evaluation of Large Language Models on Software Engineering Tasks: A Multi-Task Benchmark

Large Language Models (LLMs) have demonstrated remarkable capabilities in software engineering, yet comprehensive benchmarks covering diverse SE activities remain limited. We present a multi-task evaluation of 11 state-of-the-art LLMs…

Software Engineering · Computer Science 2026-02-10 Go Frendi Gunawan , Mukhlis Amien

Exploring Large Language Models for Code Explanation

Automating code documentation through explanatory text can prove highly beneficial in code understanding. Large Language Models (LLMs) have made remarkable strides in Natural Language Processing, especially within software engineering tasks…

Software Engineering · Computer Science 2023-10-26 Paheli Bhattacharya , Manojit Chakraborty , Kartheek N S N Palepu , Vikas Pandey , Ishan Dindorkar , Rakesh Rajpurohit , Rishabh Gupta

Code Generation with Small Language Models: A Codeforces-Based Study

Large Language Models (LLMs) demonstrate capabilities in code generation, potentially boosting developer productivity. However, their adoption remains limited by high computational costs, among other factors. Small Language Models (SLMs)…

Software Engineering · Computer Science 2025-09-23 Débora Souza , Rohit Gheyi , Lucas Albuquerque , Gustavo Soares , Márcio Ribeiro

Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

The design and implementation of unit tests is a complex task many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) in automatically generating test cases, comparing them with manual tests. An…

Software Engineering · Computer Science 2025-05-16 Martín Rodríguez , Gustavo Rossi , Alejandro Fernandez

LLMs for Science: Usage for Code Generation and Data Analysis

Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of…

Software Engineering · Computer Science 2024-04-24 Mohamed Nejjar , Luca Zacharias , Fabian Stiehle , Ingo Weber

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul , Hong Zhu , Ian Bayley