Related papers: CPP-UT-Bench: Can LLMs Write Complex Unit Tests in…

Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and…

Software Engineering · Computer Science 2024-07-09 Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin, Lei Xu, Weidong Shi, Mohammad Amin Alipour

Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go

Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As…

Machine Learning · Computer Science 2025-11-17 Yashshi Pipalani, Hritik Raj, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta

CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge

Unit testing plays a pivotal role in the software development lifecycle, as it ensures code quality. However, writing high-quality unit tests remains a time-consuming task for developers in practice. More recently, the application of large…

Software Engineering · Computer Science 2025-08-12 Yuwei Zhang, Qingyuan Lu, Kai Liu, Wensheng Dou, Jiaxin Zhu, Li Qian, Chunxi Zhang, Zheng Lin, Jun Wei

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is created from a range of representative open-source…

Software Engineering · Computer Science 2023-09-20 Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recently. However, their evaluation in the benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative…

Computation and Language · Computer Science 2023-07-07 Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang

A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing

Unit testing plays a pivotal role in software development, improving software quality and reliability. However, generating effective test cases manually is time-consuming, prompting interest in unit testing research. Recently, Large…

Software Engineering · Computer Science 2024-12-24 Ye Shang, Quanjun Zhang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU Kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking the more general and challenging task of…

Machine Learning · Computer Science 2026-03-04 Jiace Zhu, Wentao Chen, Qi Fan, Zhixing Ren, Junying Wu, Xing Zhe Chai, Chotiwit Rungrueangwutthinon, Yehan Ma, An Zou

CRQBench: A Benchmark of Code Reasoning Questions

Large Language Models have demonstrated exceptional proficiency on coding tasks, but it is challenging to precisely evaluate their code reasoning ability. Existing benchmarks are insufficient as they are unrealistic and conflate semantic…

Software Engineering · Computer Science 2024-08-19 Elizabeth Dinella, Satish Chandra, Petros Maniatis

Benchmarking LLMs for Unit Test Generation from Real-World Functions

Recently, large language models (LLMs) have shown great promise in automating unit test generation, significantly reducing the manual effort required by developers. To effectively evaluate the capabilities of LLMs in this domain, it is…

Software Engineering · Computer Science 2025-08-04 Dong Huang, Jie M. Zhang, Mark Harman, Qianru Zhang, Mingzhe Du, See-Kiong Ng

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Large language models (LLMs) have demonstrated significant potential in advancing various fields of research and society. However, the current community of LLMs overly focuses on benchmarks for analyzing specific foundational skills (e.g.…

Computation and Language · Computer Science 2025-03-03 Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

DebugBench: Evaluating Debugging Capability of Large Language Models

Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs'…

Software Engineering · Computer Science 2024-06-07 Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

The design and implementation of unit tests is a complex task many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) in automatically generating test cases, comparing them with manual tests. An…

Software Engineering · Computer Science 2025-05-16 Martín Rodríguez, Gustavo Rossi, Alejandro Fernandez

A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages

Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training using large datasets in order to understand and produce language that closely resembles that of humans. These models have…

Software Engineering · Computer Science 2023-08-10 Alessio Buscemi

FullStack Bench: Evaluating LLMs as Full Stack Coders

As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To…

Artificial Intelligence · Computer Science 2025-05-13 Bytedance-Seed-Foundation-Code-Team: Yao Cheng, Jianfeng Chen, Jie Chen, Li Chen, Liyu Chen, Wentao Chen, Zhengyu Chen, Shijie Geng, Aoyan Li, Bo Li, Bowen Li, Linyi Li, Boyi Liu, Jiaheng Liu, Kaibo Liu, Qi Liu, Shukai Liu, Siyao Liu, Tianyi Liu, Tingkai Liu, Yongfei Liu, Rui Long, Jing Mai, Guanghan Ning, Z. Y. Peng, Kai Shen, Jiahao Su, Jing Su, Tao Sun, Yifan Sun, Yunzhe Tao, Guoyin Wang, Siwei Wang, Xuwu Wang, Yite Wang, Zihan Wang, Jinxiang Xia, Liang Xiang, Xia Xiao, Yongsheng Xiao, Chenguang Xi, Shulin Xin, Jingjing Xu, Shikun Xu, Hongxia Yang, Jack Yang, Yingxiang Yang, Jianbo Yuan, Jun Zhang, Yufeng Zhang, Yuyu Zhang, Shen Zheng, He Zhu, Ming Zhu

LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work focuses on examining…

Artificial Intelligence · Computer Science 2025-11-25 Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

DSCodeBench: A Realistic Benchmark for Data Science Code Generation

We introduce DSCodeBench, a new benchmark designed to evaluate large language models (LLMs) on complicated and realistic data science code generation tasks. DSCodeBench consists of 1,000 carefully constructed problems sourced from realistic…

Software Engineering · Computer Science 2025-11-18 Shuyin Ouyang, Dong Huang, Jingwen Guo, Zeyu Sun, Qihao Zhu, Jie M. Zhang

TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

ChatUniTest: A Framework for LLM-Based Test Generation

Unit testing is an essential yet frequently arduous task. Various automated unit test generation tools have been introduced to mitigate this challenge. Notably, methods based on large language models (LLMs) have garnered considerable…

Software Engineering · Computer Science 2024-05-08 Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, Jianwei Yin

UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing

The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in…

Software Engineering · Computer Science 2024-02-07 Yifeng He, Jiabo Huang, Yuyang Rong, Yiwen Guo, Ethan Wang, Hao Chen

Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation

Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frameworks like CUDA, where high-quality parallel data…

Programming Languages · Computer Science 2025-12-04 Le Chen, Nuo Xu, Winson Chen, Bin Lei, Pei-Hung Lin, Dunzhi Zhou, Rajeev Thakur, Caiwen Ding, Ali Jannesari, Chunhua Liao