Related papers: KernelBench: Can LLMs Write Efficient GPU Kernels?

MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation

The automatic generation of deep learning (DL) kernels using large language models (LLMs) has emerged as a promising approach to reduce the manual effort and hardware-specific expertise required for writing high-performance operator…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-29 · Zhongzhen Wen, Yinghui Zhang, Zhong Li, Zhongxin Liu, Linna Xie, Tian Zhang

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBenchX, a benchmark designed to answer this…

Machine Learning · Computer Science · 2026-05-12 · Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu

GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-30 · Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong

Counting Without Running: Evaluating LLMs' Reasoning About Code Complexity

Modern GPU software stacks demand developers who can anticipate performance bottlenecks before ever launching a kernel; misjudging floating-point workloads upstream can derail tuning, scheduling, and even hardware procurement. Yet despite…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-05 · Gregory Bolet, Giorgis Georgakoudis, Konstantinos Parasyris, Harshitha Menon, Niranjan Hasabnis, Kirk W. Cameron, Gal Oren

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science · 2026-05-18 · Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

KernelFoundry: Hardware-aware evolutionary GPU kernel optimization

Optimizing GPU kernels presents a significantly greater challenge for large language models (LLMs) than standard code generation tasks, as it requires understanding hardware architecture, parallel optimization strategies, and performance…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-16 · Nina Wiedemann, Quentin Leboutet, Michael Paulitsch, Diana Wofk, Benjamin Ummenhofer

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines…

Machine Learning · Computer Science · 2026-03-12 · Qitong Sun, Jun Han, Tianlin Li, Zhe Tang, Sheng Chen, Fei Yang, Aishan Liu, Xianglong Liu, Yang Liu

WritingBench: A Comprehensive Benchmark for Generative Writing

Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text…

Artificial Intelligence · Computer Science · 2025-12-01 · Yuning Wu, Jiahao Mei, Ming Yan, Chenliang Li, Shaopeng Lai, Yuran Ren, Zijia Wang, Ji Zhang, Mengyue Wu, Qin Jin, Fei Huang

STARK: Strategic Team of Agents for Refining Kernels

The efficiency of GPU kernels is central to the progress of modern AI, yet optimizing them remains a difficult and labor-intensive task due to complex interactions between memory hierarchies, thread scheduling, and hardware-specific…

Artificial Intelligence · Computer Science · 2025-10-21 · Juncheng Dong, Yang Yang, Tao Liu, Yang Wang, Feng Qi, Vahid Tarokh, Kaushik Rangadurai, Shuang Yang

GPU-Accelerated Approximate Kernel Method for Quantum Machine Learning

Conventional kernel-based machine learning models for ab initio potential energy surfaces, while accurate and convenient in small data regimes, suffer immense computational cost as training set sizes increase. We introduce QML-Lightning, a…

Chemical Physics · Physics · 2022-12-21 · Nicholas J. Browning, Felix A. Faber, O. Anatole von Lilienfeld

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization…

Computation and Language · Computer Science · 2025-02-21 · Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun

FastKernels: Benchmarking GPU Kernel Generation in Production

LLM-based agents for GPU kernel generation are advancing rapidly, yet their progress is fundamentally constrained by the benchmarks they optimize against. Existing benchmarks are poorly aligned with production inference frameworks: they…

Machine Learning · Computer Science · 2026-05-25 · Gabriele Oliaro, Yichao Fu, May Jiang, Owen Lu, Junli Wang, Zhihao Jia, Hao Zhang, Samyam Rajbhandari

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation

GPU kernel generation by LLMs has recently experienced rapid development, leveraging test-time scaling and reinforcement learning techniques. However, a key challenge for kernel generation is the scarcity of high-quality data, as most…

Machine Learning · Computer Science · 2025-10-10 · Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang

ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness

Field-Programmable Gate Arrays (FPGAs) are widely used in modern hardware design, yet writing Hardware Description Language (HDL) code for FPGA implementation remains a complex and time-consuming task. Large Language Models (LLMs) have…

Hardware Architecture · Computer Science · 2025-03-25 · Ce Guo, Tong Zhao

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU Kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking the more general and challenging task of…

Machine Learning · Computer Science · 2026-03-04 · Jiace Zhu, Wentao Chen, Qi Fan, Zhixing Ren, Junying Wu, Xing Zhe Chai, Chotiwit Rungrueangwutthinon, Yehan Ma, An Zou

QuanBench: Benchmarking Quantum Code Generation with Large Language Models

Large language models (LLMs) have demonstrated good performance in general code generation; however, their capabilities in quantum code generation remain insufficiently studied. This paper presents QuanBench, a benchmark for evaluating LLMs…

Software Engineering · Computer Science · 2025-10-21 · Xiaoyu Guo, Minggu Wang, Jianjun Zhao

RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs

The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short in replicating…

Machine Learning · Computer Science · 2025-07-23 · Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, Zidong Du, Qi Guo, Xing Hu

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating kernels specifically for mobile devices remains largely unexplored. In this work, we extend the scope of automated…

Machine Learning · Computer Science · 2026-03-17 · Xingze Zou, Jing Wang, Yuhua Zheng, Xueyi Chen, Haolei Bai, Lingcheng Kong, Syed A. R. Abu-Bakar, Zhaode Wang, Chengfei Lv, Haoji Hu, Huan Wang

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

Optimizing GPU kernels manually is a challenging and time-consuming task. With the rapid development of LLMs, automated GPU kernel optimization is gradually becoming a tangible reality. However, current LLM-driven automated optimization…

Machine Learning · Computer Science · 2026-03-10 · Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu

Benchmarking optimization algorithms for auto-tuning GPU kernels

Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-10-05 · Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg