Related papers: QiMeng-Kernel: Macro-Thinking Micro-Coding Paradig…

GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Ruifan Chu , Anbang Wang , Xiuxiu Bai , Shuai Liu , Xiaoshe Dong

GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization

Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU…

Machine Learning · Computer Science 2025-08-25 Martin Andrews , Sam Witteveen

An Empirical Study of Reasoning Steps in Thinking Code LLMs

Thinking Large Language Models (LLMs) generate explicit intermediate reasoning traces before final answers, potentially improving transparency, interpretability, and solution accuracy for code generation. However, the quality of these…

Artificial Intelligence · Computer Science 2025-11-11 Haoran Xue , Gias Uddin , Song Wang

STARK: Strategic Team of Agents for Refining Kernels

The efficiency of GPU kernels is central to the progress of modern AI, yet optimizing them remains a difficult and labor-intensive task due to complex interactions between memory hierarchies, thread scheduling, and hardware-specific…

Artificial Intelligence · Computer Science 2025-10-21 Juncheng Dong , Yang Yang , Tao Liu , Yang Wang , Feng Qi , Vahid Tarokh , Kaushik Rangadurai , Shuang Yang

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations,…

Optimization and Control · Mathematics 2024-03-06 Zeyuan Ma , Hongshu Guo , Jiacheng Chen , Guojun Peng , Zhiguang Cao , Yining Ma , Yue-Jiao Gong

Fine-Tuning GPT-5 for GPU Kernel Generation

Developing efficient GPU kernels is essential for scaling modern AI systems, yet it remains a complex task due to intricate hardware architectures and the need for specialized optimization expertise. Although Large Language Models (LLMs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-12 Ali Tehrani , Yahya Emara , Essam Wissam , Wojciech Paluch , Waleed Atallah , Łukasz Dudziak , Mohamed S. Abdelfattah

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization…

Computation and Language · Computer Science 2025-02-21 Jianling Li , Shangzhan Li , Zhenye Gao , Qi Shi , Yuxuan Li , Zefan Wang , Jiacheng Huang , Haojie Wang , Jianrong Wang , Xu Han , Zhiyuan Liu , Maosong Sun

Computational Reasoning of Large Language Models

With the rapid development and widespread application of Large Language Models (LLMs), multidimensional evaluation has become increasingly critical. However, current evaluations are often domain-specific and overly complex, limiting their…

Computation and Language · Computer Science 2025-05-20 Haitao Wu , Zongbo Han , Joey Tianyi Zhou , Huaxi Huang , Changqing Zhang

THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models

Large reasoning models (LRMs) have achieved impressive performance in complex tasks, often outperforming conventional large language models (LLMs). However, the prevalent issue of overthinking severely limits their computational efficiency.…

Computation and Language · Computer Science 2025-05-29 Zhiyuan Li , Yi Chang , Yuan Wu

Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs

Large language models (LLMs) have transformed the way we think about language understanding and generation, enthralling both researchers and developers. However, deploying LLMs for inference has been a significant challenge due to their…

Machine Learning · Computer Science 2025-01-03 Dibakar Gope , David Mansell , Danny Loh , Ian Bratt

LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels

3D Gaussian splatting (3DGS) is a transformative technique with profound implications on novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, with the…

Hardware Architecture · Computer Science 2025-10-14 Yi Hu , Huiyang Zhou

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing , Wenbin Hu , Haochen Shi , Hanyu Yang , Sirui Zhang , Shaojin Chen , Haoran Li , Yangqiu Song

KernelBench: Can LLMs Write Efficient GPU Kernels?

Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate…

Machine Learning · Computer Science 2025-02-18 Anne Ouyang , Simon Guo , Simran Arora , Alex L. Zhang , William Hu , Christopher Ré , Azalia Mirhoseini

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines…

Machine Learning · Computer Science 2026-03-12 Qitong Sun , Jun Han , Tianlin Li , Zhe Tang , Sheng Chen , Fei Yang , Aishan Liu , Xianglong Liu , Yang Liu

Training Chain-of-Thought via Latent-Variable Inference

Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a ``chain-of-thought'' (CoT) prompt. One can also improve LLMs' performance on a specific task by…

Machine Learning · Computer Science 2023-12-06 Du Phan , Matthew D. Hoffman , David Dohan , Sholto Douglas , Tuan Anh Le , Aaron Parisi , Pavel Sountsov , Charles Sutton , Sharad Vikram , Rif A. Saurous

From Token to Line: Enhancing Code Generation with a Long-Term Perspective

The emergence of large language models (LLMs) has significantly promoted the development of code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to…

Computation and Language · Computer Science 2026-02-10 Tingwei Lu , Yangning Li , Liyuan Wang , Binghuai Lin , Qingsong Lv , Zishan Xu , Hai-Tao Zheng , Yinghui Li , Hong-Gee Kim

Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis

Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and…

Machine Learning · Computer Science 2025-03-06 Jiahao Gai , Hao Mark Chen , Zhican Wang , Hongyu Zhou , Wanru Zhao , Nicholas Lane , Hongxiang Fan

Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance

Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials will save both runtime and cost. We make two key observations.…

Machine Learning · Computer Science 2026-04-01 Siva Kumar Sastry Hari , Vignesh Balaji , Sana Damani , Qijing Huang , Christos Kozyrakis

CUDA-LLM: LLMs Can Write Efficient CUDA Kernels

Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating the code which is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively…

Machine Learning · Computer Science 2025-06-12 Wentao Chen , Jiace Zhu , Qi Fan , Yehan Ma , An Zou

Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement

Code generation is crucial in software engineering for automating the coding process efficiently. While test-time computation methods show promise, they suffer from high latency due to multiple computation rounds. To overcome this, we…

Software Engineering · Computer Science 2025-05-28 Xiaoqing Zhang , Yuhan Liu , Flood Sung , Xiuying Chen , Shuo Shang , Rui Yan