Related papers: QiMeng-Kernel: Macro-Thinking Micro-Coding Paradig…

200 papers

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-30 · Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong

Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU…

Machine Learning · Computer Science · 2025-08-25 · Martin Andrews, Sam Witteveen

Thinking Large Language Models (LLMs) generate explicit intermediate reasoning traces before final answers, potentially improving transparency, interpretability, and solution accuracy for code generation. However, the quality of these…

Artificial Intelligence · Computer Science · 2025-11-11 · Haoran Xue, Gias Uddin, Song Wang

The efficiency of GPU kernels is central to the progress of modern AI, yet optimizing them remains a difficult and labor-intensive task due to complex interactions between memory hierarchies, thread scheduling, and hardware-specific…

Artificial Intelligence · Computer Science · 2025-10-21 · Juncheng Dong, Yang Yang, Tao Liu, Yang Wang, Feng Qi, Vahid Tarokh, Kaushik Rangadurai, Shuang Yang

Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations,…

Optimization and Control · Mathematics · 2024-03-06 · Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong

Developing efficient GPU kernels is essential for scaling modern AI systems, yet it remains a complex task due to intricate hardware architectures and the need for specialized optimization expertise. Although Large Language Models (LLMs)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-02-12 · Ali Tehrani, Yahya Emara, Essam Wissam, Wojciech Paluch, Waleed Atallah, Łukasz Dudziak, Mohamed S. Abdelfattah

Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization…

Computation and Language · Computer Science · 2025-02-21 · Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun

With the rapid development and widespread application of Large Language Models (LLMs), multidimensional evaluation has become increasingly critical. However, current evaluations are often domain-specific and overly complex, limiting their…

Computation and Language · Computer Science · 2025-05-20 · Haitao Wu, Zongbo Han, Joey Tianyi Zhou, Huaxi Huang, Changqing Zhang

Large reasoning models (LRMs) have achieved impressive performance in complex tasks, often outperforming conventional large language models (LLMs). However, the prevalent issue of overthinking severely limits their computational efficiency.…

Computation and Language · Computer Science · 2025-05-29 · Zhiyuan Li, Yi Chang, Yuan Wu

Large language models (LLMs) have transformed the way we think about language understanding and generation, enthralling both researchers and developers. However, deploying LLMs for inference has been a significant challenge due to their…

Machine Learning · Computer Science · 2025-01-03 · Dibakar Gope, David Mansell, Danny Loh, Ian Bratt

3D Gaussian splatting (3DGS) is a transformative technique with profound implications on novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, with the…

Hardware Architecture · Computer Science · 2025-10-14 · Yi Hu, Huiyang Zhou

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science · 2026-05-18 · Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate…

Machine Learning · Computer Science · 2025-02-18 · Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines…

Machine Learning · Computer Science · 2026-03-12 · Qitong Sun, Jun Han, Tianlin Li, Zhe Tang, Sheng Chen, Fei Yang, Aishan Liu, Xianglong Liu, Yang Liu

Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by…

The emergence of large language models (LLMs) has significantly promoted the development of code generation tasks, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to…

Computation and Language · Computer Science · 2026-02-10 · Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Qingsong Lv, Zishan Xu, Hai-Tao Zheng, Yinghui Li, Hong-Gee Kim

Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and…

Machine Learning · Computer Science · 2025-03-06 · Jiahao Gai, Hao Mark Chen, Zhican Wang, Hongyu Zhou, Wanru Zhao, Nicholas Lane, Hongxiang Fan

Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials will save both runtime and cost. We make two key observations.…

Machine Learning · Computer Science · 2026-04-01 · Siva Kumar Sastry Hari, Vignesh Balaji, Sana Damani, Qijing Huang, Christos Kozyrakis

Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating code that is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively…

Machine Learning · Computer Science · 2025-06-12 · Wentao Chen, Jiace Zhu, Qi Fan, Yehan Ma, An Zou

Code generation is crucial in software engineering for automating the coding process efficiently. While test-time computation methods show promise, they suffer from high latency due to multiple computation rounds. To overcome this, we…

Software Engineering · Computer Science · 2025-05-28 · Xiaoqing Zhang, Yuhan Liu, Flood Sung, Xiuying Chen, Shuo Shang, Rui Yan