English
Related papers

Related papers: Integrating Performance Tools in Model Reasoning f…

200 papers

In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external…

Computation and Language · Computer Science 2025-06-02 Fei Bai , Yingqian Min , Beichen Zhang , Zhipeng Chen , Wayne Xin Zhao , Lei Fang , Zheng Liu , Zhongyuan Wang , Ji-Rong Wen

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than todays systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-18 Saeed Taheri , Apan Qasem , Martin Burtscher

Code optimization is the process of enhancing code efficiency, while preserving its intended functionality. This process often requires a deep understanding of the code execution behavior at run-time to identify and address inefficiencies…

Software Engineering · Computer Science 2026-04-02 Federico Di Menna , Luca Traini , Gabriele Bavota , Vittorio Cortellessa

GPU kernels have come to the forefront of computing due to their utility in varied fields, from high-performance computing to machine learning. A typical GPU compute kernel is invoked millions, if not billions of times in a typical…

Machine Learning · Computer Science 2024-04-18 Khawir Mahmood , Jehandad Khan , Hammad Afzal

Fine-tuning large language models is a popular choice among users trying to adapt them for specific applications. However, fine-tuning these models is a demanding task because the user has to examine several factors, such as resource…

Machine Learning · Computer Science 2024-06-07 Arjun Singh , Nikhil Pandey , Anup Shirgaonkar , Pavan Manoj , Vijay Aski

In particular, large-scale deep learning and artificial intelligence model training uses a lot of computational power and energy, so it poses serious sustainability issues. The fast rise in model complexity has resulted in exponential…

Hardware Architecture · Computer Science 2025-08-20 Yashasvi Makin , Rahul Maliakkal

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Large language models are increasingly trained on corpora containing both natural language and non-linguistic data like source code. Aside from aiding programming-related tasks, anecdotal evidence suggests that including code in pretraining…

Computation and Language · Computer Science 2025-02-26 Jackson Petty , Sjoerd van Steenkiste , Tal Linzen

Recent advancements in large language models (LLMs) have resulted in increasingly anthropomorphic language concerning the ability of LLMs to reason. Whether reasoning in LLMs should be understood to be inherently different is, however,…

Machine Learning · Computer Science 2025-07-28 Bertram Højer , Oliver Jarvis , Stefan Heinrich

Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landscape of GPGPU. Meanwhile, Large Language…

Performance · Computer Science 2025-03-17 Khoi N. M. Nguyen , Hoang Duy Nguyen Do , Huyen Thao Le , Thanh Tuan Dao

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Daniel Nichols , Pranav Polasam , Harshitha Menon , Aniruddha Marathe , Todd Gamblin , Abhinav Bhatele

One of the major research trends currently is the evolution of heterogeneous parallel computing. GP-GPU computing is being widely used and several applications have been designed to exploit the massive parallelism that GP-GPU's have to…

Machine Learning · Computer Science 2014-04-16 Vivek Kulkarni , Rami Al-Rfou' , Bryan Perozzi , Steven Skiena

This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Xiaohui Chen , Marc Moreno-Maza , Jeeva Paudel , Ning Xie

The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude there is an urgent need for methods that reduce the energy needs of NLP…

Computation and Language · Computer Science 2023-05-03 Joseph McDonald , Baolin Li , Nathan Frey , Devesh Tiwari , Vijay Gadepally , Siddharth Samsi

Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the…

Computation and Language · Computer Science 2022-12-29 Jonas Geiping , Tom Goldstein

The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approaches based on state-of-the-art model architectures. Nevertheless, we believe that the current…

Software Engineering · Computer Science 2022-02-22 Martin Weyssow , Houari Sahraoui , Bang Liu

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-01 Quentin Anthony , Jacob Hatef , Deepak Narayanan , Stella Biderman , Stas Bekman , Junqi Yin , Aamir Shafi , Hari Subramoni , Dhabaleswar Panda

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code. This challenge is exacerbated in the…

Software Engineering · Computer Science 2025-12-23 Muhammad Usman Tariq , Abhinav Jangda , Angelica Moreira , Madan Musuvathi , Tyler Sorensen

Large Language Models (LLMs) show promise for automated code optimization but struggle without performance context. This work introduces Opal, a modular framework that connects performance analytics insights with the vast body of published…

Performance · Computer Science 2025-10-02 Mohammad Zaeed , Tanzima Z. Islam , Vladimir Inđić
‹ Prev 1 2 3 10 Next ›