English
Related papers

Related papers: FlipFlop: A Static Analysis-based Energy Optimizat…

200 papers

Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing to achieve an Exascale performance. The main goal of application developers of GPU is to tune their code extensively to obtain optimal performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Gargi Alavani , Santonu Sarkar

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Ruifan Chu , Anbang Wang , Xiuxiu Bai , Shuai Liu , Xiaoshe Dong

Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and increase their power and energy demands towards a critical level. AI benchmarks representing state-of-the art workloads…

Performance · Computer Science 2026-03-18 Martin Mayr , Sebastian Wind , Lukas Schröder , Georg Hager , Harald Köstler , Gerhard Wellein

Modern GPU software stacks demand developers who can anticipate performance bottlenecks before ever launching a kernel; misjudging floating-point workloads upstream can derail tuning, scheduling, and even hardware procurement. Yet despite…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-05 Gregory Bolet , Giorgis Georgakoudis , Konstantinos Parasyris , Harshitha Menon , Niranjan Hasabnis , Kirk W. Cameron , Gal Oren

Pipelining between data loading and computation is a critical tensor program optimization for GPUs. In order to unleash the high performance of latest GPUs, we must perform a synergetic optimization of multi-stage pipelining across the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-09 Guyue Huang , Yang Bai , Liu Liu , Yuke Wang , Bei Yu , Yufei Ding , Yuan Xie

As AI workloads drive increases in datacenter power consumption, accurate GPU power estimation is critical for proactive power management. However, existing power models face a scalability bottleneck not in the modeling techniques…

Hardware Architecture · Computer Science 2026-04-23 Kyungmi Lee , Zhiye Song , Eun Kyung Lee , Xin Zhang , Tamar Eilam , Anantha P. Chandrakasan

Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. As a practical solution to train GNN on large graphs with billions of nodes and…

Machine Learning · Computer Science 2024-09-24 Zeyu Zhu , Peisong Wang , Qinghao Hu , Gang Li , Xiaoyao Liang , Jian Cheng

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly larger, they require a longer time to train, leading to higher GPU energy consumption. This paper presents…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-06 Farui Wang , Weizhe Zhang , Shichao Lai , Meng Hao , Zheng Wang

Modern computing paradigms, such as cloud computing, are increasingly adopting GPUs to boost their computing capabilities primarily due to the heterogeneous nature of AI/ML/deep learning workloads. However, the energy consumption of GPUs is…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-29 Shashikant Ilager , Rajeev Muralidhar , Kotagiri Rammohanrao , Rajkumar Buyya

Graphics Processing Units (GPUs) have revolutionized the computing landscape over the past decade. However, the growing energy demands of data centres and computing facilities equipped with GPUs come with significant capital and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-15 Richard Schoonhoven , Bram Veenboer , Ben van Werkhoven , Kees Joost Batenburg

Energy-efficiency is a key concern for neural network applications. To alleviate this issue, hardware acceleration using FPGAs or GPUs can provide better energy-efficiency than general-purpose processors. However, further improvement of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Seyed Morteza Nabavinejad , Behzad Salami

Performance optimization can be a daunting task especially as the hardware architecture becomes more and more complex. This paper takes a kernel from the Materials Science code BerkeleyGW, and demonstrates a few performance analysis and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Charlene Yang

In recent years, deep neural networks (DNNs), have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite…

Neural and Evolutionary Computing · Computer Science 2016-11-22 Matthew W. Moskewicz , Ali Jannesari , Kurt Keutzer

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we…

Performance · Computer Science 2020-11-25 Keren Zhou , Xiaozhu Meng , Ryuichi Sai , John Mellor-Crummey

Ubiquity of AI makes optimizing GPU power a priority as large GPU-based clusters are often employed to train and serve AI models. An important first step in optimizing GPU power consumption is high-fidelity and fine-grain power measurement…

Hardware Architecture · Computer Science 2025-04-01 Varsha Singhania , Shaizeen Aga , Mohamed Assem Ibrahim

Recent advances in multi and many-core processors have led to significant improvements in the performance of scientific computing applications. However, the addition of a large number of complex cores have also increased the overall power…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-23 Akash Dutta , Jee Choi , Ali Jannesari

Energy efficiency is a first-order concern in AI deployment, as long-running inference can exceed training in cumulative carbon impact. We propose a bio-inspired framework that maps protein-folding energy basins to inference cost landscapes…

Machine Learning · Computer Science 2026-01-09 Mustapha Hamdi , Mourad Jabou

As AI inference becomes mainstream, research has begun to focus on improving the energy consumption of inference servers. Inference kernels commonly underutilize a GPU's compute resources and waste power from idling components. To improve…

Systems and Control · Electrical Eng. & Systems 2025-06-17 Ryan Quach , Yidi Wang , Ali Jahanshahi , Daniel Wong , Hyoseung Kim

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-30 Robert V. Lim , Boyana Norris , Allen D. Malony

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu
‹ Prev 1 2 3 10 Next ›