Related papers: Optimising GPGPU Execution Through Runtime Micro-A…

200 papers

Vortex, a newly proposed open-source GPGPU platform based on the RISC-V ISA, offers a valid alternative for GPGPU research over the broadly-used modeling platforms based on commercial GPUs. Similarly to the push originating from the RISC-V…

Hardware Architecture · Computer Science 2025-12-02 Giuseppe M. Sarda, Nimish Shah, Abubakr Nada, Debjyoti Bhattacharjee, Marian Verhelst

Choosing an appropriate programming paradigm for high-performance computing on low-power devices can be useful to speed up calculations. Many Android devices have an integrated GPU and - although not officially supported - the OpenCL…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-10 Robert Fritze, Claudia Plant

The analysis of source code through machine learning techniques is an increasingly explored research topic aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of…

Machine Learning · Computer Science 2020-12-15 Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini, Andrea Acquaviva

Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing for achieving exascale performance. The main goal of GPU application developers is to tune their code extensively to obtain optimal performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Gargi Alavani, Santonu Sarkar

With high-performance computing systems now running at exascale, optimizing power-scaling management and resource utilization has become more critical than ever. This paper explores runtime power-capping optimizations that leverage…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-26 Maria Patrou, Thomas Wang, Wael Elwasif, Markus Eisenbach, Ross Miller, William Godoy, Oscar Hernandez

Performance tools for emerging heterogeneous exascale platforms must address two principal challenges when analyzing execution measurements. First, measurement of large-scale executions may record mountains of performance data. Second,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-11 Jonathon Anderson, Yumeng Liu, John Mellor-Crummey

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong

Measurements of absolute runtime are useful as a summary of performance when studying parallel visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can obtain even more insights by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-07 E. Wes Bethel, David Camp, Talita Perciano, Colleen Heinemann

The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance,…

Performance · Computer Science 2020-06-22 James D. Stevens, Andreas Klöckner

As RISC-V architectures proliferate across embedded and high-performance domains, developers face persistent challenges in performance optimization due to fragmented tooling, immature hardware features, and platform-specific defects. This…

Performance · Computer Science 2025-07-31 Alexander Batashev

With heterogeneous systems, the number of GPUs per chip increases to provide computational capabilities for solving science at a nanoscopic scale. However, low utilization for single GPUs defies the need to invest more money for expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-11 Tanzima Z. Islam, Aniruddha Marathe, Holland Schutte, Mohammad Zaeed

Graphics processing units (GPUs) excel at parallel processing, but remain largely unexplored in ultra-low-power edge devices (TinyAI) due to their power and area limitations, as well as the lack of suitable programming frameworks. To…

Hardware Architecture · Computer Science 2026-03-17 Simone Machetti, Pasquale Davide Schiavone, Lara Orlandic, Darong Huang, Deniz Kasap, Giovanni Ansaloni, David Atienza

Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the integration of performance analytics into…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-27 Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Z. Islam

GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures…

Hardware Architecture · Computer Science 2025-10-30 Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González

Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…

Databases · Computer Science 2025-02-14 Yichao Yuan, Advait Iyer, Lin Ma, Nishil Talati

Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) in order to balance computational performance and energy consumption. However, there is still a lack of simple and accurate performance estimation of a given GPU…

Performance · Computer Science 2018-06-14 Qiang Wang, Xiaowen Chu

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest…

Hardware Architecture · Computer Science 2022-01-20 Francesco Sgherzi, Alberto Parravicini, Marco Domenico Santambrogio

This paper focuses on characterizing the runtime performance of state-of-the-art implementations of KGE algorithms, in terms of memory footprint and execution time. Despite the rapidly growing interest in KGE…

Machine Learning · Computer Science 2020-11-10 Angelica Sofia Valeriani

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Ali TehraniJamsaz, Alok Mishra, Akash Dutta, Abid M. Malik, Barbara Chapman, Ali Jannesari

This report focuses on the architecture and performance of the Intelligence Processing Unit (IPU), a novel, massively parallel platform recently introduced by Graphcore and aimed at Artificial Intelligence/Machine Learning (AI/ML)…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-10 Zhe Jia, Blake Tillman, Marco Maggioni, Daniele Paolo Scarpazza