Related papers: Leveraging Hardware Performance Counters for Predi…

Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer

Data movement is a key bottleneck in terms of both performance and energy efficiency in modern HPC systems. The NEC SX-series supercomputers have a long history of accelerating memory-intensive HPC applications by providing sufficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-15 Keichi Takahashi, Soya Fujimoto, Satoru Nagase, Yoko Isobe, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

Performance Analysis of HPC applications on the Aurora Supercomputer: Exploring the Impact of HBM-Enabled Intel Xeon Max CPUs

The Aurora supercomputer is an exascale-class system designed to tackle some of the most demanding computational workloads. Equipped with both High Bandwidth Memory (HBM) and DDR memory, it provides unique trade-offs in performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-07 Huda Ibeid, Vikram Narayana, Jeongnim Kim, Anthony Nguyen, Vitali Morozov, Ye Luo

A HPC Co-Scheduler with Reinforcement Learning

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza, Kristiaan Pelckmans, Johan Tordsson

Predicting System-level Power for a Hybrid Supercomputer

For current High Performance Computing systems to scale towards the holy grail of ExaFLOP performance, their power consumption has to be reduced by at least one order of magnitude. This goal can be achieved only through a combination of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-01 Alina Sîrbu, Ozalp Babaoglu

Work-in-Progress: Real-Time Neural Network Inference on a Custom RISC-V Multicore Vector Processor

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science 2024-10-15 Maximilian Kirschner, Konstantin Dudzik, Jürgen Becker

Energy-Efficiency Prediction of Multithreaded Workloads on Heterogeneous Composite Cores Architectures using Machine Learning Techniques

Heterogeneous architectures have emerged as a promising alternative for homogeneous architectures to improve the energy-efficiency of computer systems. Composite Cores Architecture (CCA), a class of dynamic heterogeneous architectures…

Hardware Architecture · Computer Science 2018-08-07 Hossein Sayadi

Improving virtual host efficiency through resource and interference aware scheduling

Modern Infrastructure-as-a-Service Clouds operate in a competitive environment that caters to any user's requirements for computing resources. The sharing of the various types of resources by diverse applications poses a series of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-28 Evangelos Angelou, Konstantinos Kaffes, Athanasia Asiki, Georgios Goumas, Nectarios Koziris

The MIT Supercloud Workload Classification Challenge

High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 Benny J. Tang, Qiqi Chen, Matthew L. Weiss, Nathan Frey, Joseph McDonald, David Bestor, Charles Yee, William Arcand, Chansup Byun, Daniel Edelman, Matthew Hubbell, Michael Jones, Jeremy Kepner, Anna Klein, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Andrew Bowne, Lindsey McEvoy, Baolin Li, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi

A Quantitative Model for Predicting Cross-application Interference in Virtual Environments

Cross-application interference can drastically affect the performance of HPC applications when running in clouds. This problem is caused by concurrent access performed by co-located applications to shared and non-sliceable resources such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-17 Maicon Melo Alves, Lúcia Maria de Assumpção Drummond

Understanding Cloud Workloads Performance in a Production like Environment

Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-13 Lucia Pons, Josué Feliu, José Puche, Chaoyi Huang, Salvador Petit, Julio Pons, María E. Gómez, Julio Sahuquillo

A Hardware-based HEFT Scheduler Implementation for Dynamic Workloads on Heterogeneous SoCs

Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. Latency of a software-based…

Hardware Architecture · Computer Science 2022-11-15 Alexander Fusco, Sahil Hassan, Joshua Mack, Ali Akoglu

Modeling and Characterizing Service Interference in Dynamic Infrastructures

Performance interference can occur when various services are executed over the same physical infrastructure in a cloud system. This can lead to performance degradation compared to the execution of services in isolation. This work proposes a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-08 Víctor Medel, Unai Arronategui, Omer Rana, José Ángel Bañares, Rafael Tolosana-Calasanz

A Data-Driven Approach to Lightweight DVFS-Aware Counter-Based Power Modeling for Heterogeneous Platforms

Computing systems have shifted towards highly parallel and heterogeneous architectures to tackle the challenges imposed by limited power budgets. These architectures must be supported by novel power management paradigms addressing the…

Performance · Computer Science 2023-05-12 Sergio Mazzola, Thomas Benz, Björn Forsberg, Luca Benini

Understanding GPU Resource Interference One Level Deeper

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic

HTS: A Hardware Task Scheduler for Heterogeneous Systems

As the Moore's scaling era comes to an end, application specific hardware accelerators appear as an attractive way to improve the performance and power efficiency of our computing systems. A massively heterogeneous system with a large…

Operating Systems · Computer Science 2019-07-02 Kartik Hegde, Abhishek Srivastava, Rohit Agrawal

Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures

Energy-efficiency has become a major challenge in modern computer systems. To address this challenge, candidate systems increasingly integrate heterogeneous cores in order to satisfy diverse computation requirements by selecting cores with…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Anastasiia Butko, Florent Bruguier, David Novo, Abdoulaye Gamatié, Gilles Sassatelli

Scheduler Technologies in Support of High Performance Data Analysis

Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying…

Machine Learning · Computer Science 2024-11-05 Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter

A Survey of Real-time Scheduling on Accelerator-based Heterogeneous Architecture for Time Critical Applications

Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support the popular artificial intelligence (AI) algorithms that demand intensive computation. When deployed in real-time…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 An Zou, Yuankai Xu, Yinchen Ni, Jintao Chen, Yehan Ma, Jing Li, Christopher Gill, Xuan Zhang, Yier Jin

Characterizing Production GPU Workloads using System-wide Telemetry Data

GPGPU-accelerated clusters and supercomputers are central to modern high-performance computing (HPC). Over the past decade, these systems continue to expand, and GPUs now expose a wide range of hardware counters that provide detailed views…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-25 Onur Cankur, Brian Austin, Dhruva Kulkarni, Abhinav Bhatele