Related papers: Leveraging Hardware Performance Counters for Predi…

200 papers

Data movement is a key bottleneck in terms of both performance and energy efficiency in modern HPC systems. The NEC SX-series supercomputers have a long history of accelerating memory-intensive HPC applications by providing sufficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-15 Keichi Takahashi, Soya Fujimoto, Satoru Nagase, Yoko Isobe, Yoichi Shimomura, Ryusuke Egawa, Hiroyuki Takizawa

The Aurora supercomputer is an exascale-class system designed to tackle some of the most demanding computational workloads. Equipped with both High Bandwidth Memory (HBM) and DDR memory, it provides unique trade-offs in performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-07 Huda Ibeid, Vikram Narayana, Jeongnim Kim, Anthony Nguyen, Vitali Morozov, Ye Luo

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza, Kristiaan Pelckmans, Johan Tordsson

For current High Performance Computing systems to scale towards the holy grail of ExaFLOP performance, their power consumption has to be reduced by at least one order of magnitude. This goal can be achieved only through a combination of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-01 Alina Sîrbu, Ozalp Babaoglu

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science 2024-10-15 Maximilian Kirschner, Konstantin Dudzik, Jürgen Becker

Heterogeneous architectures have emerged as a promising alternative to homogeneous architectures for improving the energy efficiency of computer systems. Composite Cores Architecture (CCA), a class of dynamic heterogeneous architectures…

Hardware Architecture · Computer Science 2018-08-07 Hossein Sayadi

Modern Infrastructure-as-a-Service Clouds operate in a competitive environment that caters to any user's requirements for computing resources. The sharing of the various types of resources by diverse applications poses a series of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-28 Evangelos Angelou, Konstantinos Kaffes, Athanasia Asiki, Georgios Goumas, Nectarios Koziris

High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly…

Cross-application interference can drastically affect the performance of HPC applications running in clouds. This problem is caused by concurrent accesses by co-located applications to shared, non-sliceable resources such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-17 Maicon Melo Alves, Lúcia Maria de Assumpção Drummond

Understanding inter-VM interference is of paramount importance for determining where performance degradation comes from in current public clouds. With this aim, this paper devises a workload taxonomy that…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-13 Lucia Pons, Josué Feliu, José Puche, Chaoyi Huang, Salvador Petit, Julio Pons, María E. Gómez, Julio Sahuquillo

Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computational complexity of the task scheduling problem compared to homogeneous architectures. Latency of a software-based…

Hardware Architecture · Computer Science 2022-11-15 Alexander Fusco, Sahil Hassan, Joshua Mack, Ali Akoglu

Performance interference can occur when various services are executed over the same physical infrastructure in a cloud system. This can lead to performance degradation compared to the execution of services in isolation. This work proposes a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-08 Víctor Medel, Unai Arronategui, Omer Rana, José Ángel Bañares, Rafael Tolosana-Calasanz

Computing systems have shifted towards highly parallel and heterogeneous architectures to tackle the challenges imposed by limited power budgets. These architectures must be supported by novel power management paradigms addressing the…

Performance · Computer Science 2023-05-12 Sergio Mazzola, Thomas Benz, Björn Forsberg, Luca Benini

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic

As the era of Moore's law scaling comes to an end, application-specific hardware accelerators appear as an attractive way to improve the performance and power efficiency of our computing systems. A massively heterogeneous system with a large…

Operating Systems · Computer Science 2019-07-02 Kartik Hegde, Abhishek Srivastava, Rohit Agrawal

Energy efficiency has become a major challenge in modern computer systems. To address this challenge, candidate systems increasingly integrate heterogeneous cores in order to satisfy diverse computation requirements by selecting cores with…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Anastasiia Butko, Florent Bruguier, David Novo, Abdoulaye Gamatié, Gilles Sassatelli

Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying…

Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support the popular artificial intelligence (AI) algorithms that demand intensive computation. When deployed in real-time…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 An Zou, Yuankai Xu, Yinchen Ni, Jintao Chen, Yehan Ma, Jing Li, Christopher Gill, Xuan Zhang, Yier Jin

GPGPU-accelerated clusters and supercomputers are central to modern high-performance computing (HPC). Over the past decade, these systems have continued to expand, and GPUs now expose a wide range of hardware counters that provide detailed views…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-25 Onur Cankur, Brian Austin, Dhruva Kulkarni, Abhinav Bhatele