Related papers: Leveraging Hardware Performance Counters for Predi…
Data movement is a key bottleneck in terms of both performance and energy efficiency in modern HPC systems. The NEC SX-series supercomputers have a long history of accelerating memory-intensive HPC applications by providing sufficient…
The Aurora supercomputer is an exascale-class system designed to tackle some of the most demanding computational workloads. Equipped with both High Bandwidth Memory (HBM) and DDR memory, it provides unique trade-offs in performance,…
Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…
For current High Performance Computing systems to scale towards the holy grail of ExaFLOP performance, their power consumption has to be reduced by at least one order of magnitude. This goal can be achieved only through a combination of…
Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…
Heterogeneous architectures have emerged as a promising alternative for homogeneous architectures to improve the energy-efficiency of computer systems. Composite Cores Architecture (CCA), a class of dynamic heterogeneous architectures…
Modern Infrastructure-as-a-Service Clouds operate in a competitive environment that caters to any user's requirements for computing resources. The sharing of the various types of resources by diverse applications poses a series of…
High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogenous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly…
Cross-application interference can affect drastically performance of HPC applications when running in clouds. This problem is caused by concurrent access performed by co-located applications to shared and non-sliceable resources such as…
Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that…
Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. Latency of a software-based…
Performance interference can occur when various services are executed over the same physical infrastructure in a cloud system. This can lead to performance degradation compared to the execution of services in isolation. This work proposes a…
Computing systems have shifted towards highly parallel and heterogeneous architectures to tackle the challenges imposed by limited power budgets. These architectures must be supported by novel power management paradigms addressing the…
GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…
As the Moore's scaling era comes to an end, application specific hardware accelerators appear as an attractive way to improve the performance and power efficiency of our computing systems. A massively heterogeneous system with a large…
Energy-efficiency has become a major challenge in modern computer systems. To address this challenge, candidate systems increasingly integrate heterogeneous cores in order to satisfy diverse computation requirements by selecting cores with…
Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have…
The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying…
Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support the popular artificial intelligence (AI) algorithms that demand intensive computation. When deployed in real-time…
GPGPU-accelerated clusters and supercomputers are central to modern high-performance computing (HPC). Over the past decade, these systems continue to expand, and GPUs now expose a wide range of hardware counters that provide detailed views…