Distributed, Parallel, and Cluster Computing

Resource Allocation in HyperX Networks

As high-performance computing systems scale in size and complexity, efficient resource management is essential to minimize communication overhead. The HyperX is a richly connected, low-diameter network that offers a scalable and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Alejandro Cano, Cristóbal Camarero, Carmen Martínez, Ramón Beivide

SiDP: Memory-Efficient Data Parallelism for Offline LLM Inference

The rapid adoption of large language models (LLMs) has shifted a substantial portion of inference workloads into throughput-oriented offline regimes, where fully utilizing GPU compute requires large batch sizes. However, existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Alan Zhao, Cyril Y. He

Addressing Variable Heterogeneity in Distributed Multimodal Training with Entrain

Multimodal LLM datasets are inherently heterogeneous, with significant data variability. Although each modality exhibits independent variability, sample-level entanglement makes it difficult to balance workloads across both modalities and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Insu Jang, Mosharaf Chowdhury

SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems

Neighbor graphs capture relationships among data points and are widely used in data analytics and AI workloads. Many studies have explored approximate construction methods for single-node systems, including GPUs. However, extending this to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Keita Iwabuchi, Trevor Steil, Benjamin W. Priest, Grace J. Li, Geoffrey Sanders, Roger Pearce

Carbon-Aware Mapping and Scheduling for Deadline-Constrained Workflows

As datacenters continue to grow in scale, their energy consumption and resulting carbon footprint have become pressing concerns. With the increasing share of renewable energy in a datacenter's mixed energy supply, shifting task execution to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Dominik Schweisgut, Anne Benoit, Yves Robert, Henning Meyerhenke

A Methodology to Assess Power Modeling in Energy-Aware Federated Learning on Heterogeneous Mobile Devices

Estimating CPU power on heterogeneous ARM-based commodity devices is challenging due to limited access to the CPU's voltage domains. As a result, state-of-the-art energy-aware Federated Learning (FL) frameworks typically rely on simplified…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Chaimae Jallouli, Karim Boubouh, Robert Basmadjian

Context-aware Simopt-Power: Using structural data with simulation metadata to optimise FPGA designs

Pre-implementation behavioural simulation routinely validates functional correctness, yet it also produces rich switching-activity traces that are typically discarded by FPGA computer-aided design (CAD) flows. Prior simulation-guided and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Eashan Wadhwa, Georgios Floros, Shanker Shreejith

A Formal Semantics of C with OpenMP Parallelism (Extended Version)

OpenMP is a popular parallelization framework that lets users transform sequential code into parallel code with a few simple annotations. Unfortunately, it is also easy to inadvertently introduce errors by adding OpenMP pragmas into…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Ke Du, Anshu Sharma, Liyi Li, William Mansky

Orbax: Distributed Checkpointing with JAX

In a landscape of high-performance distributed ML systems, JAX has emerged as a framework of choice. However, JAX's modular design philosophy leaves it without a standardized checkpointing solution. In this paper, we introduce Orbax, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Colin Gaffney, Shutong Li, Daniel Ng, Anastasia Petrushkina, Niket Kumar, Adam Cogdell, Mridul Sahu, Yaning Liang, Nikhil Bansal, Justin Pan, Angel Mau, Abhishek Agrawal, Marco Berlot, Ruoxin Sang, Kiranbir Sodhia, Rakesh Iyer

Ding-Dong Ditch: Peeking Into Spot Instance Availability

Spot instances offer significant cost savings of up to 90% over on-demand prices, making them an attractive resource for large-scale computing workloads. However, understanding their availability dynamics is essential for building systems…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Kyumin Kim, Moohyun Song, Taeyoon Kim, Kyungyong Lee

Continuous benchmarking: Keeping pace with an evolving ecosystem of models and technologies

Drawing on ideas from continuous integration, we present concepts of an automated benchmarking pipeline for high-performance applications. Customization and collaboration have been key design goals owing to the requirements of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Jan Vogelsang, Melissa Lober, Catherine Mia Schöfmann, José Villamar, Dennis Terhorst, Johanna Senk, Hans Ekkehard Plesser, Markus Diesmann, Susanne Kunkel, Anno C. Kurth

Accelerating discovery across scientific disciplines through reproducible workflows with AiiDAlab

With ever-increasing computational capabilities, robust and automated research workflows have become essential for orchestrating large numbers of interdependent simulations. However, significant technical expertise is still required to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Aliaksandr V. Yakutovich, Daniel Hollas, Edan Bainglass, Jusong Yu, Corsin Battaglia, Miki Bonacci, Lucas Fernandez Vilanova, Stephan Henne, Anders Kaestner, Michel Kenzelmann, Graham Kimbell, Jakob Lass, Fabio Lopes, Daniel G. Mazzone, Andres Ortega-Guerrero, Xing Wang, Nicola Marzari, Carlo A. Pignedoli, Giovanni Pizzi

Autonomic Federated-Market Orchestration for the Edge-Cloud Continuum

The edge-cloud computing continuum demands self-management mechanisms that scale across autonomous administrative domains while honouring tenant- and operator-specified data sovereignty. We present Neural Pub/Sub, a federated-broker…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Lauri Lovén, Roberto Morabito, Abhishek Kumar, Susanna Pirttikangas, Jukka Riekki, Sasu Tarkoma

Nonlinear spectral clustering with C++ GraphBLAS

Nonlinear reformulations of the spectral clustering method have gained a lot of recent attention due to their increased numerical benefits and their solid mathematical background. However, the estimation of the multiple nonlinear…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Dimosthenis Pasadakis, Olaf Schenk, Verner Vlacic, Albert-Jan Yzelman

Revisiting Bruck: Phase-Efficient All-to-All Communication in Reconfigurable Networks

All-to-All communication is a key performance bottleneck for distributed machine learning (ML) and high-performance computing (HPC) workloads, where dense traffic increasingly stresses scale-up interconnects. While these ML and HPC…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Anton Juerss, Stefan Schmid

Reducing Internal State in Eigenvalue-Only Divide-and-Conquer Tridiagonal Eigensolvers

Divide and Conquer (D&C) is a widely used algorithmic strategy for symmetric eigenvalue decomposition. Its natural parallelism makes D&C attractive on modern multicore CPUs and GPUs, but existing eigenvalue-only routines often default to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Ruiyi Zhan, Shaoshuai Zhang

StreamSplit: Continuous Audio Representation Learning via Uncertainty-Guided Adaptive Splitting

Large-batch Contrastive Learning (CL), the foundation of modern representation learning, is fundamentally incompatible with the volatile resource constraints of edge devices. This conflict creates a dilemma: small on-device batches degrade…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Minh K. Quan, Pubudu N. Pathirana

Characterization-Guided GPU Fault Resilience in NVIDIA MPS

NVIDIA Multi-Process Service (MPS) enables fine-grained GPU sharing by allowing multiple processes to execute concurrently on the same GPU, making it an important mechanism for improving GPU utilization. However, MPS has weak fault…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Rixin Liu, Xingqi Cui, Kaijian Wang, Xinheng Ding, Zirui Liu, Yuke Wang, Jiarong Xing

Configuration-Driven Dynamic API Routing for Resilient Service Integrations

Modern online services rely on third-party APIs for authentication, payments, communication, identity verification, fraud detection, observability, and fulfillment. These dependencies are outside the direct operational control of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Nataraj Agaram Sundar, Tejas Morabia

GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers

At global scale, data-center electricity demand is growing faster than the grids that supply it, while system operators increasingly require large flexible loads that can adjust power within seconds to absorb variable wind and solar…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Denisa-Andreea Constantinescu, David Atienza