Distributed, Parallel, and Cluster Computing

A Morton-Type Space-Filling Curve for Pyramid Subdivision and Hybrid Adaptive Mesh Refinement

The forest-of-refinement-trees approach allows for dynamic adaptive mesh refinement (AMR) at negligible cost. While originally developed for quadrilateral and hexahedral elements, previous work established the theory and algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 David Knapp, Johannes Albrecht Holke, Thomas Spenke, Carsten Burstedde, Lukas Dreyer

ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling

While Mixture-of-Experts (MoE) architectures substantially bolster the expressive power of large-language models, their prohibitive memory footprint severely impedes the practical deployment on resource-constrained edge devices, especially…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Yuchen Yang, Yaru Zhao, Pu Yang, Shaowei Wang, Zhi-Hua Zhou

AI-Driven Multi-Region Provisioning for Cloud Services Using Spot Fleets

Cloud service platforms increasingly rely on elastic infrastructures to support dynamic workloads. Spot instances provide discounted computing resources but introduce uncertainty due to dynamic pricing, resource availability, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Javier Fabra, Enrique Molina-Giménez, Pedro García-López

Relay-Based Synchronization of Replicated Data Types in Opportunistic Networks

In Opportunistic Networks (OppNets), the dissemination of information can only rely on transient pairwise radio contacts between mobile devices (peers). Designing distributed applications that can run in such conditions is a challenge, but…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Frédéric Guidec, Yves Mahéo

Exploiting Multicast for Accelerating Collective Communication

Reducing collective communication latency is a critical goal for large model training and inference in both academia and industry. Many-to-many communications, such as AllGather and AlltoAll (dispatch), are core components of modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Chao Xu, Xu Zhang, Zihang Luo, Yuyan Wu, Guoxin Qian, Yufeng Yao, Chihyung Wang, Jingbin Zhou

Nf-PEAK: Process-Based Energy Attribution for Nextflow Workflows on Kubernetes Clusters

Scientific workflows are pipelines of interdependent tasks. They are increasingly executed on shared Kubernetes clusters via workflow engines such as Nextflow. Their energy consumption matters for both cost and sustainability. It is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Philipp Thamm, Somayeh Mohammadi, Kathleen West, Knut Reinert, Lauritz Thamsen, Ulf Leser

Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments

The advent of edge computing has enabled resource-constrained clients to delegate intensive computational tasks to distributed edge servers, especially within Internet of Things (IoT) environments. Among such tasks, Matrix Determinant…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Prajwal Panth

LiveR: Fine-Grained Elasticity via Live Reconfiguration for Model Training

To reduce user costs and maximize cluster utilization, large model training increasingly leverages volatile but inexpensive GPU capacity, such as spot instances and reclaimable resources in shared clusters. Yet, capitalizing on these…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Haoyuan Liu, Kairui Zhou, Shuyao Qi, Qinwei Yang, Shengkai Lin, Shizhen Zhao, Wei Zhang

DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling

Intra-device parallelism addresses resource under-utilization in ML inference and training by overlapping the execution of operators with different resource usage. However, its wide adoption is hindered by a fundamental conflict with the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Yi Pan, Yile Gu, Jinbin Luo, Yibo Wu, Ziren Wang, Hongtao Zhang, Ziyi Xu, Shengkai Lin, Baris Kasikci, Stephanie Wang

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

Leveraging continuous solar energy harvesting at high efficiency, space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs). Recognizing this advantage, space and AI conglomerates…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Zhanwei Wang, Huiling Yang, Min Sheng, Khaled B. Letaief, Kaibin Huang

Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT

Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. However, statistical heterogeneity arising from…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Deploying multiple models within shared GPU clusters is a key strategy to improve resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems improve GPU utilization at the cost of degraded inference…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Xuanzhe Liu, Xin Jin

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Demand growth strains this paradigm faster than providers can scale. Two advances create an opportunity to rethink it:…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

ML-Based Optimum Sub-system Size Heuristic for the GPU Implementation of the Tridiagonal Partition Method

This paper presents a machine learning (ML)-based heuristic for finding the optimum sub-system size for the CUDA implementation of the parallel partition algorithm. Computational experiments for different system of linear algebraic equation…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Milena Veneva

A Distributed Consensus Algorithm for Prioritizing Autonomous Vehicle Passing at Unsignalized Intersections under Mixed Traffic

We propose a methodology for connected autonomous vehicles (CAVs) to determine their passing priority at unsignalized intersections where they coexist with human-driven vehicles (HVs). Assuming that CAVs can perceive the entry order of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Younjeong Lee, Young Yoon

ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method

This paper presents a heuristic for finding the optimum number of CUDA streams by using tools common to the modern AI-oriented approaches and applied to the parallel partition algorithm. A time complexity model for the GPU realization of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Milena Veneva, Toshiyuki Imamura

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-21 Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu

Cloud-Native Operation of Roadside Infrastructure Enabling Demand-Driven Collective Perception via V2X

Intelligent roadside infrastructure is a key enabler for cooperative intelligent transport systems (C-ITS), supporting vehicles equipped with automated driving systems (ADS), e.g., through enhanced environment perception. With a growing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-21 Lukas Zanger, Fabian Thomsen, Guido Linden, Jean-Pierre Busch, Lennart Reiher, Lutz Eckstein

Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs

Federated learning (FL) has emerged as a promising paradigm for managing electric vehicle (EV) battery data in intelligent transportation systems (ITS), enabling privacy-preserving tasks such as anomaly detection and capacity estimation.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-21 Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane