Distributed, Parallel, and Cluster Computing

ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training

The rapid scaling of large language model training requires distributing GPU resources across multiple data center buildings and regions. We refer to this paradigm as "scale-across" training. As infrastructure expands, the system design…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Minghao Li, Alicia Golden, Samuel Hsia, Michael Kuchnik, Adi Gangidi, Xu Zhang, Ashmitha Jeevaraj Shetty, Zachary DeVito, Weiwei Chu, Dong He, Haoci Zhang, Yuchen Hao, Ruoming Pang, James Hongyi Zeng, Ying Zhang, Minlan Yu, Carole-Jean Wu

Resident KV Claims: A Conformance Contract for Future Reuse under Active KV Pressure

KV-cache reuse mechanisms increasingly expose priority, duration, offload, routing hints, scheduler modes, and event streams. These mechanisms help preserve reusable prefixes, but they do not by themselves define a portable contract for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Lukas Stepanek

Polar: Agentic RL on Any Harness at Scale

Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remains…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz, Yi Dong

A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose structural…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning

TSFLora: Token-Compressed Split Fine-Tuning for Wireless Edge Networks

Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Xianke Qiang, Zheng Chang, Li Wang, Ying-Chang Liang

RASC: Region-Aware Self-Calibration for Dense 2D Sensor Arrays

BJT-based 2D temperature-sensor arrays are factory-calibrated to ±0.1 °C, but post-deployment thermal and mechanical stresses drift their per-sensor gain-offset parameters by an order of magnitude, and in-lab recalibration is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Yinglei Ma, Fei Xiao

The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decomposed -…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Sai Sathvik Vadari

VineLM: Trie-Based Fine-Grained Control for Agentic Workflows

Agentic workflows interleave configurable LLM stages with tool stages and often include retries or refinement loops. Existing workflow managers profile full workflow configurations offline and assign each request a static workflow-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Nikos Pagonas, Matthew Lou, Tianyi Peng, Dan Rubenstein, Kostis Kaffes

Can LoRA Fusion Support Cross-Domain Tasks in Cloud-Edge Collaboration?

Cloud-hosted large language models (LLMs) commonly rely on LoRA for domain adaptation, yet domain data are distributed across multiple edge devices and cannot be uploaded due to privacy constraints. This raises a fundamental question: how…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Yatong Wang, Fali Wang, Naibin Gu, Zheng Lin, Zhengxiao Liu, Dingyu Yao, Zhiwei Zhang, Jianxin Shi, Weiping Wang

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA

Mixture-of-Experts (MoE) architectures power the majority of frontier large language models, but their inference is bottlenecked by irregular memory access patterns and expert routing overhead. Existing optimized MoE kernels (Megablocks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Subhadip Mitra

Mathematical Foundations for Peer-to-Peer Lattice Computation

We give structured proofs for five mathematical propositions governing synchronous peer-to-peer computation on a finite grid graph embedded in $\mathbb{Z}^2$. Proposition 1 gives three lower bounds: a transport-work bound $\sum_i a_i \ell_i…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Danil Gorinevski

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding

Speculative decoding can significantly accelerate LLM inference, especially given that its cloud-edge collaborative deployment offers cloud workload offloading, offline robustness, and privacy enhancement. However, existing collaborative…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Yunhe Han, Yunqi Gao, Bing Hu, Mahdi Boloursaz Mashhadi, Yitong Duan, Pei Xiao, Yanfeng Zhang

An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either due to a lack of causal awareness or acting under high epistemic uncertainty, risking destructive…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Suvi De Silva, Alfreds Lapkovskis, Alaa Saleh, Sasu Tarkoma, Praveen Kumar Donta

Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference

In production environments, large language model (LLM) serving is required to meet stringent service-level objectives (SLOs) amid highly variable request patterns. In practice, request lengths follow a long-tail distribution, which gives…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Qipeng Wang

SwiftFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs

Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to increased latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jiacheng Yang, Jun Wu, Yaoyao Ding, Zhiying Xu, Yida Wang, Gennady Pekhimenko

VLCs: Managing Parallelism with Virtualized Libraries

As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. However, many libraries are not designed with composition…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Yineng Yan, William Ruys, Hochan Lee, Ian Henriksen, Arthur Peters, Sean Stephens, Bozhi You, Henrique Fingler, Martin Burtscher, Milos Gligoric, Keshav Pingali, Mattan Erez, George Biros, Christopher J. Rossbach

A Multi-Armed Bandit-Based Participant Selection Method for Federated Recommendation Systems

Federated Recommendation Systems (FRS) enable privacy-preserving model training by keeping user data on edge devices. However, the practical deployment of FRS in Edge-Cloud environments faces significant challenges due to system and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jintao Liu, Mohammad Goudarzi, Adel Nadjaran Toosi

An Ecosystem of Services for FAIR Computational Workflows

Computational workflows represent major investments of effort and expertise. As first-class, publishable research objects of their own, they are key to sharing methodological know-how for reuse, reproducibility, and transparency. Thus, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Sean R. Wilkinson, Johan Gustafsson, Finn Bacall, Khalid Belhajjame, Salvador Capella, Jose Maria Fernandez Gonzalez, Jacob Fosso Tande, Luiz Gadelha, Daniel Garijo, Patricia Grubel, Bjorn Grüning, Farah Zaib Khan, Sehrish Kanwal, Simone Leo, Stuart Owen, Luca Pireddu, Line Pouchard, Laura Rodríguez-Navas, Beatriz Serrano-Solano, Stian Soiland-Reyes, Baiba Vilne, Alan Williams, Merridee Ann Wouters, Frederik Coppens, Carole Goble

Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Seungeun Oh, Jinhyuk Kim, Jihong Park, Seung-Woo Ko, Jinho Choi, Tony Q. S. Quek, Seong-Lyun Kim

LA-IMR: Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling for Tail-Latency-Sensitive Cloud Robotics

Hybrid cloud-edge infrastructures now support latency-critical workloads ranging from autonomous vehicles and surgical robotics to immersive AR/VR. However, they continue to experience crippling long-tail latency spikes whenever bursty…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Eunil Seo, Chanh Nguyen, Erik Elmroth