Distributed, Parallel, and Cluster Computing

Optimistic, Signature-Free Reliable Broadcast and Its Applications

Reliable broadcast (RBC) is a key primitive in fault-tolerant distributed systems, and improving its efficiency can benefit a wide range of applications. This work focuses on signature-free RBC protocols, which are particularly attractive…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Nibesh Shrestha, Qianyu Yu, Aniket Kate, Giuliano Losa, Kartik Nayak, Xuechao Wang

Efficient Distributed MLLM Training with Cornstarch

Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. However, this inherent heterogeneity in MLLM…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Insu Jang, Runyu Lu, Nikhil Bansal, Ang Chen, Mosharaf Chowdhury

Enhancing Energy Efficiency in Scientific Workflows through CFD based PIVAEs

The growing complexity and scale of scientific workflows in high performance computing (HPC) environments have led to significant challenges in managing energy consumption without compromising computational performance. Traditional…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Ali Zahir, Ashiq Anjum, Mark Wilkinson, Jeyan Thiyagalingam

HyperParallel-MoE: Multi-Core Interleaved Scheduling for Fast MoE Training on Ascend NPUs

Modern Mixture-of-Experts (MoE) models increasingly rely on large-scale AI accelerator clusters for efficient training. Ascend NPUs expose heterogeneous on-chip compute resources, including matrix-oriented AIC units and vector-oriented AIV…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Zewen Jin, Congkun Ai, Guangpeng Zhang, Hanbo Zhang, Haoran Wang, Shihan Xiao, Da Lei, Xuefeng Jin, Teng Su, Cheng Li

Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes

Online services strive to maintain application responsiveness even when the traffic is unpredictable and fluctuating. Today's online services are commonly deployed as chains of microservices, each microservice packaged as one or more…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Dilina Dehigama, Shyam Jesalpura, David Schall, Antonios Katsarakis, Marios Kogias, Rakesh Kumar, Boris Grot

AMP: Arc Multi-Proposer Protocol with Bounded Inclusion Guarantees

Blockchain systems that settle financial transactions face a structural tension: the single validator that assembles each block holds unilateral power over transaction inclusion and ordering. Traditional markets curb this very power through…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Daniel Cason, Gordon Liao, Sergio Mena, Nenad Milošević, Adi Seredinschi, Alessandro Sforzin, João Sousa, Preston Vander Vos

Herring: Parallel Batch-Order-Fairness on DAG-based Blockchain Consensus

Transaction ordering attacks extract billions of dollars annually from decentralized finance users in the form of Maximal Extractable Value (MEV). Byzantine Fault-Tolerant (BFT) consensus protocols guarantee total order but place no…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Marko Putnik, Jérémie Decouchant

Multi-Factor Trust-Driven Secure Communication Model for Cloud-Based Digital Twins

Cloud-based Digital Twin (DT) platforms enable real-time monitoring, simulation, and collaborative decision-making across distributed clients. However, ensuring secure and trustworthy communication remains a critical challenge due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Deepika Saxena, Ashutosh Kumar Singh

Multi-Round Visibility: A Post-Consensus Ordering Layer for DAG-Based BFT

Directed acyclic graph (DAG)-based Byzantine Fault-Tolerant (BFT) protocols achieve high throughput by decoupling dissemination from agreement and allowing many vertices to be committed concurrently. This same concurrency, however, weakens…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Pengkun Ren, Dong Hai, Nasrin Sohrabi, Zahir Tari

AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System

High-throughput inference serving is essential for applications built on large language models (LLMs). Existing serving frameworks reduce request-level and batch-level bubbles through batching and scheduling, but often overlook bubbles…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Fengyao Bai, Hongbin Zhang, Zhitao Chen, Jiangsu Du, Zhiguang Chen, Yutong Lu

XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms

AI power demand is growing at an unprecedented rate while power grids are often ailing and struggle to keep up. Grid expansion comes with high capital expenditure and long-distance transmission losses, yet there is abundant renewable energy…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Tella Rajashekhar Reddy, Atharva Deshmukh, Liangcheng Yu, Chaojie Zhang, Mike Shepperd, Rohan Gandhi, Anjaly Parayil, Srinivasan Iyengar, Ajay Manchepalli, Debopam Bhattacherjee

Budgeted Dynamic Trace Structures for Token-Efficient Sequential Computation

Sequential computation increasingly produces long traces containing nested branches, status transitions, textual payloads, and compact summaries of earlier execution. This paper introduces budgeted dynamic trace structures (BDTS), a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Faruk Alpay, Levent Sarioglu

ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse

Prefix KV caching has become a key mechanism in LLM serving: it reduces time to first token (TTFT) by avoiding redundant computation across requests that share a prefix (i.e., the system prompt). However, the accumulated KV cache is often…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Yu Zhu, Aditya Dhakal, Yunming Xiao, Dejan Milojicic, Gustavo Alonso

Intercloud: Eventual Consistency for Decentralised Economies via Chilling-Effect Consensus

We present Intercloud, a decentralised economic network in which streams of private data are secured by Watcher swarms that observe only cryptographic hashes, never plaintext. Intercloud requires no global consensus beyond a single shared…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Gregory Magarshak

KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions

Key Value Indicators (KVIs) provide a decision oriented view of a service by summarizing how operational performance translates into stakeholder value, risk, and outcomes. However, in many domains KVIs are difficult to compute in practice…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Masoud Shokrnezhad, Tarik Taleb, Yan Chen, Qize Guo

An AI-Driven Framework for Energy-Efficient Environmental Monitoring in Smart Cities Using Edge Intelligence

Environmental monitoring is a crucial component of the smart city infrastructure. It enables informed decision making which enhances sustainability, public health and urban planning. However, the large-scale deployments of the smart sensors…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Yichen Liu, Imam Akintomiwa Akinlade, Xiaochong Jiang, Wenting Yang, Shiqi Yang

Hybrid Edge-HPC Systems for Low-Latency Data-Driven Inference

Emerging cyber-physical systems increasingly require low-latency inference from streaming sensor data while maintaining models that reflect complex and evolving physical processes. In many domains, however, model updates depend on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Liubov Kurafeeva, Ryan Hartung, Benjamin Carter, Alan Subedi, Avhishek Biswas, Michael Fay, Shantenu Jha, Chandra Krintz, Andre Merzky, Douglas Thain, Memet Can Vuran, Rich Wolski

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism schemes or risk…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang

Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

The RISC-V Vector Extension (RVV) is a cornerstone for supporting compute throughput in scientific and machine learning workloads. Yet compiler support and performance monitoring on real RVV 1.0 hardware are still evolving. In this work, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Ruimin Shi, Maya Gokhale, Pei-Hung Lin, Xavier Teruel, Ivy Peng

Communication Offloading on SmartNIC DPUs: A Quantitative Approach

SmartNIC Data Processing Units (DPUs) offer a promising solution for saving high-end CPU resources by offloading tasks to programmable cores near the network interface. In this work, we explore the feasibility of SmartNIC DPUs in supporting…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Jacob Wahlgren, Andong Hu, Roger Pearce, Maya Gokhale, Ivy Peng