Computer Science

Zero-Scan Data Quality: Leveraging Table Format Metadata for Continuous Observability at Scale

Modern table formats such as Apache Iceberg compute and store metadata (commit timestamps, record counts, and column-level statistics such as null counts and value bounds) at write time as part of file writing. These statistics serve query…

Databases · Computer Science 2026-05-29 Mohit Verma, Shantanu Rawat, Christian Bush, Sumedh Sakdeo, Lokesh Amarnath Ravindranathan, Dwarak Bakshi

RAFI -- A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing

We present RaFI, a CUDA- and MPI-based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ingo Wald, Serkan Demirci, Alper Sahistan, Stefan Zellmann, Andrea Paris, Patrick Moran, Milan Jaros, Tatiana von Landesberger, Ugur Gudukbay, Valerio Pascucci

Unveiling the Visual Counting Bottleneck in Vision-Language Models

While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…

Multimedia · Computer Science 2026-05-29 Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan

The Missing Dimensions in Geo-Distributed Database Evaluation

Geo-distributed OLTP databases are widely deployed across cloud regions, yet current evaluation practices do not cover the challenges specific to this setting. Existing benchmarks assume stable network conditions; they lack explicit settings for data…

Databases · Computer Science 2026-05-29 Oto Mraz, Kyriakos Psarakis, George Christodoulou, Paris Carbone, Asterios Katsifodimos

Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

We present a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori and show how to implement it effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jesper Larsson Träff

CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis

In recent years, HPC systems, and CPU architectures as their central components, have become increasingly complex, making application development and optimization quite challenging. In this respect, intuitive performance models like the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 José Morgado, Leonel Sousa, Aleksandar Ilic

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most widely used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules

Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…

Databases · Computer Science 2026-05-29 Wenxin Xu, Chen Jason Zhang, Xiaoyong Wei, Haoyang Li, Hwanhee Kim, Yuanfeng Song, Raymond Chi-Wing Wong

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training

Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ling Chen, Houming Wu, Wenjie Yu

TC-MIS: Maximal Independent Set on Tensor-cores

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently unstructured and challenging for GPU parallelism due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Prajjwal Nijhara, Dip Sankar Banerjee

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…

Multimedia · Computer Science 2026-05-29 Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang

Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines

Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, processing, and analysis of data have become vital for monitoring operations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Angelos Dorotheos Chatzopoulos, Babis Andreou, Kakia Panagidi, Stathes Hadjiefthymiades

Silent Data Corruption Protection through Efficient Task Replication

The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A common strategy for SDC protection is replication, where the computation is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Mia Reitz, Claudia Fohry

Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training

Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Yidong Gong, Saima Afrin, Yuchen Ma, Guannan Wang, Bin Ren, Pradeep Kumar

One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…

Databases · Computer Science 2026-05-29 Adam Szymański, Tyler Akidau

ScanTwin: Simulating Performance Regressions Without Access to Tenant Data

In cloud data platforms, developers often encounter performance regressions that occur in specific tenant datasets. However, due to confidentiality constraints, they cannot access the original data, which makes it difficult to reproduce…

Databases · Computer Science 2026-05-29 Donghyun Sohn, Jennie Rogers

IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata

Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases…

Databases · Computer Science 2026-05-29 Rajarshi Chowdhury, Akshay Shah, Zakaria Alrmaih, Chenhao Guo, Anubhav Singh, Sue Lee

HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems

Device-aware quantum simulation increasingly requires HPC-scale accelerators, yet secure supercomputers expose batch-scheduled execution environments rather than the interactive, backend-oriented interfaces expected by quantum software. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Shusen Liu, Pascal Jahan Elahi, Ugo Varetto

Monte Cimone v3: Where RISC-V Stands in High-Performance Computing

The Monte Cimone project provides a RISC-V testbed for High-Performance Computing. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, integrating the SOPHGO Sophon SG2044…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Emanuele Venieri, Simone Manoni, Giacomo Madella, Federico Proverbio, Federico Ficarelli, Luca Benini, Andrea Bartolini

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Talor Abramovich, Maor Ashkenazi, Izzy Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman