Computer Science

RAFI -- A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing

We present RaFI, a CUDA and MPI based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ingo Wald, Serkan Demirci, Alper Sahistan, Stefan Zellmann, Andrea Paris, Patrick Moran, Milan Jaros, Tatiana von Landesberger, Ugur Gudukbay, Valerio Pascucci

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, where unsafe behavior can be induced through semantics, acoustic style, signal artifacts, or internal…

Sound · Computer Science 2026-05-29 Bo-Han Feng, Yu-Hsuan Li Liang, Chien-Feng Liu, You-Hsuan Chang, Yun-Nung Chen

Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jesper Larsson Träff

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements…

Sound · Computer Science 2026-05-29 Bohan Li, Shi Lian, Hankun Wang, Yiwei Guo, Yu Xi, Zhihan Li, Da Zheng, Colin Zhang, Kai Yu

CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis

In recent years, HPC systems and CPU architectures, as their central components, have become increasingly complex, making application development and optimization quite challenging. In this respect, intuitive performance models like the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 José Morgado, Leonel Sousa, Aleksandar Ilic

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training

Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ling Chen, Houming Wu, Wenjie Yu

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap…

Sound · Computer Science 2026-05-29 Yonggang Zhu, Liting Gao, Aidong Men, Wenwu Wang

TC-MIS: Maximal Independent Set on Tensor-cores

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently unstructured and challenging for GPU parallelism due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Prajjwal Nijhara, Dip Sankar Banerjee

Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines

Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, processing, and analysis of data have become vital for monitoring operations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Angelos Dorotheos Chatzopoulos, Babis Andreou, Kakia Panagidi, Stathes Hadjiefthymiades

Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion

Audio deepfake detection is well-studied as a binary problem, but partially manipulated speech, where a short synthesised segment is spliced into an otherwise genuine utterance, poses a harder and more realistic threat. Detecting such…

Sound · Computer Science 2026-05-29 S. Sutharya, Remya K. Sasi

Silent Data Corruption Protection through Efficient Task Replication

The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A common strategy for SDC protection is replication, where the computation is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Mia Reitz, Claudia Fohry

Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training

Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Yidong Gong, Saima Afrin, Yuchen Ma, Guannan Wang, Bin Ren, Pradeep Kumar

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. Specifically, ChildVox follows the full developmental trajectory from birth through school age, covering…

Sound · Computer Science 2026-05-29 Tiantian Feng, Anfeng Xu, Xuan Shi, Aditya Kommineni, Shakhrul Iman Siam, Megan Micheletti, Zhonghao Shi, Helen Tager-Flusberg, Mi Zhang, Lynn K. Perry, Catherine Lord, Daniel Messinger, Shrikanth Narayanan

HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems

Device-aware quantum simulation increasingly requires HPC-scale accelerators, yet secure supercomputers expose batch-scheduled execution environments rather than the interactive, backend-oriented interfaces expected by quantum software. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Shusen Liu, Pascal Jahan Elahi, Ugo Varetto

Monte Cimone v3: Where RISC-V Stands in High-Performance Computing

The Monte Cimone project provides a RISC-V testbed for High-Performance Computing clusters. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, integrating the SOPHGO Sophon SG2044…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Emanuele Venieri, Simone Manoni, Giacomo Madella, Federico Proverbio, Federico Ficarelli, Luca Benini, Andrea Bartolini

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges:…

Sound · Computer Science 2026-05-29 Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz, Oluwanifemi Bamgbose, Fanny Riols, Hoang H. Nguyen, Raghav Mehndiratta, Lindsay Devon Brin, Joseph Marinier, Hari Subramani, Anil Madamala, Sridhar Krishna Nemala, Srinivas Sunkara

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio

Medical audio data is difficult to collect due to privacy regulations and high annotation costs arising from domain expertise. Thus, existing benchmarks tend to underrepresent complex medical audio scenarios. To address this challenge, we…

Sound · Computer Science 2026-05-29 Harshit Rajgarhia, Shuubham Ojha, Asif Shaik, Akhil Pothanapalli, Rachuri Lokesh, Abhishek Mukherji, Prasanna Desikan

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, grids, and graphs). To date, most…

Sound · Computer Science 2026-05-29 Lekai Qian, Haoyu Gu, Jingwei Zhao, Ziyu Wang

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Talor Abramovich, Maor Ashkenazi, Izzy Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman