Computer Science

RAFI -- A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing

We present RaFI, a CUDA and MPI based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ingo Wald, Serkan Demirci, Alper Sahistan, Stefan Zellmann, Andrea Paris, Patrick Moran, Milan Jaros, Tatiana von Landesberger, Ugur Gudukbay, Valerio Pascucci

Neural Network Verification using Partial Multi-Neuron Relaxation

The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guaranteeing safety properties about their behavior. To achieve this, contemporary verification algorithms…

Logic in Computer Science · Computer Science 2026-05-29 Ido Shmuel, Guy Katz

A Rust-to-Lean Verification Pipeline with AI Provers: An Experience Report

We describe a verification pipeline that takes production Rust cryptographic code and produces machine-checked correctness proofs in Lean 4. The pipeline combines three components: symbolic extraction tools (Charon and Aeneas, or Hax) that…

Logic in Computer Science · Computer Science 2026-05-29 Natalia Klaus, Palina Tolmach, Juan Conejero

Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jesper Larsson Träff

Strong (D)QBF Dependency Schemes via Pure Paths with Applications to Proof Checking

Certification for Quantified Boolean Formulas (QBF) and Dependency Quantified Boolean Formulas (DQBF) is an ongoing challenge. Recent proof complexity work has shown that the majority of QBF and DQBF techniques can be p-simulated by using…

Logic in Computer Science · Computer Science 2026-05-29 Leroy Chew, Tomáš Peitl

CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis

In recent years, HPC systems, and CPU architectures as their central components, have become increasingly complex, making application development and optimization quite challenging. In this respect, intuitive performance models like the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 José Morgado, Leonel Sousa, Aleksandar Ilic

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training

Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Ling Chen, Houming Wu, Wenjie Yu

TC-MIS: Maximal Independent Set on Tensor-cores

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently unstructured and challenging for GPU parallelism due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Prajjwal Nijhara, Dip Sankar Banerjee

Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines

Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, processing, and analysis of data have become vital for monitoring operations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Angelos Dorotheos Chatzopoulos, Babis Andreou, Kakia Panagidi, Stathes Hadjiefthymiades

Silent Data Corruption Protection through Efficient Task Replication

The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A common strategy for SDC protection is replication, where the computation is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Mia Reitz, Claudia Fohry

Unifying Semantic Path Order and Weighted Path Order

Monotonic semantic path orders and weighted path orders are powerful reduction orders for proving termination of term rewrite systems. In this paper we present their simple unification as reduction orders and reduction pairs. We also…

Logic in Computer Science · Computer Science 2026-05-29 Teppei Saito, Nao Hirokawa

Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training

Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Yidong Gong, Saima Afrin, Yuchen Ma, Guannan Wang, Bin Ren, Pradeep Kumar

libhmm: A Modern C++20 Library for Hidden Markov Models with Correct MLE Emission M-Steps

We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library…

Mathematical Software · Computer Science 2026-05-29 Gary Wolfman

HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems

Device-aware quantum simulation increasingly requires HPC-scale accelerators, yet secure supercomputers expose batch-scheduled execution environments rather than the interactive, backend-oriented interfaces expected by quantum software. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Shusen Liu, Pascal Jahan Elahi, Ugo Varetto

mstlo: Efficient Online Monitoring of Signal Temporal Logic

We present mstlo (mistletoe), a Rust library for high-performance online monitoring of signal temporal logic (STL), with Python bindings. The library provides: (i) a unified interface for multiple STL semantics, including Robust…

Logic in Computer Science · Computer Science 2026-05-29 Andreas Kaag Thomsen, Niels Viggo Stark Madsen, Valdemar Tang Evans, Thomas David Wright, Lukas Esterle, Peter Gorm Larsen

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as those generated by Draft-Sketch-Prove (DSP) pipelines. In current systems, each search branch…

Logic in Computer Science · Computer Science 2026-05-29 Austin Shen, Yunong Shi

Monte Cimone v3: Where RISC-V Stands in High-Performance Computing

The Monte Cimone project provides a RISC-V testbed for High-Performance Computing clusters. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, integrating the SOPHGO Sophon SG2044…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Emanuele Venieri, Simone Manoni, Giacomo Madella, Federico Proverbio, Federico Ficarelli, Luca Benini, Andrea Bartolini

A Linear Temporal Logic of Frequencies on Series of Events

This paper introduces LTLF, a temporal logic designed to express the frequency properties of event series in a natural but rigorous manner. By introducing novel, measure-sensitive operators, LTLF allows for the evaluation of frequencies and…

Logic in Computer Science · Computer Science 2026-05-29 Melissa Antonelli, Leonardo Ceragioli, Alessandro Giuseppe Buda, Giuseppe Primiero

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Talor Abramovich, Maor Ashkenazi, Izzy Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman