Computer Science

RAFI -- A Ray/Work Forwarding Infrastructure for Data Parallel Multi-Node/Multi-GPU Computing

We present RaFI, a CUDA- and MPI-based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Ingo Wald, Serkan Demirci, Alper Sahistan, Stefan Zellmann, Andrea Paris, Patrick Moran, Milan Jaros, Tatiana von Landesberger, Ugur Gudukbay, Valerio Pascucci

GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases

Semi-structured knowledge bases (SKBs) embed textual documents in a typed graph of entities and relations, and underpin applications such as product search, academic paper search, and precision-medicine inquiries. Existing hybrid retrieval…

Information Retrieval · Computer Science · 2026-05-29 · Yicheng Tao, Yiqun Wang, Xiangchen Song, Xin Luo, Kai Liu, Jie Liu

LexPath: A domain-oriented multi-path framework for legal article retrieval

Legal article retrieval is critical for building traceable and reliable legal AI systems, where conclusions must be grounded in specific legal articles. However, existing open-domain retrieval methods rely heavily on surface-level lexical…

Information Retrieval · Computer Science · 2026-05-29 · Weixuan Liu, Qingfeng Zhuge, Xuyang Chen

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval…

Information Retrieval · Computer Science · 2026-05-29 · Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You

Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Jesper Larsson Träff

Uncertainty Quantification for Multimodal Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) improves the question answering capabilities of Large Language Models (LLMs) by incorporating external knowledge and has recently been extended to multimodal settings through Vision-Language Models…

Information Retrieval · Computer Science · 2026-05-29 · Simon Binz, Heydar Soudani, Faegheh Hasibi

Rec-Distill: An Industrial Distillation Pipeline for Large-Scale Recommendation Models

Large recommendation models have demonstrated substantial potential gains under scaling laws, yet these gains are difficult to realize in industrial recommendation systems because real-world deployment requires lightweight models with…

Information Retrieval · Computer Science · 2026-05-29 · Haoran Ding, Wenlin Zhao, Yuchen Jiang, Juren Li, Jie Zhu, Xinchun Li, Yishujie Zhao, Yi Zhang, Ao Qiao, Jianhui Dong, Cheng Chen, Ziyan Gong, Deping Xie, Peng Xu, Zikai Wang, Yuwei Wang, Huizhi Yang, Zhe Chen, Yuchao Zheng

CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis

In recent years, HPC systems, and CPU architectures as their central components, have become increasingly complex, making application development and optimization quite challenging. In this respect, intuitive performance models like the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · José Morgado, Leonel Sousa, Aleksandar Ilic

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most widely used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training

Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Ling Chen, Houming Wu, Wenjie Yu

TC-MIS: Maximal Independent Set on Tensor-cores

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently unstructured and challenging for GPU parallelism due to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Prajjwal Nijhara, Dip Sankar Banerjee

Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines

Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, processing, and analysis of data have become vital for monitoring operations,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Angelos Dorotheos Chatzopoulos, Babis Andreou, Kakia Panagidi, Stathes Hadjiefthymiades

FLASH-MAXSIM: IO-Aware Fused Kernels for Late-Interaction Scoring

Late-interaction retrieval (ColBERT, ColPali) scores a query against a document with the MaxSim operator: for every query token, the maximum similarity over the document tokens, summed over query tokens. The standard implementation…

Information Retrieval · Computer Science · 2026-05-29 · Roi Pony, Adi Raz Goldfarb, Idan Friedman, Daniel Ezer, Udi Barzelay
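The MaxSim operator described in the FLASH-MAXSIM abstract can be illustrated with a naive reference computation. This is a minimal sketch, assuming token embeddings are plain lists of floats and that "similarity" means the dot product (as with normalized ColBERT-style embeddings); the helper names `dot` and `maxsim` are hypothetical, and this is not the fused kernel proposed in the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length token embeddings (assumed similarity)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Naive late-interaction score: for each query token, take the maximum
    similarity over all document tokens, then sum those maxima over the
    query tokens."""
    if not doc:
        raise ValueError("document must contain at least one token")
    return sum(max(dot(q, d) for d in doc) for q in query)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]
    # First query token matches doc token 2 (1.0); second matches doc token 1 (0.5).
    print(maxsim(query, doc))
```

This version materializes every query-document token similarity one at a time, which is exactly the per-pair work a production kernel would batch and fuse; it is useful only as a correctness reference on small inputs.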

Silent Data Corruption Protection through Efficient Task Replication

The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A common strategy for SDC protection is replication, where the computation is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Mia Reitz, Claudia Fohry

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen…

Information Retrieval · Computer Science · 2026-05-29 · Benjamin Clavié, Sean Lee, Aamir Shakir, Makoto P. Kato

Understanding and Reducing Metadata-Driven Host Overheads in Sampling-Based GNN Training

Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Yidong Gong, Saima Afrin, Yuchen Ma, Guannan Wang, Bin Ren, Pradeep Kumar

ACE: Anisotropy-Controllable Embedding for LLM-enhanced Sequential Recommendation

Recent advances in the LLM-as-Extractor paradigm leverage large language models (LLMs) to transfer semantically rich item embeddings into sequential recommendation (SR) backbones. However, LLM-generated embeddings often suffer from strong…

Information Retrieval · Computer Science · 2026-05-29 · Dongcheol Lee, Hye-young Kim, Jongwuk Lee

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking

Item-to-Item (I2I) retrieval is a fundamental part of modern content platforms, supporting critical industrial workflows from recommendation engines to content auditing. While multimodal embedding methods have advanced general retrieval,…

Information Retrieval · Computer Science · 2026-05-29 · Jinghan Zhao, Wenwei Jin, Anqi Li, Jintao Tong, Luya Mo, Jiawei Li, Bin Li, Yao Hu

CrossAlpha: An Annual-Report Benchmark for Cross-Market Factor Research

Cross-market factor research studies whether firm-level signals from one or more markets can predict returns in a target market, but existing public benchmarks do not support cross-market disclosure-to-return evaluation. Building such a…

Information Retrieval · Computer Science · 2026-05-29 · Qian Wang, Zhongyi Tong, Nuo Chen, Zhaomin Wu, Bingsheng He

On the Practice of Scaling Search Conversion Rate Prediction

Scaling a Search Conversion Rate (CVR) prediction model, especially in high-traffic environments, presents a challenge: superior model quality needs to be balanced with strict constraints on training cost and serving latency. This paper…

Information Retrieval · Computer Science · 2026-05-29 · James Pak, Jyun-Yu Jiang, Fan Zhang, Sen Wang, Taekmin Kim, Henry Tsai, Vijay Rajaram, Juexin Lin, Mohitdeep Singh, Alessandro Magnani, Johnny Chen, Qian Zhao, Rao Fu, Zhirong Liang, Jordan Gilliland, Winter Jiao