Computer Science

Zero-Scan Data Quality: Leveraging Table Format Metadata for Continuous Observability at Scale

Modern table formats such as Apache Iceberg compute and store metadata (commit timestamps, record counts, and column-level statistics such as null counts and value bounds) at write time as part of file writing. These statistics serve query…

Databases · Computer Science 2026-05-29 Mohit Verma, Shantanu Rawat, Christian Bush, Sumedh Sakdeo, Lokesh Amarnath Ravindranathan, Dwarak Bakshi

The Missing Dimensions in Geo-Distributed Database Evaluation

Geo-distributed OLTP databases are widely deployed across cloud regions, yet current evaluation practices do not cover the challenges that geo-distribution introduces. Existing benchmarks assume stable network conditions; they lack explicit settings for data…

Databases · Computer Science 2026-05-29 Oto Mraz, Kyriakos Psarakis, George Christodoulou, Paris Carbone, Asterios Katsifodimos

Neural Network Verification using Partial Multi-Neuron Relaxation

The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guaranteeing safety properties about their behavior. To achieve this, contemporary verification algorithms…

Logic in Computer Science · Computer Science 2026-05-29 Ido Shmuel, Guy Katz

A Rust-to-Lean Verification Pipeline with AI Provers: An Experience Report

We describe a verification pipeline that takes production Rust cryptographic code and produces machine-checked correctness proofs in Lean 4. The pipeline combines three components: symbolic extraction tools (Charon and Aeneas, or Hax) that…

Logic in Computer Science · Computer Science 2026-05-29 Natalia Klaus, Palina Tolmach, Juan Conejero

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Large Audio Language Models (LALMs) expand jailbreak risks from token-level prompting to the full speech perception-to-reasoning pipeline, where unsafe behavior can be induced through semantics, acoustic style, signal artifacts, or internal…

Sound · Computer Science 2026-05-29 Bo-Han Feng, Yu-Hsuan Li Liang, Chien-Feng Liu, You-Hsuan Chang, Yun-Nung Chen

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements…

Sound · Computer Science 2026-05-29 Bohan Li, Shi Lian, Hankun Wang, Yiwei Guo, Yu Xi, Zhihan Li, Da Zheng, Colin Zhang, Kai Yu

Strong (D)QBF Dependency Schemes via Pure Paths with Applications to Proof Checking

Certification for Quantified Boolean Formulas (QBF) and Dependency Quantified Boolean Formulas (DQBF) is an ongoing challenge. Recent proof complexity work has shown that the majority of QBF and DQBF techniques can be p-simulated by using…

Logic in Computer Science · Computer Science 2026-05-29 Leroy Chew, Tomáš Peitl

Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules

Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…

Databases · Computer Science 2026-05-29 Wenxin Xu, Chen Jason Zhang, Xiaoyong Wei, Haoyang Li, Hwanhee Kim, Yuanfeng Song, Raymond Chi-Wing Wong

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap…

Sound · Computer Science 2026-05-29 Yonggang Zhu, Liting Gao, Aidong Men, Wenwu Wang

Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion

Audio deepfake detection is well-studied as a binary problem, but partially manipulated speech, where a short synthesised segment is spliced into an otherwise genuine utterance, poses a harder and more realistic threat. Detecting such…

Sound · Computer Science 2026-05-29 S. Sutharya, Remya K. Sasi

Unifying Semantic Path Order and Weighted Path Order

Monotonic semantic path orders and weighted path orders are powerful reduction orders for proving termination of term rewrite systems. In this paper we present their simple unification as reduction orders and reduction pairs. We also…

Logic in Computer Science · Computer Science 2026-05-29 Teppei Saito, Nao Hirokawa

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. Specifically, ChildVox follows the full developmental trajectory from birth through school age, covering…

Sound · Computer Science 2026-05-29 Tiantian Feng, Anfeng Xu, Xuan Shi, Aditya Kommineni, Shakhrul Iman Siam, Megan Micheletti, Zhonghao Shi, Helen Tager-Flusberg, Mi Zhang, Lynn K. Perry, Catherine Lord, Daniel Messinger, Shrikanth Narayanan

One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…

Databases · Computer Science 2026-05-29 Adam Szymański, Tyler Akidau

ScanTwin: Simulating Performance Regressions Without Access to Tenant Data

In cloud data platforms, developers often encounter performance regressions that occur in specific tenant datasets. However, due to confidentiality constraints, they cannot access the original data, which makes it difficult to reproduce…

Databases · Computer Science 2026-05-29 Donghyun Sohn, Jennie Rogers

IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata

Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases…

Databases · Computer Science 2026-05-29 Rajarshi Chowdhury, Akshay Shah, Zakaria Alrmaih, Chenhao Guo, Anubhav Singh, Sue Lee

mstlo: Efficient Online Monitoring of Signal Temporal Logic

We present mstlo (mistletoe), a Rust library for high-performance online monitoring of signal temporal logic (STL), with Python bindings. The library provides: (i) a unified interface for multiple STL semantics, including Robust…

Logic in Computer Science · Computer Science 2026-05-29 Andreas Kaag Thomsen, Niels Viggo Stark Madsen, Valdemar Tang Evans, Thomas David Wright, Lukas Esterle, Peter Gorm Larsen

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as those generated by Draft-Sketch-Prove (DSP) pipelines. In current systems, each search branch…

Logic in Computer Science · Computer Science 2026-05-29 Austin Shen, Yunong Shi

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges:…

Sound · Computer Science 2026-05-29 Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz, Oluwanifemi Bamgbose, Fanny Riols, Hoang H. Nguyen, Raghav Mehndiratta, Lindsay Devon Brin, Joseph Marinier, Hari Subramani, Anil Madamala, Sridhar Krishna Nemala, Srinivas Sunkara

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio

Medical audio data is difficult to collect due to privacy regulations and high annotation costs arising from domain expertise. Thus, existing benchmarks tend to underrepresent complex medical audio scenarios. To address this challenge, we…

Sound · Computer Science 2026-05-29 Harshit Rajgarhia, Shuubham Ojha, Asif Shaik, Akhil Pothanapalli, Rachuri Lokesh, Abhishek Mukherji, Prasanna Desikan

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, grids, and graphs). To date, most…

Sound · Computer Science 2026-05-29 Lekai Qian, Haoyu Gu, Jingwei Zhao, Ziyu Wang