Computer Science

Zero-Scan Data Quality: Leveraging Table Format Metadata for Continuous Observability at Scale

Modern table formats such as Apache Iceberg compute and store metadata (commit timestamps, record counts, and column-level statistics such as null counts and value bounds) at write time as part of file writing. These statistics serve query…

Databases · Computer Science 2026-05-29 Mohit Verma, Shantanu Rawat, Christian Bush, Sumedh Sakdeo, Lokesh Amarnath Ravindranathan, Dwarak Bakshi

The Missing Dimensions in Geo-Distributed Database Evaluation

Geo-distributed OLTP databases are widely deployed across cloud regions, yet current evaluation practices do not cover the challenges of this aspect. Existing benchmarks assume stable network conditions; they lack explicit settings for data…

Databases · Computer Science 2026-05-29 Oto Mraz, Kyriakos Psarakis, George Christodoulou, Paris Carbone, Asterios Katsifodimos

Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules

Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…

Databases · Computer Science 2026-05-29 Wenxin Xu, Chen Jason Zhang, Xiaoyong Wei, Haoyang Li, Hwanhee Kim, Yuanfeng Song, Raymond Chi-Wing Wong

RTP-LLM: High-Performance Alibaba LLM Inference Engine

Large Language Models (LLMs) have revolutionized AI applications, but deploying them at scale presents significant challenges. We present RTP-LLM, a high-performance inference engine for industrial-scale LLM deployment, successfully…

Operating Systems · Computer Science 2026-05-29 Boyu Tan, Jiarui Guo, Zongwei Lv, Hanbo Sun, Tong Yang, Kan Liu, Xinfei Shi, Zetao Hu, Yaxin Yu, Chi Zhang, Jianning Zhang, Xi Yang, Wei Zhang, Bo Cai, Silu Zhou, Xiyu Wang, Na He, Yinghao Yu, Wending Bao, Guiyang Huang, Yuxing Yuan, Juncheng Yin, Nan Wang, Lin Yang, Zechao Zhang, Lu Chen, Guoding Li, Tao Lan, Lin Qu

Nucleolus Computation by Non-Zero-Constrained Optimization

We extend the list of games where the nucleolus is computable in polynomial time. Based on the classical MPS scheme, nucleolus computation can be reduced to the problem of finding a coalition with minimum excess that does not belong to a…

Computer Science and Game Theory · Computer Science 2026-05-29 Daniel Ebert, Antonia Ellerbrock

Bridging Semantics and Strategy: A Dual-Stream Graph Network for Equitable Negotiation Forecasting

Forecasting outcomes in mixed-motive negotiations requires integrating explicit linguistic cues with latent strategic constraints, such as budgets and alternatives. Existing computational models often fail to adapt to varying task…

Computer Science and Game Theory · Computer Science 2026-05-29 Moirangthem Tiken Singh

One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…

Databases · Computer Science 2026-05-29 Adam Szymański, Tyler Akidau

ScanTwin: Simulating Performance Regressions Without Access to Tenant Data

In cloud data platforms, developers often encounter performance regressions that occur in specific tenant datasets. However, due to confidentiality constraints, they cannot access the original data, which makes it difficult to reproduce…

Databases · Computer Science 2026-05-29 Donghyun Sohn, Jennie Rogers

IORM: Hierarchical I/O Governance for Thousands of Consolidated Databases on Oracle Exadata

Oracle Exadata consolidates thousands of tenant databases onto shared storage infrastructure deployed at hundreds of customer sites worldwide. Oracle Multitenant architecture enables this extreme density, with thousands of tenant databases…

Databases · Computer Science 2026-05-29 Rajarshi Chowdhury, Akshay Shah, Zakaria Alrmaih, Chenhao Guo, Anubhav Singh, Sue Lee

E2E: Efficient Filtered AKNN Search via Adaptive Termination

Approximate k-Nearest Neighbor (AKNN) search is widely used in vector databases. When vectors carry additional attributes (e.g., labels or numerical values), filtered AKNN search retrieves the nearest vectors to a query vector under…

Databases · Computer Science 2026-05-29 Wenxuan Xia, Mingyu Yang, Wentao Li, Wei Wang

Envy-Free Allocation of Indivisible Goods via Noisy Queries

We introduce a problem of fairly allocating indivisible goods (items) in which the agents' valuations cannot be observed directly, but instead can only be accessed via noisy queries. In the two-agent setting with Gaussian noise and bounded…

Computer Science and Game Theory · Computer Science 2026-05-29 Zihan Li, Yan Hao Ling, Jonathan Scarlett, Warut Suksompong

Grain Theory: Type-Level Granularity Correctness in Data Pipelines

Data transformation correctness is a fundamental challenge in data engineering: how can we verify that pipelines produce correct results before executing on production data? Existing practice relies on iterative testing over materialized…

Databases · Computer Science 2026-05-29 Nikos Karayannidis

Redbench: Workload Synthesis From Cloud Traces

Workload traces from cloud data warehouse providers reveal that standard benchmarks such as TPC-H and TPC-DS fail to capture key characteristics of real-world workloads, including query repetition and string-heavy queries. In this paper, we…

Databases · Computer Science 2026-05-29 Johannes Wehrstein, Roman Heinrich, Mihail Stoian, Skander Krid, Martin Stemmer, Andreas Kipf, Carsten Binnig, Muhammad El-Hindi

Approximate Proportionality in Online Fair Division

We study the online fair division problem, where indivisible goods arrive sequentially and must be allocated immediately and irrevocably. Prior work establishes strong impossibility results for approximating classic notions such as…

Computer Science and Game Theory · Computer Science 2026-05-29 Davin Choo, Winston Fu, Derek Khu, Tzeh Yuan Neoh, Tze-Yang Poon, Nicholas Teh

MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball

The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). The establishment of an explainable and practical MVP evaluation method is…

Computer Science and Game Theory · Computer Science 2026-05-29 Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li

Online Fair Division with Additional Information

We study the problem of fairly allocating indivisible goods to agents in an online setting, where goods arrive sequentially and must be allocated irrevocably. Focusing on the popular fairness notions of envy-freeness, proportionality, and…

Computer Science and Game Theory · Computer Science 2026-05-29 Tzeh Yuan Neoh, Jannik Peters, Nicholas Teh

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the…

Computer Science and Game Theory · Computer Science 2026-05-29 Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez

CLVR Ordering of Transactions on AMMs

This paper introduces a trade ordering rule that aims to reduce intra-block price volatility in Automated Market Maker (AMM) powered decentralized exchanges. The ordering rule introduced here, Clever Look-ahead Volatility Reduction (CLVR),…

Computer Science and Game Theory · Computer Science 2026-05-29 Robert McLaughlin, Nir Chemaya, Dingyue Liu, Dahlia Malkhi

Towards Cost-effective LLMs Routing with Batch Prompting

Large Language Model (LLM) serving systems must balance task performance against monetary cost. Two prominent optimization techniques have emerged independently: LLM routing, which directs each query to the most cost-effective model in a…

Databases · Computer Science 2026-05-28 Haotian Xu, Kangfei Zhao, Jiadong Xie

Are Diffusion Language Models Good Database Analysts?

Recent advancements in large language models (LLMs) have significantly improved Natural Language to SQL (NL2SQL) tasks, yet most NL2SQL systems continue to rely on the autoregressive (AR) paradigm. The highly structured nature of SQL makes…

Databases · Computer Science 2026-05-28 Peixian Ma, Xialie Zhuang, Jiantao Tan, Changlun Li, Ruirui Chen, Chengwei Qin