Databases - Scifaro

OnPair: Short Strings Compression for Fast Random Access

We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at…

Databases · Computer Science 2025-08-05 Francesco Gargiulo, Rossano Venturini

Marlin: Efficient Coordination for Autoscaling Cloud DBMS (Extended Version)

Modern cloud databases are shifting from converged architectures to storage disaggregation, enabling independent scaling and billing of compute and storage. However, cloud databases still rely on external, converged coordination services…

Databases · Computer Science 2025-08-05 Wenjie Hu, Guanzhou Hu, Mahesh Balakrishnan, Xiangyao Yu

DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs

The operation and maintenance (O&M) of database systems is critical to ensuring system availability and performance, typically requiring expert experience (e.g., identifying metric-to-anomaly relations) for effective diagnosis and recovery.…

Databases · Computer Science 2025-08-05 Wei Zhou, Peng Sun, Xuanhe Zhou, Qianglei Zang, Ji Xu, Tieying Zhang, Guoliang Li, Fan Wu

Terabyte-Scale Analytics in the Blink of an Eye

For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and…

Databases · Computer Science 2025-08-05 Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen

Don't Persist All: Efficient Persistent Data Structures

Data structures used in software development have inbuilt redundancy to improve software reliability and to speed up performance. Examples include a Doubly Linked List which allows a faster deletion due to the presence of the previous…

Databases · Computer Science 2025-08-05 Pratyush Mahapatra, Mark D. Hill, Michael M. Swift

Meaningful Data Erasure in the Presence of Dependencies

Data regulations like GDPR require systems to support data erasure but leave the definition of "erasure" open to interpretation. This ambiguity makes compliance challenging, especially in databases where data dependencies can lead to erased…

Databases · Computer Science 2025-08-04 Vishal Chakraborty, Youri Kaminsky, Sharad Mehrotra, Felix Naumann, Faisal Nawab, Primal Pappachan, Mohammad Sadoghi, Nalini Venkatasubramanian

Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new…

Databases · Computer Science 2025-08-04 Nitish Upreti, Harsha Vardhan Simhadri, Hari Sudan Sundar, Krishnan Sundaram, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Tamara Hass, Nitesh Dudhey, Subramanyam Pattipaka, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant Gupta, Abhishek Sahu, Hemeswari Varada, Sudhanshu Barthwal, Ritika Mor, James Codella, Shaun Cooper, Kevin Pilch, Simon Moreno, Aayush Kataria, Santosh Kulkarni, Neil Deshpande, Amar Sagare, Dinesh Billa, Zishan Fu, Vipul Vishal

Data-CASE: Grounding Data Regulations for Compliant Data Processing Systems

Data regulations, such as GDPR, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are often ambiguous (with multiple valid interpretations) when it comes to defining the expected…

Databases · Computer Science 2025-08-04 Vishal Chakraborty, Stacy Ann-Elvy, Sharad Mehrotra, Faisal Nawab, Mohammad Sadoghi, Shantanu Sharma, Nalini Venkatsubhramanian, Farhan Saeed

DataLens: Enhancing Dataset Discovery via Network Topologies

The rapid growth of publicly available textual resources, such as lexicons and domain-specific corpora, presents challenges in efficiently identifying relevant resources. While repositories are emerging, they often lack advanced search and…

Databases · Computer Science 2025-08-01 Anaïs Ollagnier, Aline Menin

AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads

Efficiently selecting indexes is fundamental to database performance optimization, particularly for systems handling large-scale analytical workloads. While deep reinforcement learning (DRL) has shown promise in automating index selection…

Databases · Computer Science 2025-08-01 Taiyi Wang, Eiko Yoneki

SAM: A Stability-Aware Cache Manager for Multi-Tenant Embedded Databases

The co-location of multiple database instances on resource-constrained edge nodes creates significant cache contention, where traditional schemes are inefficient and unstable under dynamic workloads. To address this, we present SAM (a…

Databases · Computer Science 2025-08-01 Haoran Zhang, Decheng Zuo, Yu Yan, Zhiyu Liang, Hongzhi Wang

Jelly: a Fast and Convenient RDF Serialization Format

Existing RDF serialization formats such as Turtle, N-Quads, and JSON-LD are widely used for communication and storage in knowledge graph and Semantic Web applications. However, they suffer from limitations in performance, compression ratio,…

Databases · Computer Science 2025-08-01 Piotr Sowinski, Karolina Bogacka, Anastasiya Danilenka, Nikita Kozlov

Systematic Evaluation of Knowledge Graph Repair with Large Language Models

We present a systematic approach for evaluating the quality of knowledge graph repairs with respect to constraint violations defined in shapes constraint language (SHACL). Current evaluation methods rely on ad hoc datasets, which…

Databases · Computer Science 2025-07-31 Tung-Wei Lin, Gabe Fierro, Han Li, Tianzhen Hong, Pierluigi Nuzzo, Alberto Sangiovanni-Vinentelli

Scalability, Availability, Reproducibility and Extensibility in Islamic Database Systems

With the widespread use of software systems and applications that serve the Islamic knowledge domain, several concerns arise. Authenticity and accuracy of the databases that back up these systems are questionable. With the excitement that some…

Databases · Computer Science 2025-07-31 Umar Siddiqui, Habiba Youssef, Adel Sabour, Mohamed Ali

Compact Answers to Temporal Path Queries

We study path-based graph queries that, in addition to navigation through edges, also perform navigation through time. This allows asking questions about the dynamics of networks, like traffic movement, cause-effect relationships, or the…

Databases · Computer Science 2025-07-31 Muhammad Adnan, Diego Calvanese, Julien Corman, Anton Dignös, Werner Nutt, Ognjen Savković

CleANN: Efficient Full Dynamism in Graph-based Approximate Nearest Neighbor Search

Approximate nearest neighbor search (ANNS) has become a quintessential algorithmic problem for various other foundational data tasks for AI workloads. Graph-based ANNS indexes have superb empirical trade-offs in indexing cost, query…

Databases · Computer Science 2025-07-31 Ziyu Zhang, Yuanhao Wei, Joshua Engels, Julian Shun

Properties for Paths in Graph Databases

This paper presents a formalism for defining properties of paths in graph databases, which can be used to restrict the number of solutions to navigational queries. In particular, our formalism allows us to define quantitative properties…

Databases · Computer Science 2025-07-31 Fernando Orejas, Elvira Pino, Renzo Angles, Edelmira Pasarella, Nikos Milonakis

AgileDART: An Agile and Scalable Edge Stream Processing Engine

Edge applications generate a large influx of sensor data at massive scale, and these data streams must be processed promptly to derive actionable intelligence. However, traditional data processing systems are not well suited for…

Databases · Computer Science 2025-07-31 Cheng-Wei Ching, Xin Chen, Chaeeun Kim, Tongze Wang, Dong Chen, Dilma Da Silva, Liting Hu

Ranking Methods for Skyline Queries

Multi-criteria decision analysis in databases has been actively studied, especially through the Skyline operator. Yet, few approaches offer a relevant comparison of Pareto optimal, or Skyline, points for high cardinality result sets. We…

Databases · Computer Science 2025-07-30 Mickaël Martin-Nevot, Lotfi Lakhal

Digitalizing Uncertain Information

The paper sketches some initial results from an ongoing project to develop an ontology-based digital form for representing uncertain information. We frame this work as a journey from lower to higher levels of digital maturity across a…

Databases · Computer Science 2025-07-30 Chris Partridge, Andrew Mitchell, Andreas Cola