数据库
We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at…
Modern cloud databases are shifting from converged architectures to storage disaggregation, enabling independent scaling and billing of compute and storage. However, cloud databases still rely on external, converged coordination services…
The operation and maintenance (O&M) of database systems is critical to ensuring system availability and performance, typically requiring expert experience (e.g., identifying metric-to-anomaly relations) for effective diagnosis and recovery.…
For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and…
Data structures used in software development have inbuilt redundancy to improve software reliability and to speed up performance. Examples include a Doubly Linked List which allows a faster deletion due to the presence of the previous…
Data regulations like GDPR require systems to support data erasure but leave the definition of "erasure" open to interpretation. This ambiguity makes compliance challenging, especially in databases where data dependencies can lead to erased…
Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new…
Data regulations, such as GDPR, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are, often ambiguous (with multiple valid interpretations) when it comes to defining the expected…
The rapid growth of publicly available textual resources, such as lexicons and domain-specific corpora, presents challenges in efficiently identifying relevant resources. While repositories are emerging, they often lack advanced search and…
Efficiently selecting indexes is fundamental to database performance optimization, particularly for systems handling large-scale analytical workloads. While deep reinforcement learning (DRL) has shown promise in automating index selection…
The co-location of multiple database instances on resource constrained edge nodes creates significant cache contention, where traditional schemes are inefficient and unstable under dynamic workloads. To address this, we present SAM(a…
Existing RDF serialization formats such as Turtle, N-Quads, and JSON-LD are widely used for communication and storage in knowledge graph and Semantic Web applications. However, they suffer from limitations in performance, compression ratio,…
We present a systematic approach for evaluating the quality of knowledge graph repairs with respect to constraint violations defined in shapes constraint language (SHACL). Current evaluation methods rely on \emph{ad hoc} datasets, which…
With the widespread of software systems and applications that serve the Islamic knowledge domain, several concerns arise. Authenticity and accuracy of the databases that back up these systems are questionable. With the excitement that some…
We study path-based graph queries that, in addition to navigation through edges, also perform navigation through time. This allows asking questions about the dynamics of networks, like traffic movement, cause-effect relationships, or the…
Approximate nearest neighbor search (ANNS) has become a quintessential algorithmic problem for various other foundational data tasks for AI workloads. Graph-based ANNS indexes have superb empirical trade-offs in indexing cost, query…
This paper presents a formalism for defining properties of paths in graph databases, which can be used to restrict the number of solutions to navigational queries. In particular, our formalism allows us to define quantitative properties…
Edge applications generate a large influx of sensor data on massive scales, and these massive data streams must be processed shortly to derive actionable intelligence. However, traditional data processing systems are not well-suited for…
{Multi-criteria decision analysis in databases has been actively studied, especially through the Skyline operator. Yet, few approaches offer a relevant comparison of Pareto optimal, or Skyline, points for high cardinality result sets. We…
The paper sketches some initial results from an ongoing project to develop an ontology-based digital form for representing uncertain information. We frame this work as a journey from lower to higher levels of digital maturity across a…