Databases
Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to…
Two-phase locking (2PL) is a fundamental and widely used concurrency control protocol. It regulates concurrent access to database data by following a specific sequence of acquiring and releasing locks during transaction execution, thereby…
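As a rough illustration of the locking discipline that abstract describes, here is a minimal sketch of the two-phase rule: a transaction may acquire locks only until its first release, after which it may only release. This is not the paper's method. The names `LockManager`, `Transaction`, and `TwoPhaseLockingError` are hypothetical, and the sketch assumes exclusive locks only, non-blocking acquisition, and no deadlock handling.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction tries to lock after it has started unlocking."""


class LockManager:
    """Hypothetical lock table: maps each item to the transaction holding it."""

    def __init__(self):
        self._owners = {}

    def try_acquire(self, txn_id, item):
        # Grant the exclusive lock if the item is free or already ours.
        owner = self._owners.get(item)
        if owner is None or owner == txn_id:
            self._owners[item] = txn_id
            return True
        return False

    def release(self, txn_id, item):
        # Only the current owner can release the lock.
        if self._owners.get(item) == txn_id:
            del self._owners[item]


class Transaction:
    """Enforces the two phases: growing (lock only), then shrinking (unlock only)."""

    def __init__(self, txn_id, manager):
        self.txn_id = txn_id
        self.manager = manager
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.txn_id} cannot lock {item!r} after releasing a lock"
            )
        if self.manager.try_acquire(self.txn_id, item):
            self.held.add(item)
            return True
        return False  # Assumption: the caller decides whether to retry or abort.

    def unlock(self, item):
        self.held.remove(item)  # KeyError if the lock was never held.
        self.shrinking = True   # The first release ends the growing phase.
        self.manager.release(self.txn_id, item)

    def commit(self):
        # Release everything still held; the transaction is then finished.
        self.shrinking = True
        for item in list(self.held):
            self.unlock(item)


if __name__ == "__main__":
    manager = LockManager()
    t1 = Transaction("T1", manager)
    t2 = Transaction("T2", manager)
    print(t1.lock("x"))  # T1 obtains the lock on x
    print(t2.lock("x"))  # T2 is refused while T1 holds x
    t1.commit()
    print(t2.lock("x"))  # x is free again, so T2 obtains it
```

Holding every lock until `commit()` corresponds to the strict variant of the protocol; calling `unlock` earlier is still legal under basic 2PL, but it closes the growing phase for that transaction.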
Recent deployments of learned query optimizers use expensive neural networks and ad-hoc search policies. To address these issues, we introduce LimeQO, a framework for offline query optimization leveraging low-rank learning to…
An efficient data structure is fundamental to meeting the growing demands in dynamic graph processing. However, the dual requirements for graph computation efficiency (with contiguous structures) and graph update efficiency (with linked…
Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search over large vector collections is a well…
This article introduces the Data Retrieval Web Engine (also referred to as doctor web), a flexible and modular tool for extracting structured data from web pages using a simple query language. We discuss the engineering challenges addressed…
In today's fast-paced digital world, data has become a critical asset for enterprises across various industries. However, the exponential growth of data presents significant challenges in managing and utilizing the vast amounts of…
Machines need data and metadata to be machine-actionable and FAIR (findable, accessible, interoperable, reusable) to manage increasing data volumes. Knowledge graphs and ontologies are key to this, but their use is hampered by high access…
Relational databases, organized into tables connected by primary-foreign key relationships, are a common format for organizing data. Making predictions on relational data often involves transforming them into a flat tabular format through…
Stardog is a commercial Knowledge Graph platform built on top of an RDF graph database whose primary means of communication is a standardized graph query language called SPARQL. This paper describes our journey of developing a more…
The proliferation of small files in data lakes poses significant challenges, including degraded query performance, increased storage costs, and scalability bottlenecks in distributed storage systems. Log-structured table formats (LSTs) such…
Range-filtering approximate $k$-nearest neighbor (RFAKNN) search takes as input a vector and a numeric value, returning $k$ points from a database of $N$ high-dimensional points. The returned points must satisfy two criteria: their numeric…
With the rapid development of big data and artificial intelligence technologies, the demand for effective processing and retrieval of vector data is growing. Against this backdrop, I have developed the Bhakti vector database, aiming to…
Conformance checking techniques aim to provide diagnostics on the conformity between process models and event data. Conventional methods, such as trace alignments, assume strict total ordering of events, leading to inaccuracies when…
Business intelligence (BI) transforms large volumes of data within modern organizations into actionable insights for informed decision-making. Recently, large language model (LLM)-based agents have streamlined the BI workflow by…
Tabular reasoning involves interpreting natural language queries about tabular data, which presents a unique challenge of combining language understanding with structured data analysis. Existing methods employ either textual reasoning,…
Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice.…
The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such…
Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality…
This paper introduces Web3DB, a decentralized relational database management system (RDBMS) designed to align with the principles of Web 3.0, addressing critical shortcomings of traditional centralized DBMS, such as data privacy, security…