Databases
A Database Management System (DBMS) is designed to help store and process large collections of data, and is flexible enough to perform various kinds of optimizations as long as it achieves serializability with a high-level interface…
There has been a flurry of recent proposals on learned benefit estimators for index tuning. Although these learned estimators show promising improvement over what-if query optimizer calls in terms of the accuracy of estimated index…
Connectivity query processing is a fundamental problem in graph processing. Given an undirected graph and two query vertices, the problem aims to identify whether they are connected via a path. Given frequent edge updates in real graph…
When interactively exploring video data, video-native querying involves consuming query results as videos, including steps such as compilation of extracted video clips or data overlays. These video-native queries are bottlenecked by…
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them, which is essential for a wide range of data-centric applications. Driven by (i) rising demands for…
AI agents are increasingly the primary consumers of data, operating continuously to make concurrent, irreversible decisions. Traditional data systems designed for human analysis cycles become correctness bottlenecks under this operating…
With the increasing use of RDF graphs, storing and querying such data using SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but often suffer from performance bottlenecks in…
In this paper we propose an approach for executing data transformations near- or in-storage on intelligent storage systems. The currently prevailing approach of extracting the data and then transforming it to a target format suffers…
Resolution of complex SQL issues persists as a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more…
Buildings generate heterogeneous data across their lifecycle, yet integrating these data remains a critical unsolved challenge. Despite three decades of standardization efforts, over 40 metadata schemas now span the building lifecycle, with…
Functional dependencies (FDs) are fundamental integrity constraints in relational databases, but discovering them under incremental updates remains challenging. While static algorithms are inefficient due to full re-execution, incremental…
The advancement of mobile computing devices and positioning technologies has led to an explosive growth of spatio-temporal data managed in databases. Representative queries over such data include range queries, nearest neighbor queries, and…
Accurate disease diagnosis depends on effective collaboration between medical specialties, yet departments often use distinct data systems and proprietary formats. This heterogeneity hinders joint analysis and integration of complementary…
Dynamic graphs model many real-world applications, and as their sizes grow, efficiently storing and updating them becomes critical. We present RadixGraph, a fast and memory-efficient data structure for dynamic graph storage. RadixGraph…
Data lakes are massive repositories of raw and heterogeneous data, designed to meet the requirements of modern data storage. Nonetheless, this same philosophy increases the complexity of performing discovery tasks to find relevant data for…
Billboard advertising has emerged as an effective out-of-home advertising technique, where the goal is to select a limited number of slots and play advertisement content there, with the hope that it will be observed by many people and,…
Spatial natural language interfaces to database systems provide non-expert users with convenient access to spatial data through natural language queries. However, the scarcity of high-quality spatial natural language query corpora limits the…
Graph Machine Learning (GML) with Graph Databases (GDBs) has gained significant relevance in recent years, due to its ability to handle complex interconnected data and apply ML techniques using Graph Data Science (GDS). However, a critical…
Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space…
Materials synthesis procedures are predominantly documented as narrative text in protocols and lab notebooks, rendering them inaccessible to conventional structured data optimization. This language-native nature poses a critical challenge…