数据库
Natural language (NL) interfaces to databases broaden access to heterogeneous data but often yield many ambiguous intermediate logical plans (ILPs) due to uncertain operator scope and predicate semantics. Many candidates are infeasible…
Intelligent Transportation Systems increasingly depend on heterogeneous data from roadside cameras, UAV imagery, LiDAR, and in-vehicle sensors, yet the lack of unified data standards, model interfaces, and evaluation protocols across these…
Mobility data science offers insights into the complex interconnections of spatial data of moving objects and their surroundings, often based on a combination of vector and raster data. For example, mobility traces are usually in vector…
Understanding micro-architectural behavior is profound in efficiently using hardware resources. Recent work has shown that, despite being aggressively optimized for modern hardware, in-memory online transaction processing (OLTP) systems…
The $k$-defective clique model relaxes the strict completeness constraint of the traditional clique by allowing up to $k$ missing edges, providing a robust formulation for detecting cohesive structures in noisy graphs. Consequently, the…
Many applications process a stream of tuples over a window duration, and require the results within a specified deadline after the end of the window. For such scenarios, processing tuples intermittently (in batches) instead of eagerly…
Approximate nearest neighbor (ANN) graph indices such as HNSW and Vamana construct their edge topology in full-precision or high-fidelity quantized metric spaces, relegating binary quantization (BQ) to a post-hoc distance estimator during…
Context retrieval systems for LLM inference face a critical challenge: high retrieval latency creates a fundamental tension between waiting for complete context (poor time-to-first-token) and proceeding without it (reduced quality).…
The Ethereum state database uses Merkle Patricia Trie (MPT), which suffers from large witness proof sizes and high storage overhead. Verkle Tries have been proposed as a replacement, offering witness proofs below 150 bytes through vector…
Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search.…
The rise of compound AI serving that integrates multiple operators in a pipeline enables end-user applications such as generative AI-powered meeting companions, autonomous driving, and immersive gaming. These workloads span diverse…
Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches…
Modeling data lineage in relational databases remains a challenging problem, particularly in scenarios involving incomplete or missing dependencies between database objects. In this paper, we propose a novel ontology for relational database…
Vector search (VS) is now available in most database engines. However, while vector search is a common feature in AI/ML/LLMs where the dominant computing platforms are GPUs, existing database engines operate on CPUs even when implementing…
Retrieval-Augmented Generation (RAG) improves reliability of large language models by incorporating external knowledge, but the retrieval process can introduce bias that propagates to generated outputs. This issue is particularly…
Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the…
We investigate the notion of sufficient explanation, and a sufficiency-degree as attribution score for database tuples in relation to query answering. We also investigate and exploit connections with database repairs as used for dealing…
Join ordering is the NP-hard problem of selecting the most efficient order in which to evaluate joins (conjunctive, binary operators) in a database query. Because query execution performance critically depends on this choice, join ordering…
Relational data stored in RDBMS is foundational to many real-world applications across domains such as e-commerce, finance, and sociality. While deep neural networks (DNNs) have achieved strong performance on tabular data with a single…
Relational databases excel at structured data analysis, but real-world queries increasingly require capabilities beyond standard SQL, such as semantically matching entities across inconsistent names, extracting information not explicitly…