数据库
k-approximate nearest neighbor search (k-ANNS) in high-dimensional vector spaces is a fundamental problem across many fields. With the advent of vector databases and retrieval-augmented generation, k-ANNS has garnered increasing attention.…
Current distributed data fabrics lack a rigorous mathematical foundation, often relying on ad-hoc architectures that struggle with consistency, lineage, and scale. We propose a mathematical framework for data fabrics, unifying heterogeneous…
This paper envisions a quantum database (Qute) that treats quantum computation as a first-class execution option. Unlike prior simulation-based methods that either run quantum algorithms on classical machines or adapt existing databases for…
Quantum computing has shown promise for solving complex optimization problems in databases, such as join ordering and index selection. Prior work often submits formulated problems directly to black-box quantum or quantum-inspired solvers…
Large language models (LLMs) have emerged as powerful tools for natural language table reasoning, where there are two main categories of methods. Prompt-based approaches rely on language-only inference or one-pass program generation without…
Large language model (LLM) merging has become a key technique in modern LLM development pipelines, enabling the integration of multiple task- or domain-specific expert models without retraining. However, as the number of experts grows,…
Large Language Models (LLMs) have changed the way natural language processing works, but it is still hard to store and manage prompts efficiently in production environments. This paper presents LoPace (Lossless Optimized Prompt Accurate…
The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, risk of readmission of the patient, or the likelihood that a financial…
Databases are the most critical assets for enterprises, and yet they remain largely inaccessible to people who make the most important decisions. In this paper, we describe the Tursio search platform that builds an abstraction layer, aka…
This study examines the extent to which structural constraints specified in conceptual schemas are represented after transformation to logical schemas. Focusing on the conceptual-to-logical mapping, an Entity-Relationship (ER) model…
NoSQL databases have been widely adopted in big data analytics, geospatial applications, and healthcare services, due to their flexibility and scalability. However, querying NoSQL databases requires specialized technical expertise, creating…
Identifying the most frequent induced subgraph of size $k$ in a target graph is a fundamental graph mining problem with direct implications for Web-related data mining and social network analysis. Despite its importance, finding the most…
In the era of large language models, Text-to-SQL, as a natural language interface for databases, is playing an increasingly important role. The sota Text-to-SQL models have achieved impressive accuracy, but their performance critically…
Modern property graph database query languages such as Cypher, PGQL, GSQL, and the standard GQL draw inspiration from the formalism of regular path queries (RPQs). In order to output walks explicitly, they depart from the classical and…
Modeling vessel activity at sea is critical for a wide range of applications, including route planning, transportation logistics, maritime safety, and environmental monitoring. Over the past two decades, the Automatic Identification System…
Retrieval-Augmented Generation (RAG) applications increasingly rely on Filtered Approximate Nearest Neighbor Search (FANNS) to combine semantic retrieval with metadata constraints. While algorithmic innovations for FANNS have been proposed,…
Table learning, which lies at the intersection of machine learning and modern database systems, has recently attracted growing attention. However, existing table learning frameworks typically require explicit data export and extensive…
Electronic health records (EHRs) are central to modern healthcare delivery and research; yet, many researchers lack the database expertise necessary to write complex SQL queries or generate effective visualizations, limiting efficient data…
Knowledge Graphs (KGs) store structured factual knowledge by linking entities through relationships, crucial for many applications. These applications depend on the KG's factual accuracy, so verifying facts is essential, yet challenging.…
Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these heuristics requires substantial engineering…