数据库
A relation consisting of tuples annotated by an element of a monoid K is called a K-relation. A K-database is a collection of K-relations. In this paper, we study entailment of inclusion dependencies over K-databases, where K is a positive…
Entity-Relationship (ER) modeling is commonly taught as a primarily technical activity, despite its central role in shaping how data systems represent people, processes, and institutions. Prior research in participatory design demonstrates…
Filtered ANN search is an increasingly important problem in vector retrieval, yet systems face a difficult trade-off due to the execution order: Pre-filtering (filtering first, then ANN over the passing subset) requires expensive…
Long-horizon agents often compress interaction histories into write-time summaries. This creates a fundamental write-before-query barrier: compression decisions are made before the system knows what a future query will hinge on. As a…
We initiate the study of multi-attribute group fairness in $k$-nearest neighbor ($k$-NN) search over vector databases. Unlike prior work that optimizes efficiency or query filtering, fairness imposes count constraints to ensure proportional…
Approximate $k$ nearest neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases with widespread applications. Among the numerous AKNN indexes, Proximity Graph-based indexes achieve state-of-the-art…
Text-to-SQL systems powered by Large Language Models have excelled on academic benchmarks but struggle in complex enterprise environments. The primary limitation lies in their reliance on static schema representations, which fails to…
Approximate Nearest Neighbor Search (ANNS) underpins many large-scale data mining and machine learning applications, with efficient retrieval increasingly hinging on GPU acceleration as dataset sizes grow. Although graph-based approaches…
Episode mining is a fundamental problem in analyzing a sequence of numerous events. For discovering strong relationships between events in a complex event sequence, episode rule mining is proposed. However, both the episode and episode…
In pattern mining, sequential rules provide a formal framework to capture the temporal relationships and inferential dependencies between items. However, the discovery process is computationally intensive. To obtain mining results…
Knowledge graphs have been widely adopted in both enterprises, such as the Google Knowledge Graph, and open platforms like Wikidata, to represent domain knowledge and support artificial intelligence applications. They model real-world…
Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across disconnected systems without…
The preservation of cultural heritage is increasingly transitioning towards data-driven predictive maintenance and "Digital Twin" construction. However, the mechanical constitutive models required for high-fidelity simulations remain…
Natural language to SQL (NL2SQL) translation enables non-expert users to query relational databases through natural language. Recently, NL2SQL agents, powered by the reasoning capabilities of Large Language Models (LLMs), have significantly…
Clustering plays a crucial role in computer science, facilitating data analysis and problem-solving across numerous fields. By partitioning large datasets into meaningful groups, clustering reveals hidden structures and relationships within…
Transaction flow networks are crucial in detecting illicit activities such as wash trading, credit card fraud, cashback arbitrage fraud, and money laundering. \revise{Our collaborator, Grab, a leader in digital payments in Southeast Asia,…
Numerous studies have explored the SQL query refinement problem, where the objective is to minimally modify an input query so that it satisfies a specified set of constraints. However, these works typically target restricted classes of…
Nearest neighbor search on high-dimensional vectors is fundamental in modern AI and database systems. In many real-world applications, queries involve constraints on multiple numeric attributes, giving rise to range-filtering approximate…
Graph streams are rapidly evolving sequences of edges that convey continuously changing relationships among entities, playing a crucial role in domains such as networking, finance, and cybersecurity. Their massive scale and high dynamism…
Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in…