Databases
Data provenance (the process of determining the origin and derivation of data outputs) has applications across multiple domains including explaining database query results and auditing scientific workflows. Despite decades of research,…
Multinational enterprises conduct business globally, which creates demand for geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master…
LLMs and MLLMs have become indispensable tools across a wide range of applications. E-commerce, however, poses distinctive challenges -- including intricate domain knowledge, long-tail product evidence, heterogeneous visual data, and the…
We propose graph-grounded optimization: a paradigm in which the decision variables, constraints, and objective coefficients of a real-world optimization problem are sourced from a property knowledge graph (KG) via Cypher queries, rather…
Vector databases increasingly enforce role-based access control, where each top-k approximate nearest neighbor query must return only vectors the querying role is authorized to access. Two extremes bracket the design space. A single global…
Spatial query and analysis results are often directly applied to decision-making processes such as facility location, proximity resource discovery, accessibility analysis, and risk assessment. Therefore, the efficiency of underlying spatial…
The article studies query evaluation in parallel constant time in the CRCW PRAM model. While it is well-known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW PRAM model, this article is…
Existential rules are a prominent formalism to enrich a database with knowledge from the domain of interest, but make even basic reasoning tasks on the resulting knowledge base undecidable. To circumvent this, several classes of rules…
We propose a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. We implement this approach and evaluate it against three natural baselines on a subset of BIRD-DEV. Experiments show…
Data is critical for the operation of any organization and needs to be protected, especially against attacks that compromise the state of the database. In this paper, we explore an approach based on Byzantine-fault tolerant replicated state…
Electrospinning is a versatile nanofabrication technique whose outcomes emerge from a complex, high-dimensional interplay between solution properties, processing parameters, and environmental conditions. Optimizing this parameter space for…
Uniform sampling and approximate counting are fundamental primitives for modern database applications, ranging from query optimization to approximate query processing. While recent breakthroughs have established optimal sampling and…
We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the ground truth, SpotIt+ actively searches for database instances that differentiate the two…
SHACL-DS extends SHACL for RDF dataset validation by introducing declarative targeting of named graphs and graph combinations, but has not yet been demonstrated and assessed on a real, large-scale Knowledge Graph (KG). In this paper, we…
One major technical challenge for modern analytical database systems is how to leverage GPUs to exploit their massive parallelism and high bandwidth. Yet, existing GPU-driven database engines suffer from inefficiencies caused by frequent…
Large language models have significantly improved natural language interfaces to databases by translating user questions into executable queries. In particular, Text2Cypher focuses on generating Cypher queries for graph databases, enabling…
Graph-based ANNS algorithms have gained increasing research interest and market adoption due to their efficiency and accuracy in retrieval. Existing approaches primarily rely on CPUs for graph index construction and retrieval, but this…
Logs are ubiquitous in modern systems. Unfortunately, their unstructured form as flat sequences limits understanding of execution behaviors, hindering effective anomaly diagnosis. To address this, Krone introduces a novel hierarchical log…
In applications such as event monitoring, log analysis, and video querying, $w$-event privacy protects individual data within a sliding time window while supporting accurate stream statistics. Existing studies on infinite data streams…
Data preparation is a central and time-consuming stage in data analysis workflows. Traditionally, commercial tools have relied on graphical user interfaces (GUIs) to simplify data preparation, allowing users to define transformations…