数据库
Effective data processing depends on the quality of the underlying data. However, quality issues such as inconsistencies and uncertainties, can significantly impede the processing and subsequent use of data. Despite the centrality of data…
Streaming graph partitioners enable resource-efficient and massively scalable partitioning, but one-pass assignment heuristics are highly sensitive to stream order and often yield substantially higher edge cuts than in-memory methods. We…
Modern cost-based DBMSs frequently exhibit execution instability and tail-latency amplification when high-dimensional relational operations trigger memory-regime transitions such as hash-table spilling and external materialization. We…
We develop a topological lens on relational schema design by encoding functional dependencies (FDs) as simplices of an abstract simplicial complex. This dependency complex exposes multi-attribute interactions and enables homological…
Privately releasing marginals of a tabular dataset is a foundational problem in differential privacy. However, state-of-the-art mechanisms suffer from a computational bottleneck when marginal estimates are reconstructed from noisy…
The Shapes Constraint Language (SHACL) is the W3C Recommendation for validating a single RDF graph. This makes SHACL inadequate for validating data across (named) graphs in an RDF dataset. Existing workarounds, such as graph unions or…
The demand for high-fidelity test data is paramount in industrial settings where access to production data is largely restricted. Traditional data generation methods often fall short, struggling with low-fidelity and the ability to model…
Symmetric searchable encryption (SSE) for geo-textual data has attracted significant attention. However, existing schemes rely on task-specific, incompatible indices for isolated specific secure queries (e.g., range or k-nearest neighbor…
Regular path queries (RPQs) are fundamental for path-constrained reachability analysis, and more complex variants such as conjunctive regular path queries (CRPQs) are increasingly used in graph analytics. Evaluating these queries is…
The rapid advancement of large language models (LLMs) has spurred the emergence of data agents, autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent"…
Industrial IoT ecosystems bring together sensors, machines and smart devices operating collaboratively across industrial environments. These systems generate large volumes of heterogeneous, high-velocity data streams that require…
Analytical workloads exhibit substantial semantic repetition, yet most production caches key entries by SQL surface form (text or AST), fragmenting reuse across BI tools, notebooks, and NL interfaces. We introduce a safety-first middleware…
Climate change impacts a broad spectrum of human resources and activities, necessitating the use of climate models to project long-term effects and inform mitigation and adaptation strategies. These models generate multiple datasets by…
Federated transaction management has long been used as a method to virtually integrate multiple databases from a transactional perspective, ensuring consistency across the databases. Modern approaches manage transactions on top of a…
Enterprises rely on RDF knowledge graphs and SPARQL to expose operational data through natural language interfaces, yet public KGQA benchmarks do not reflect proprietary schemas, prefixes, or query distributions. We present PIPE-RDF, a…
Recent advances in tabular in-context learning (ICL) show that a single pretrained model can adapt to new prediction tasks from a small set of labeled examples, avoiding per-task training and heavy tuning. However, many real-world tasks…
Large Language Models (LLMs) are now good enough at coding that developers can describe intent in plain language and let the tool produce the first code draft, a workflow increasingly built into tools like GitHub Copilot, Cursor, and…
We introduce SQL-Exchange, a framework for mapping SQL queries across different database schemas by preserving the source query structure while adapting domain-specific elements to align with the target schema. We investigate the conditions…
Sorting and binary searching a dense array can be considered the simplest and most space efficient form of indexing. This holds especially on GPUs as they exhibit exceptional sorting performance. However, the popular opinion is that such a…
Graph databases have become essential tools for managing complex and interconnected data, which is common in areas like social networks, bioinformatics, and recommendation systems. Unlike traditional relational databases, graph databases…