数据库
Secure Multi-Party Computation (MPC) enables collaborative analytics without exposing private data. However, OLAP queries under MPC remain prohibitively slow due to oblivious execution and padding of intermediate results with filler tuples.…
Generative artificial intelligence (AI), exemplified by the release of GPT-3.5 in 2022, has significantly advanced the potential applications of large language models (LLMs), including in the realms of ontology development and knowledge…
Most NoSQL systems are schema-on-read: data can be stored without first having to declare a Schema that imposes a structure. This schemaless feature offers flexibility to evolve data-intensive applications when data frequently change.…
The Database field is undergoing significant changes. Although relational systems are still predominant, the interest in NoSQL systems is continuously increasing. In this scenario, polyglot persistence is envisioned as the database…
We investigate the fine-grained complexity of direct access to Conjunctive Query (CQ) answers according to their position, ordered by the minimum (or maximum) value between attributes. We further use the tools we develop to explore a wealth…
Dataset availability and quality remain critical challenges in machine learning, especially in domains where data are scarce, expensive to acquire, or constrained by privacy regulations. Fields such as healthcare, biomedical research, and…
Spatiotemporal data play a key role for mobility-based applications and are their produced volume is growing continuously, among others, due to the increased availability of IoT devices. When working with spatiotemporal data, developers…
GQL has recently emerged as the standard query language over graph databases (particularly, the property graph model). Indeed, this is analogous to the role of SQL for relational databases. Unlike SQL, however, fundamental problems…
The rise of Large Language Models (LLMs) has accelerated the long-standing goal of enabling natural language querying over complex, hybrid databases. Yet, this ambition exposes a dual challenge: reasoning jointly over structured,…
Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to…
The inherent connectivity and dependency of graph-structured data, combined with its unique topology-driven access patterns, pose fundamental challenges to conventional data replication and request routing strategies in geo-distributed…
The growing number of moving Internet-of-Things (IoT) devices has led to a surge in moving object data, powering applications such as traffic routing, hotspot detection, or weather forecasting. When managing such data, spatial database…
The last few years have witnessed a spate of data protection regulations in conjunction with an ever-growing appetite for data usage in large businesses, which presents significant challenges for businesses to maintain compliance. To…
Tuning database management systems (DBMSs) is challenging due to trillions of possible configurations and evolving workloads. Recent advances in tuning have led to breakthroughs in optimizing over the possible configurations. However, due…
Approximate Nearest Neighbor Search (ANNS) in high-dimensional space is an essential operator in many online services, such as information retrieval and recommendation. Indices constructed by the state-of-the-art ANNS algorithms must be…
Spatio-temporal data captures complex dynamics across both space and time, yet traditional visualizations are complex, require domain expertise and often fail to resonate with broader audiences. Here, we propose MapMuse, a…
In many industrial settings, users wish to ask questions in natural language, the answers to which require assembling information from diverse structured data sources. With the advent of Large Language Models (LLMs), applications can now…
The fragmentation of obstetric information across electronic health record modules, device repositories, and laboratory systems, as it is common in hospitals, hinders both intrapartum care and reproducible research. In this work, we present…
Modern applications commonly leverage large, multi-modal foundation models. These applications often feature complex workflows that demand the storage and usage of similar models in multiple precisions. A straightforward approach is to…
Cloud data lakes provide a modern solution for managing large volumes of data. The fundamental principle behind these systems is the separation of compute and storage layers. In this architecture, inexpensive cloud storage is utilized for…