Databases
The availability of datasets for validating and verifying novel data-driven strategies and/or hybrid physics-data approaches is currently one of the most pressing challenges in the engineering field. Data ownership,…
The integration of Large Language Models (LLMs) into data analytics has unlocked powerful capabilities for reasoning over bulk structured and unstructured data. However, existing systems typically rely on either DataFrame primitives, which…
Various real-world applications rely on in-memory dynamic graphs that must efficiently handle frequent updates while supporting low-latency analytics on evolving structures. Achieving both objectives remains challenging due to the trade-off…
Large language models (LLMs) hold potential for mental healthcare applications, particularly in cognitive behavioral therapy (CBT)-based counseling, where reward models play a critical role in aligning LLMs with preferred therapeutic…
Optimization tasks over relational data, such as clustering, often suffer from the prohibitive cost of join operations, which are necessary to access the full dataset. While geometric data structures like BBD trees yield fast approximation…
Defect phase diagrams provide a unified description of crystal defect states for materials design and are central to the scientific objectives of the Collaborative Research Centre (CRC) 1394. Their construction requires the systematic…
Sparse vector Maximum Inner Product Search (MIPS) is crucial in multi-path retrieval for Retrieval-Augmented Generation (RAG). Recent inverted index-based and graph-based algorithms have achieved high search accuracy with practical…
Large Language Models (LLMs) have demonstrated impressive ability in generation and reasoning tasks but struggle with handling up-to-date knowledge, leading to inaccuracies or hallucinations. Retrieval-Augmented Generation (RAG) mitigates…
Earth science datasets are growing rapidly in both volume and structural complexity. They increasingly contain richly labelled data with heterogeneous metadata and complex internal constraints that impose dependencies between variables and…
When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs interpret natural-language requests, they are brittle when intent is under-specified, e.g.,…
Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema…
Interactive visualization is a common tool for exploring large open-data repositories, where users quickly explore datasets across diverse domains. When it comes to large-scale spatial data, many existing tools rely on server-side rendering…
Significant research effort has been devoted to improving the performance of join processing in the massively parallel computation model, where the goal is to evaluate a query with the minimum possible data transfer between machines.…
Multi-model databases are designed to store, manage, and query data in various models, such as relational, hierarchical, and graph data, simultaneously. In this paper, we provide a theoretical basis for querying categorical databases. We…
Extracting actionable insights from structured databases in regulated industries, such as credit unions, is often hindered by complex schemas, legacy systems, and stringent data governance requirements. We present Tursio, a secure,…
We present the first principled and systematic study of the expressive power of property graph constraint languages, focused on the recent PG-Keys language, set to inform the upcoming revision of the GQL standard. To this end, we position…
The integration of Large Language Models (LLMs) into scientific discovery is currently hindered by the Implicit Context problem, where governing equations extracted from literature contain invisible thermodynamic assumptions (e.g.,…
Rankings play a crucial role in decision-making. However, if minor changes to items significantly alter their rankings, the quality of the decisions being made can be compromised. The stability of ranking is a measure used to assess how…
This paper addresses one of the fundamental open questions in the realm of existential rules: the conjecture on the finite controllability of bounded derivation depth rule sets (bdd $\Rightarrow$ fc). We take a step toward a positive…
Modern AI and vector search are rapidly converging, forming a promising research frontier in intelligent information systems. On one hand, advances in AI have substantially improved the semantic accuracy and efficiency of vector search,…