数据库
Understanding the dynamic transition of motifs in temporal graphs is essential for revealing how graph structures evolve over time, identifying critical patterns, and predicting future behaviors, yet existing methods often focus on…
Spreadsheet manipulation software are widely used for data management and analysis of tabular data, yet the creation of conditional formatting (CF) rules remains a complex task requiring technical knowledge and experience with specific…
SQL query rewriting aims to reformulate a query into a more efficient form while preserving equivalence. Most existing methods rely on predefined rewrite rules. However, such rule-based approaches face fundamental limitations: (1) fixed…
Text-to-SQL automatically translates natural language queries to SQL, allowing non-technical users to retrieve data from databases without specialized SQL knowledge. Despite the success of advanced LLM-based Text-to-SQL approaches on…
Combining multi-criteria decision analysis and trend reversal discovery make it possible to extract globally optimal, or non-dominated, data in relation to several criteria, and then to observe their evolution according to a decision-making…
In this paper, we present ASPEN+, which extends an existing ASP-based system, ASPEN,for collective entity resolution with two important functionalities: support for local merges and new optimality criteria for preferred solutions. Indeed,…
Real-world trajectories are often sparse with low-sampling rates (i.e., long intervals between consecutive GPS points) and misaligned with road networks, yet many applications demand high-quality data for optimal performance. To improve…
Process Mining has been widely adopted by businesses and has been shown to help organizations analyze and optimize their processes. However, so far, little attention has gone into the cross-organizational comparison of processes, since many…
In the era of cloud computing and AI, data owners outsource ubiquitous vectors to the cloud, which furnish approximate $k$-nearest neighbors ($k$-ANNS) services to users. To protect data privacy against the untrusted server,…
Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain…
Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or incur high…
Modern computing systems, such as HDFS and Spark, produce vast quantities of logs that developers use for tasks like anomaly detection and error analysis. To simplify log analysis, template generation methods have been proposed to…
The integration of event and tracking data has become essential for advanced analysis in soccer. However, synchronizing these two modalities remains a significant challenge due to temporal and spatial inaccuracies in manually recorded event…
Approximate nearest neighbor search (ANNS) in high-dimensional vector spaces has a wide range of real-world applications. Numerous methods have been proposed to handle ANNS efficiently, while graph-based indexes have gained prominence due…
Probabilistic filters are approximate set membership data structures that represent a set of keys in small space, and answer set membership queries without false negative answers, but with a certain allowed false positive probability. Such…
Ecological research increasingly relies on integrating heterogeneous datasets and knowledge to explain and predict complex phenomena. Yet, differences in data types, terminology, and documentation often hinder interoperability, reuse, and…
Today, two major trends are shaping the evolution of ML systems. First, modern AI systems are becoming increasingly complex, often integrating components beyond the model itself. A notable example is Retrieval-Augmented Generation (RAG),…
Relational databases (RDBs) have become the industry standard for storing massive and heterogeneous data. However, despite the widespread use of RDBs across various fields, the inherent structure of relational databases hinders their…
Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal…
Existing query languages for data discovery exhibit system-driven designs that emphasize database features and functionality over user needs. We propose a re-prioritization of the client through an introduction of a language-driven approach…