Databases — Scifaro

Efficient Discovery of Motif Transition Process for Large-Scale Temporal Graphs

Understanding the dynamic transition of motifs in temporal graphs is essential for revealing how graph structures evolve over time, identifying critical patterns, and predicting future behaviors, yet existing methods often focus on…

Databases · Computer Science 2025-08-19 Zhiyuan Zheng, Jianpeng Qi, Jiantao Li, Guoqing Chao, Junyu Dong, Yanwei Yu

Tabularis Formatus: Predictive Formatting for Tables

Spreadsheet manipulation software is widely used for data management and analysis of tabular data, yet the creation of conditional formatting (CF) rules remains a complex task requiring technical knowledge and experience with specific…

Databases · Computer Science 2025-08-18 Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Gust Verbruggen

E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency

SQL query rewriting aims to reformulate a query into a more efficient form while preserving equivalence. Most existing methods rely on predefined rewrite rules. However, such rule-based approaches face fundamental limitations: (1) fixed…

Databases · Computer Science 2025-08-18 Dongjie Xu, Yue Cui, Weijie Shi, Qingzhi Ma, Hanghui Guo, Jiaming Li, Yao Zhao, Ruiyuan Zhang, Shimin Di, Jia Zhu, Kai Zheng, Jiajie Xu

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Text-to-SQL automatically translates natural language queries to SQL, allowing non-technical users to retrieve data from databases without specialized SQL knowledge. Despite the success of advanced LLM-based Text-to-SQL approaches on…

Databases · Computer Science 2025-08-18 Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, Yuyu Luo

Emerging Skycube

Combining multi-criteria decision analysis and trend reversal discovery makes it possible to extract globally optimal, or non-dominated, data in relation to several criteria, and then to observe their evolution according to a decision-making…

Databases · Computer Science 2025-08-15 Mickaël Martin Nevot

Advances in Logic-Based Entity Resolution: Enhancing ASPEN with Local Merges and Optimality Criteria

In this paper, we present ASPEN+, which extends an existing ASP-based system, ASPEN, for collective entity resolution with two important functionalities: support for local merges and new optimality criteria for preferred solutions. Indeed,…

Databases · Computer Science 2025-08-15 Zhliang Xiang, Meghyn Bienvenu, Gianluca Cima, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García

Efficient Methods for Accurate Sparse Trajectory Recovery and Map Matching

Real-world trajectories are often sparse with low-sampling rates (i.e., long intervals between consecutive GPS points) and misaligned with road networks, yet many applications demand high-quality data for optimal performance. To improve…

Databases · Computer Science 2025-08-15 Wei Tian, Jieming Shi, Man Lung Yiu

Cross-Organizational Analysis of Parliamentary Processes: A Case Study

Process Mining has been widely adopted by businesses and has been shown to help organizations analyze and optimize their processes. However, so far, little attention has gone into the cross-organizational comparison of processes, since many…

Databases · Computer Science 2025-08-15 Paul-Julius Hillmann, Stephan A. Fahrenkrog-Petersen, Jan Mendling

Privacy-Preserving Approximate Nearest Neighbor Search on High-Dimensional Data

In the era of cloud computing and AI, data owners outsource ubiquitous vectors to the cloud, which furnishes approximate $k$-nearest neighbors ($k$-ANNS) services to users. To protect data privacy against the untrusted server,…

Databases · Computer Science 2025-08-15 Yingfan Liu, Yandi Zhang, Jiadong Xie, Hui Li, Jeffrey Xu Yu, Jiangtao Cui

AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?

Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain…

Databases · Computer Science 2025-08-14 Yuchen Tian, Kaixin Li, Hao Chen, Ziyang Luo, Hongzhan Lin, Sebastian Schelter, Lun Du, Jing Ma

A Lightweight Learned Cardinality Estimation Model

Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or incur high…

Databases · Computer Science 2025-08-14 Yaoyu Zhu, Jintao Zhang, Guoliang Li, Jianhua Feng

LLMLog: Advanced Log Template Generation via LLM-driven Multi-Round Annotation

Modern computing systems, such as HDFS and Spark, produce vast quantities of logs that developers use for tasks like anomaly detection and error analysis. To simplify log analysis, template generation methods have been proposed to…

Databases · Computer Science 2025-08-14 Fei Teng, Haoyang Li, Lei Chen

ELASTIC: Event-Tracking Data Synchronization in Soccer Without Annotated Event Locations

The integration of event and tracking data has become essential for advanced analysis in soccer. However, synchronizing these two modalities remains a significant challenge due to temporal and spatial inaccuracies in manually recorded event…

Databases · Computer Science 2025-08-14 Hyunsung Kim, Hoyoung Choi, Sangwoo Seo, Tom Boomstra, Jinsung Yoon, Chanyoung Park

Scalable Graph Indexing using GPUs for Approximate Nearest Neighbor Search

Approximate nearest neighbor search (ANNS) in high-dimensional vector spaces has a wide range of real-world applications. Numerous methods have been proposed to handle ANNS efficiently, while graph-based indexes have gained prominence due…

Databases · Computer Science 2025-08-14 Zhonggen Li, Xiangyu Ke, Yifan Zhu, Bocheng Yu, Baihua Zheng, Yunjun Gao

Blocked Bloom Filters with Choices

Probabilistic filters are approximate set membership data structures that represent a set of keys in small space, and answer set membership queries without false negative answers, but with a certain allowed false positive probability. Such…

Databases · Computer Science 2025-08-14 Johanna Elena Schmitz, Jens Zentgraf, Sven Rahmann

A Framework for FAIR and CLEAR Ecological Data and Knowledge: Semantic Units for Synthesis and Causal Modelling

Ecological research increasingly relies on integrating heterogeneous datasets and knowledge to explain and predict complex phenomena. Yet, differences in data types, terminology, and documentation often hinder interoperability, reuse, and…

Databases · Computer Science 2025-08-13 Lars Vogt, Birgitta König-Ries, Tim Alamenciak, Joshua I. Brian, Carlos Alberto Arnillas, Lotte Korell, Robert Frühstückl, Tina Heger

Vector-Centric Machine Learning Systems: A Cross-Stack Approach

Today, two major trends are shaping the evolution of ML systems. First, modern AI systems are becoming increasingly complex, often integrating components beyond the model itself. A notable example is Retrieval-Augmented Generation (RAG),…

Databases · Computer Science 2025-08-13 Wenqi Jiang

Synthesize, Retrieve, and Propagate: A Unified Predictive Modeling Framework for Relational Databases

Relational databases (RDBs) have become the industry standard for storing massive and heterogeneous data. However, despite the widespread use of RDBs across various fields, the inherent structure of relational databases hinders their…

Databases · Computer Science 2025-08-13 Ning Li, Kounianhua Du, Han Zhang, Quan Gan, Minjie Wang, David Wipf, Weinan Zhang

Towards General-Purpose Data Discovery: A Programming Languages Approach

Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal…

Databases · Computer Science 2025-08-12 Andrew Kang, Yashnil Saha, Sainyam Galhotra

TQL: Towards Type-Driven Data Discovery

Existing query languages for data discovery exhibit system-driven designs that emphasize database features and functionality over user needs. We propose a re-prioritization of the client through an introduction of a language-driven approach…

Databases · Computer Science 2025-08-12 Andrew Kang, Sainyam Galhotra