Databases - Scifaro

Extremal Fitting Problems for Conjunctive Queries

The fitting problem for conjunctive queries (CQs) is the problem of constructing a CQ that fits a given set of labeled data examples. When a fitting CQ exists, it is in general not unique. This leads us to propose natural refinements of the…

Databases · Computer Science 2025-09-25 Balder ten Cate, Victor Dalmau, Maurice Funk, Carsten Lutz

Gate-Based and Annealing-Based Quantum Algorithms for the Maximum K-Plex Problem

The $k$-plex model, which allows each vertex to miss connections with up to $k$ neighbors, serves as a relaxation of the clique. Its adaptability makes it more suitable for analyzing real-world graphs where noise and imperfect data are…

Databases · Computer Science 2025-09-24 Xiaofan Li, Gao Cong, Rui Zhou
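The k-plex relaxation in the abstract above can be made concrete with a membership check. This is a minimal sketch and not the authors' gate-based or annealing-based algorithms. It assumes the common convention (due to Seidman and Foster) that a vertex counts as one of its own missing connections. Under that convention, a set S of size n is a k-plex when every member is adjacent to at least n - k other members of S, and k = 1 recovers the clique. The function name `is_k_plex` and the adjacency-dict representation are hypothetical.

```python
def is_k_plex(adj: dict[int, set[int]], s: set[int], k: int) -> bool:
    """Check whether vertex set `s` is a k-plex of the graph given by `adj`.

    `adj[v]` is the set of neighbours of v (assumed to have no self-loops).
    Each vertex counts itself as a non-neighbour, so it may miss at most k
    vertices of s including itself; with k = 1 this is the clique condition.
    """
    n = len(s)
    # Every member must be adjacent to at least n - k other members of s.
    return all(len(adj[v] & s) >= n - k for v in s)


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3-0: not a clique, but each vertex misses only one
    # other vertex (plus itself), so it is a 2-plex.
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(is_k_plex(cycle, {0, 1, 2, 3}, 1))  # False
    print(is_k_plex(cycle, {0, 1, 2, 3}, 2))  # True
```

Finding a *maximum* k-plex means searching over vertex subsets for the largest set that passes this check, which is the combinatorial problem the paper targets.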

A decentralized future for the open-science databases

Continuous and reliable access to curated biological data repositories is indispensable for accelerating rigorous scientific inquiry and fostering reproducible research. Centralized repositories, though widely used, are vulnerable to single…

Databases · Computer Science 2025-09-24 Gaurav Sharma, Viorel Munteanu, Nika Mansouri Ghiasi, Jineta Banerjee, Susheel Varma, Luca Foschini, Kyle Ellrott, Onur Mutlu, Dumitru Ciorbă, Roel A. Ophoff, Viorel Bostan, Christopher E Mason, Jason H. Moore, Despoina Sousoni, Arunkumar Krishnan, Christopher E. Mason, Mihai Dimian, Gustavo Stolovitzky, Fabio G. Liberante, Taras K. Oleksyk, Serghei Mangul

Teaching RDM in a smart advanced inorganic lab course and its provision in the DALIA platform

Research data management (RDM) is a key data literacy skill that chemistry students must acquire. Concepts such as the FAIR data principles (Findable, Accessible, Interoperable, Reusable) should be taught and applied in undergraduate…

Databases · Computer Science 2025-09-24 Alexander Hoffmann, Jochen Ortmeyer, Fabian Fink, Charles Tapley Hoyt, Jonathan D. Geiger, Paul Kehrein, Torsten Schrade, Sonja Herres-Pawlis

CALL: Context-Aware Low-Latency Retrieval in Disk-Based Vector Databases

Embedding models capture both semantic and syntactic structures of queries, often mapping different queries to similar regions in vector space. This results in non-uniform cluster access patterns in modern disk-based vector databases. While…

Databases · Computer Science 2025-09-24 Yeonwoo Jeong, Hyunji Cho, Kyuri Park, Youngjae Kim, Sungyong Park

ExtGraph: A Fast Extraction Method of User-intended Graphs from a Relational Database

Graph analytics is widely used in many fields to analyze various complex patterns. However, in most cases, important company data is stored in RDBMSs, so it is necessary to extract graphs from relational databases to perform…

Databases · Computer Science 2025-09-24 Jeongho Park, Geonho Lee, Min-Soo Kim

From Documents to Database: Failure Modes for Industrial Assets

We propose an interactive system using foundation models and user-provided technical documents to generate Failure Mode and Effects Analyses (FMEA) for industrial equipment. Our system aggregates unstructured content across documents to…

Databases · Computer Science 2025-09-23 Duygu Kabakci-Zorlu, Fabio Lorenzi, John Sheehan, Karol Lynch, Bradley Eck

Propuesta de implementación de catálogos federados para espacios de datos sobre DataHub (Proposal for implementing federated catalogs for data spaces on DataHub)

In the digital era, data spaces are emerging as key ecosystems for the secure and controlled exchange of information among participants. To achieve this, components such as metadata catalogs and data space connectors are essential. This…

Databases · Computer Science 2025-09-23 Carlos Aparicio de Santiago, Pablo Viñuales Esquinas, Irene Plaza Ortiz, Andres Munoz-Arcentales, Gabriel Huecas, Joaquín Salvachúa, Enrique Barra

EPIC: Generative AI Platform for Accelerating HPC Operational Data Analytics

We present EPIC, an AI-driven platform designed to augment operational data analytics. EPIC employs a hierarchical multi-agent architecture where a top-level large language model provides query processing, reasoning and synthesis…

Databases · Computer Science 2025-09-23 Ahmad Maroof Karimi, Woong Shin, Jesse Hines, Tirthankar Ghosal, Naw Safrin Sattar, Feiyi Wang

Query Answering under Volume-Based Diversity Functions

When query evaluation produces too many tuples, a new approach in query answering is to retrieve a diverse subset of them. The standard approach for measuring the diversity of a set of tuples is to use a distance function between tuples,…

Databases · Computer Science 2025-09-23 Marcelo Arenas, Timo Camillo Merkl, Reinhard Pichler, Cristian Riveros
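The distance-based diversity measures that the abstract above describes as the standard approach can be sketched as follows. This is an illustration and not the paper's volume-based functions. It assumes numeric tuples compared by Euclidean distance and uses two common aggregations, the sum and the minimum of pairwise distances. It also adds a brute-force search for the most diverse size-k subset. All function names are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Callable, Sequence

Row = Sequence[float]  # assumed: each tuple is a point with numeric attributes


def sum_diversity(rows: Sequence[Row]) -> float:
    """Sum of pairwise Euclidean distances (a max-sum style objective)."""
    return sum(dist(a, b) for a, b in combinations(rows, 2))


def min_diversity(rows: Sequence[Row]) -> float:
    """Smallest pairwise Euclidean distance (a max-min style objective).

    Returns 0.0 for fewer than two rows, a convention chosen for this sketch.
    """
    return min((dist(a, b) for a, b in combinations(rows, 2)), default=0.0)


def most_diverse_subset(
    rows: Sequence[Row],
    k: int,
    score: Callable[[Sequence[Row]], float] = sum_diversity,
) -> tuple:
    """Exhaustively pick the size-k subset of rows that maximises `score`.

    Raises ValueError if k exceeds len(rows), since there is no such subset.
    """
    return max(combinations(rows, k), key=score)


if __name__ == "__main__":
    answers = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(most_diverse_subset(answers, 2))  # ((0.0, 0.0), (10.0, 0.0))
```

The exhaustive search enumerates every size-k subset, so it is only practical for small answer sets. It shows the objective being optimized, not an efficient diversification algorithm.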

Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study

With the growing integration of structured and unstructured data, new methods have emerged for performing similarity searches on vectors while honoring structured attribute constraints, i.e., a process known as Filtering Approximate Nearest…

Databases · Computer Science 2025-09-23 Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, Chenhao Ma
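Filtering approximate nearest neighbor search, as described above, combines a vector similarity query with a structured attribute predicate. The simplest exact baseline applies the predicate first (pre-filtering) and then ranks the surviving vectors by distance. The approximate indexes evaluated in such studies give up this exactness in exchange for speed. The sketch below assumes Euclidean distance and dictionary-valued attributes. Function and parameter names are hypothetical.

```python
import heapq
from math import dist
from typing import Callable, Mapping, Sequence


def filtered_knn(
    vectors: Sequence[Sequence[float]],
    attrs: Sequence[Mapping[str, object]],
    query: Sequence[float],
    k: int,
    predicate: Callable[[Mapping[str, object]], bool],
) -> list[int]:
    """Return indices of the k vectors nearest to `query` whose attributes pass `predicate`.

    Exact pre-filtering baseline: filter on attributes, then brute-force rank.
    """
    # Step 1: keep only rows whose structured attributes satisfy the filter.
    candidates = [i for i, a in enumerate(attrs) if predicate(a)]
    # Step 2: rank survivors by Euclidean distance; ties keep the lower index first.
    return heapq.nsmallest(k, candidates, key=lambda i: dist(vectors[i], query))


if __name__ == "__main__":
    vecs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
    meta = [{"color": "red"}, {"color": "blue"}, {"color": "red"}, {"color": "blue"}]
    print(filtered_knn(vecs, meta, (0.0, 0.0), 2, lambda a: a["color"] == "blue"))  # [3, 1]
```

Pre-filtering scans every row that passes the predicate, so its cost depends on how selective the filter is. A post-filtering variant would instead retrieve the nearest neighbors first and discard those that fail the predicate, which can return fewer than k results.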

Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries

Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce…

Databases · Computer Science 2025-09-23 Josefa Lia Stoisser, Marc Boubnovski Martell, Kaspar Märtens, Lawrence Phillips, Stephen Michael Town, Rory Donovan-Maiye, Julien Fauqueur

TranSQL+: Serving Large Language Models with SQL on Low-Resource Hardware

Deploying Large Language Models (LLMs) on resource-constrained devices remains challenging due to limited memory, lack of GPUs, and the complexity of existing runtimes. In this paper, we introduce TranSQL+, a template-based code generator…

Databases · Computer Science 2025-09-23 Wenbo Sun, Qiming Guo, Wenlu Wang, Rihan Hai

The Causal-Effect Score in Data Management

The Causal Effect (CE) is a numerical measure of causal influence of variables on observed results. Despite being widely used in many areas, only preliminary attempts have been made to use CE as an attribution score in data management, to…

Databases · Computer Science 2025-09-23 Felipe Azua, Leopoldo Bertossi

Forecasting Electric Vehicle Charging Station Occupancy: Smarter Mobility Data Challenge

The transport sector is a major contributor to greenhouse gas emissions in Europe. Shifting to electric vehicles (EVs) powered by a low-carbon energy mix would reduce carbon emissions. However, to support the development of electric…

Databases · Computer Science 2025-09-23 Yvenn Amara-Ouali, Yannig Goude, Nathan Doumèche, Pascal Veyret, Alexis Thomas, Daniel Hebenstreit, Thomas Wedenig, Arthur Satouf, Aymeric Jan, Yannick Deleuze, Paul Berhaut, Sébastien Treguer, Tiphaine Phe-Neau

Utility-based Privacy Preserving Data Mining

With the advent of big data, periodic pattern mining has demonstrated significant value in real-world applications, including smart home systems, healthcare systems, and the medical field. However, advances in network technology have…

Databases · Computer Science 2025-09-22 Qingfeng Zhou, Wensheng Gan, Zhenlian Qi, Philip S. Yu

Discovering Top-k Periodic and High-Utility Patterns

With a user-specified minimum utility threshold (minutil), periodic high-utility pattern mining (PHUPM) aims to identify high-utility patterns that occur periodically in a transaction database. A pattern is deemed periodic if its period…

Databases · Computer Science 2025-09-22 Qingfeng Zhou, Wensheng Gan, Guoting Chen

Optimization techniques for SQL+ML queries: A performance analysis of real-time feature computation in OpenMLDB

In this study, we optimize SQL+ML queries on top of OpenMLDB, an open-source database that seamlessly integrates offline and online feature computations. The work used feature-rich synthetic dataset experiments in Docker, which acted like…

Databases · Computer Science 2025-09-22 Mashkhal A. Sidiq, Aras A. Salih, Samrand M. Hassan

Integrated data-driven biotechnology research environments

In the past few decades, the life sciences have experienced an unprecedented accumulation of data, ranging from genomic sequences and proteomic profiles to heavy-content imaging, clinical assays, and commercial biological products for…

Databases · Computer Science 2025-09-22 Rosalia Moreddu

A Case for Computing on Unstructured Data

Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a…

Databases · Computer Science 2025-09-19 Mushtari Sadia, Amrita Roy Chowdhury, Ang Chen