Databases - Scifaro

Thunderbolt: Concurrent Smart Contract Execution with Non-blocking Reconfiguration for Sharded DAGs

Sharding has emerged as a critical technique for enhancing blockchain system scalability. However, existing sharding approaches face unique challenges when applied to Directed Acyclic Graph (DAG)-based protocols that integrate expressive…

Databases · Computer Science 2025-08-25 Junchao Chen, Alberto Sonnino, Lefteris Kokoris-Kogias, Mohammad Sadoghi

PRICE: A Pretrained Model for Cross-Database Cardinality Estimation

Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across…

Databases · Computer Science 2025-08-25 Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, Jingren Zhou

GoVector: An I/O-Efficient Caching Strategy for High-Dimensional Vector Nearest Neighbor Search

Graph-based high-dimensional vector indices have become a mainstream solution for large-scale approximate nearest neighbor search (ANNS). However, their substantial memory footprint often requires storage on secondary devices, where…

Databases · Computer Science 2025-08-22 Yijie Zhou, Shengyuan Lin, Shufeng Gong, Song Yu, Shuhao Fan, Yanfeng Zhang, Ge Yu

Gorgeous: Revisiting the Data Layout for Disk-Resident High-Dimensional Vector Search

Similarity-based vector search underpins many important applications, but a key challenge is processing massive vector datasets (e.g., in TBs). To reduce costs, some systems utilize SSDs as the primary data storage. They employ a proximity…

Databases · Computer Science 2025-08-22 Peiqi Yin, Xiao Yan, Qihui Zhou, Hui Li, Xiaolu Li, Lin Zhang, Meiling Wang, Xin Yao, James Cheng

Efficient Cloud-Edge-Device Query Execution Based on Collaborative Scan Operator

In cloud-edge-device (CED) collaborative query (CQ) processing, by leveraging CED collaboration, the advantages of both cloud computing and edge resources can be fully integrated. However, it is difficult to implement collaborative…

Databases · Computer Science 2025-08-22 Chunyu Zhao, Hongzhi Wang, Kaixin Zhang, Hongliang Li, Yihan Zhang, Jiawei Zhang, Kunkai Gu, Yuan Tian, Xiangdong Huang, Jingyi Xu

Temporal $k$-Core Query, Revisited

Querying cohesive subgraphs in temporal graphs is essential for understanding the dynamic structure of real-world networks, such as evolving communities in social platforms, shifting hyperlink structures on the Web, and transient…

Databases · Computer Science 2025-08-22 Yinyu Liu, Kaiqiang Yu, Shengxin Liu, Cheng Long, Zhaoquan Gu

Random Sampling over Spatial Range Joins

Spatial range joins have many applications, including geographic information systems, location-based social networking services, neuroscience, and visualization. However, joins incur not only expensive computational costs but also too large…

Databases · Computer Science 2025-08-22 Daichi Amagata

A DBMS-independent approach for capturing provenance polynomials through query rewriting

In today's data-driven ecosystems, ensuring data integrity, traceability and accountability is important. Provenance polynomials constitute a powerful formalism for tracing the origin and the derivations made to produce database query…

Databases · Computer Science 2025-08-21 Paulo Pintor, Rogério Costa, José Moreira

Efficient Size Constraint Community Search over Heterogeneous Information Networks

The goal of community search in heterogeneous information networks (HINs) is to identify a set of closely related target nodes that includes a query target node. In practice, a size constraint is often imposed due to limited resources,…

Databases · Computer Science 2025-08-21 Xinjian Zhang, Lu Chen, Chengfei Liu, Rui Zhou, Bo Ning

Accelerating K-Core Computation in Temporal Graphs

We address the problem of enumerating all temporal k-cores given a query time range and a temporal graph, which suffers from poor efficiency and scalability in the state-of-the-art solution. Motivated by an existing concept called core…

Databases · Computer Science 2025-08-21 Zhuo Ma, Dong Wen, Hanchen Wang, Wentao Li, Wenjie Zhang, Xuemin Lin

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

During football matches, a variety of different parties (e.g., companies) each collect (possibly overlapping) data about the match ranging from basic information (e.g., starting players) to detailed positional data. This data is provided to…

Databases · Computer Science 2025-08-21 Gabriel Anzer, Kilian Arnsmeyer, Pascal Bauer, Joris Bekkers, Ulf Brefeld, Jesse Davis, Nicolas Evans, Matthias Kempe, Samuel J Robertson, Joshua Wyatt Smith, Jan Van Haaren

Query Logs Analytics: A Systematic Literature Review

In the digital era, user interactions with various resources such as databases, data warehouses, websites, and knowledge graphs (KGs) are increasingly mediated through digital platforms. These interactions leave behind digital traces,…

Databases · Computer Science 2025-08-20 Dihia Lanasri

Scavenger+: Revisiting Space-Time Tradeoffs in Key-Value Separated LSM-trees

Key-Value Stores (KVS) based on log-structured merge-trees (LSM-trees) are widely used in storage systems but face significant challenges, such as high write amplification caused by compaction. KV-separated LSM-trees address write…

Databases · Computer Science 2025-08-20 Jianshun Zhang, Fang Wang, Jiaxin Ou, Yi Wang, Ming Zhao, Sheng Qiu, Junxun Huang, Baoquan Li, Peng Fang, Dan Feng

Scavenger: Better Space-Time Trade-Offs for Key-Value Separated LSM-trees

Key-Value Stores (KVS) implemented with log-structured merge-tree (LSM-tree) have gained widespread acceptance in storage systems. Nonetheless, a significant challenge arises in the form of high write amplification due to the compaction…

Databases · Computer Science 2025-08-20 Jianshun Zhang, Fang Wang, Sheng Qiu, Yi Wang, Jiaxin Ou, Junxun Huang, Baoquan Li, Peng Fang, Dan Feng

TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations

The integration of tabular data from diverse sources is often hindered by inconsistencies in formatting and representation, posing significant challenges for data analysts and personal digital assistants. Existing methods for automating…

Databases · Computer Science 2025-08-20 Arash Dargahi Nobari, Davood Rafiei

SPARQL in N3: SPARQL CONSTRUCT as a rule language for the Semantic Web (Extended Version)

Reasoning in the Semantic Web (SW) commonly uses Description Logics (DL) via OWL2 DL ontologies, or SWRL for variables and Horn clauses. The Rule Interchange Format (RIF) offers more expressive rules but is defined outside RDF and rarely…

Databases · Computer Science 2025-08-19 Dörthe Arndt, William Van Woensel, Dominik Tomaszuk

Evaluating the Quality of Open Building Datasets for Mapping Urban Inequality: A Comparative Analysis Across 5 Cities

While informal settlements lack focused development and are highly dynamic, the quality of spatial data for these places may be uncertain. This study evaluates the quality and biases of AI-generated Open Building Datasets (OBDs) generated…

Databases · Computer Science 2025-08-19 Franz Okyere, Meng Lu, Ansgar Brunn

Carry the Tail in Consensus Protocols

We present Carry-the-Tail, the first deterministic atomic broadcast protocol in partial synchrony that, after GST, guarantees a constant fraction of commits by non-faulty leaders against tail-forking attacks, and maintains optimal,…

Databases · Computer Science 2025-08-19 Suyash Gupta, Dakai Kang, Dahlia Malkhi, Mohammad Sadoghi

LSM-OPD: Boosting Scan in LSM-Trees by Enabling Direct Computing on Compressed Data

Scan-based operations, such as backstage compaction and value filtering, have emerged as the main bottleneck for LSM-Trees in supporting contemporary data-intensive applications. For slower external storage devices, such as HDD and SATA…

Databases · Computer Science 2025-08-19 Jianfeng Huang, Ziyao Wang, Lin Yuan, Jiajie Wen, Yihao Cao, Dongjing Miao, Yong Wang, Jiahao Zhang

DARTH: Declarative Recall Through Early Termination for Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search (ANNS) presents an inherent tradeoff between performance and recall (i.e., result quality). Each ANNS algorithm provides its own algorithm-dependent parameters to allow applications to influence the…

Databases · Computer Science 2025-08-19 Manos Chatzakis, Yannis Papakonstantinou, Themis Palpanas