Databases
Sharding has emerged as a critical technique for enhancing blockchain system scalability. However, existing sharding approaches face unique challenges when applied to Directed Acyclic Graph (DAG)-based protocols that integrate expressive…
Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across…
Graph-based high-dimensional vector indices have become a mainstream solution for large-scale approximate nearest neighbor search (ANNS). However, their substantial memory footprint often requires storage on secondary devices, where…
Similarity-based vector search underpins many important applications, but a key challenge is processing massive vector datasets (e.g., in TBs). To reduce costs, some systems utilize SSDs as the primary data storage. They employ a proximity…
Cloud-edge-device (CED) collaborative query (CQ) processing can fully integrate the advantages of both cloud computing and edge resources. However, it is difficult to implement collaborative…
Querying cohesive subgraphs in temporal graphs is essential for understanding the dynamic structure of real-world networks, such as evolving communities in social platforms, shifting hyperlink structures on the Web, and transient…
Spatial range joins have many applications, including geographic information systems, location-based social networking services, neuroscience, and visualization. However, joins incur not only expensive computational costs but also too large…
In today's data-driven ecosystems, ensuring data integrity, traceability and accountability is important. Provenance polynomials constitute a powerful formalism for tracing the origin and the derivations made to produce database query…
The goal of community search in heterogeneous information networks (HINs) is to identify a set of closely related target nodes that includes a query target node. In practice, a size constraint is often imposed due to limited resources,…
We address the problem of enumerating all temporal k-cores given a query time range and a temporal graph, for which the state-of-the-art solution suffers from poor efficiency and scalability. Motivated by an existing concept called core…
During football matches, various parties (e.g., companies) each collect (possibly overlapping) data about the match, ranging from basic information (e.g., starting players) to detailed positional data. This data is provided to…
In the digital era, user interactions with various resources such as databases, data warehouses, websites, and knowledge graphs (KGs) are increasingly mediated through digital platforms. These interactions leave behind digital traces,…
Key-Value Stores (KVS) based on log-structured merge-trees (LSM-trees) are widely used in storage systems but face significant challenges, such as high write amplification caused by compaction. KV-separated LSM-trees address write…
Key-Value Stores (KVS) implemented with log-structured merge-trees (LSM-trees) have gained widespread acceptance in storage systems. Nonetheless, a significant challenge arises in the form of high write amplification due to the compaction…
The integration of tabular data from diverse sources is often hindered by inconsistencies in formatting and representation, posing significant challenges for data analysts and personal digital assistants. Existing methods for automating…
Reasoning in the Semantic Web (SW) commonly uses Description Logics (DL) via OWL2 DL ontologies, or SWRL for variables and Horn clauses. The Rule Interchange Format (RIF) offers more expressive rules but is defined outside RDF and rarely…
Because informal settlements lack focused development and are highly dynamic, the quality of spatial data for these places may be uncertain. This study evaluates the quality and biases of AI-generated Open Building Datasets (OBDs) generated…
We present Carry-the-Tail, the first deterministic atomic broadcast protocol in partial synchrony that, after GST, guarantees a constant fraction of commits by non-faulty leaders against tail-forking attacks, and maintains optimal,…
Scan-based operations, such as backstage compaction and value filtering, have emerged as the main bottleneck for LSM-Trees in supporting contemporary data-intensive applications. For slower external storage devices, such as HDD and SATA…
Approximate Nearest Neighbor Search (ANNS) presents an inherent tradeoff between performance and recall (i.e., result quality). Each ANNS algorithm provides its own algorithm-dependent parameters to allow applications to influence the…