数据库
SQL/PGQ is a new standard that integrates graph querying into relational systems, allowing users to freely switch between graph patterns and SQL. Our experiments show performance gaps between these models, as queries written in both…
In the current era of data-intensive applications, the demand for high-performance, cost-effective storage solutions is paramount. This paper introduces a Space-Performance Cost Model for key-value store, designed to guide cost-effective…
Filtered approximate nearest neighbor search (FANNS), an extension of approximate nearest neighbor search (ANNS) that incorporates scalar filters, has been widely applied to constrained retrieval of vector data. Despite its growing…
The Next Token Prediction paradigm (NTP, for short) lies at the forefront of modern large foundational models that are pre-trained on diverse and large datasets. These models generalize effectively, and have proven to be very successful in…
Estimating the cardinality of the output of a query is a fundamental problem in database query processing. In this article, we overview a recently published contribution that casts the cardinality estimation problem as linear optimization…
Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel…
We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is primitive, which is…
The European Union Deforestation Regulation (EUDR) requires companies to prove their products do not contribute to deforestation, creating a critical demand for precise, asset-level environmental impact data. Current databases lack the…
Data is undoubtedly becoming a commodity like oil, land, and labor in the 21st century. Although there have been many successful marketplaces for data trading, the existing data marketplaces lack consideration of the case where buyers want…
High-resolution energy consumption and emissions datasets are essential for localized policy-making, resource optimization, and climate action planning. They enable municipalities to monitor mitigation strategies and foster engagement among…
Spurred by a number of recent trends, we make the case that the relational database systems should urgently move beyond supporting the basic object-relational model and instead embrace a more abstract data model, specifically, the…
Bloom filters are used in query processing to perform early data reduction and improve query performance. The optimal query plan may be different when Bloom filters are used, indicating the need for Bloom filter-aware query optimization. To…
Index tuning is a time-consuming process. One major performance bottleneck in existing index tuning systems is the large amount of "what-if" query optimizer calls that estimate the cost of a given pair of query and index configuration…
Hypergraphs offer a powerful framework for modeling higher-order interactions that traditional pairwise graphs cannot fully capture. However, practical constraints often lead to their simplification into projected graphs, resulting in…
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations…
Index tuning aims to find the optimal index configuration for an input workload. It is often a time-consuming and resource-intensive process, largely attributed to the huge amount of "what-if" calls made to the query optimizer during…
Query optimization is critical in relational databases. Recently, numerous Learned Query Optimizers (LQOs) have been proposed, demonstrating superior performance over traditional hand-crafted query optimizers after short training periods.…
Space-filling curves (SFC, for short) have been widely applied to index multi-dimensional data, which first maps the data to one dimension, and then a one-dimensional indexing method, e.g., the B-tree indexes the mapped data. Existing SFCs…
Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation,…
The rapid adoption of AI-powered applications demands high-performance, scalable, and efficient cloud database solutions, as traditional architectures often struggle with AI-driven workloads requiring real-time data access, vector search,…