数据库
Decentralized lending protocols, exemplified by Aave V3, have transformed financial intermediation by enabling permissionless, multi-chain borrowing and lending without intermediaries. Despite managing over $10 billion in total value…
Reinforcement learning has recently been used to enhance index structures, giving rise to reinforcement learning-enhanced spatial indices (RLESIs) that aim to improve query efficiency during index construction. However, their practical…
Traditional DBMSs execute user- or application-provided SQL queries over relational data with strong semantic guarantees and advanced query optimization, but writing complex SQL is hard and focuses only on structured tables. Contemporary…
The proliferation of large language models (LLMs) has accelerated the adoption of agent-based workflows, where multiple autonomous agents reason, invoke functions, and collaborate to compose complex data pipelines. However, current…
We study how modern database systems can leverage the Linux io_uring interface for efficient, low-overhead I/O. io_uring is an asynchronous system call batching interface that unifies storage and network operations, addressing limitations…
The SPARQL query language is the standard method to access knowledge graphs (KGs). However, formulating SPARQL queries is a significant challenge for non-expert users, and remains time-consuming for the experienced ones. Best practices…
Real-world vector embeddings are usually associated with extra labels, such as attributes and keywords. Many applications require the nearest neighbor search that contains specific labels, such as searching for product image embeddings…
The classic problem of exact subgraph matching returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph, which has gained increasing importance in many real-world applications. In this paper, we propose…
Learning models over factorized joins avoids redundant computations by identifying and pre-computing shared cofactors. Previous work has investigated the performance gain when computing cofactors on traditional disk-based database systems.…
Baseline is a platform for richly structured data supporting change in multiple dimensions: mutation over time, collaboration across space, and evolution through design changes. It is built upon Operational Differencing, a new technique for…
Modern database optimizer relies on cardinality estimator, whose accuracy directly affects the optimizer's ability to choose an optimal execution plan. Recent work on data-driven methods has leveraged probabilistic models to achieve higher…
During data analysis, we are often perplexed by certain disparities observed between two groups of interest within a dataset. To better understand an observed disparity, we need explanations that can pinpoint the data regions where the…
Datasets often exhibit violations of expected monotonic trends - for example, higher education level correlating with higher average salary, newer homes being more expensive, or diabetes prevalence increasing with age. We address the…
Concurrency Control (CC) ensuring consistency of updated data is an essential element of OLTP systems. Recently, hybrid transactional/analytical processing (HTAP) systems developed for executing OLTP and OLAP have attracted much attention.…
We present a new implementation of the $D$-basis algorithm called the Small Space which considerably reduces the algorithm's memory usage for data analysis applications. The previous implementation delivers the complete set of implications…
Many managed key-value and NoSQL databases - such as Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Firestore - enforce strict maximum item sizes (e.g., 400 KB in DynamoDB). This constraint imposes significant architectural challenges…
Hierarchical Navigable Small World (HNSW) is widely adopted for approximate nearest neighbor search (ANNS) for its ability to deliver high recall with low latency on large-scale, high-dimensional embeddings. The exploration factor, commonly…
The increasing demand for deep neural inference within database environments has driven the emergence of AI-native DBMSs. However, existing solutions either rely on model-centric designs requiring developers to manually select, configure,…
Operating nodes in an L1 blockchain remains costly despite recent advances in blockchain technology. One of the most resource-intensive components of a node is the blockchain database, also known as StateDB, that manages balances, nonce,…
Proceedings of the 3rd Italian Conference on Big Data and Data Science (ITADATA2024), held in Pisa, Italy, September 17-19, 2024. The Italian Conference on Big Data and Data Science (ITADATA2024) is the annual event supported by the CINI…