Related papers: An LSM-based Tuple Compaction Framework for Apache…

Columnar Formats for Schemaless LSM-based Document Stores

In the last decade, document store database systems have gained more traction for storing and querying large volumes of semi-structured data. However, the flexibility of the document stores' data models has limited their ability to store…

Databases · Computer Science 2021-11-24 Wail Y. Alkowaileet, Michael J. Carey

Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems

In recent years, the Log Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite its popularity, however, most existing work has focused on LSM-based key-value stores with…

Databases · Computer Science 2019-01-08 Chen Luo, Michael J. Carey

Mycelium: A Transformation-Embedded LSM-Tree

Compaction is a necessary, but often costly background process in write-optimized data structures like LSM-trees that reorganizes incoming data that is sequentially appended to logs. In this paper, we introduce Transformation-Embedded…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-11 Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro

Constructing and Analyzing the LSM Compaction Design Space (Updated Version)

Log-structured merge (LSM) trees offer efficient ingestion by appending incoming data, and thus, are widely used as the storage layer of production NoSQL data stores. To enable competitive read performance, LSM-trees periodically…

Databases · Computer Science 2022-03-01 Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis

On Performance Stability in LSM-based Storage Systems (Extended Version)

The Log-Structured Merge-Tree (LSM-tree) has been widely adopted for use in modern NoSQL systems for its superior write performance. Despite the popularity of LSM-trees, they have been criticized for suffering from write stalls and large…

Databases · Computer Science 2020-04-14 Chen Luo, Michael J. Carey

AutoComp: Automated Data Compaction for Log-Structured Tables in Data Lakes

The proliferation of small files in data lakes poses significant challenges, including degraded query performance, increased storage costs, and scalability bottlenecks in distributed storage systems. Log-structured table formats (LSTs) such…

Databases · Computer Science 2025-04-08 Anja Gruenheid, Jesús Camacho-Rodríguez, Carlo Curino, Raghu Ramakrishnan, Stanislav Pak, Sumedh Sakdeo, Lenisha Gandhi, Sandeep K. Singhal, Pooja Nilangekar, Daniel J. Abadi

Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems (Extended Version)

Log-Structured Merge-trees (LSM-trees) have been widely used in modern NoSQL systems. Due to their out-of-place update design, LSM-trees have introduced memory walls among the memory components of multiple LSM-trees and between the write…

Databases · Computer Science 2020-07-16 Chen Luo, Michael J. Carey

Characterize LSM-tree Compaction Performance via On-Device LLM Inference

Modern key-value storage engines built on Log-Structured Merge-trees (LSM-trees), such as RocksDB and LevelDB, rely heavily on the performance of their compaction operations, which are impacted by a complex set of interdependent…

Performance · Computer Science 2026-02-16 Jiabiao Ding, Yina Lv, Qiao Li, Zhirong Shen, Chun Jason Xue

An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB

Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries.…

Databases · Computer Science 2020-08-18 Xikui Wang, Michael J. Carey

DynaHash: Efficient Data Rebalancing in Apache AsterixDB (Extended Version)

Parallel shared-nothing data management systems have been widely used to exploit a cluster of machines for efficient and scalable data processing. When a cluster needs to be dynamically scaled in or out, data must be efficiently rebalanced.…

Databases · Computer Science 2021-05-25 Chen Luo, Michael J. Carey

Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

Many applications process a stream of tuples over a window duration, and require the results within a specified deadline after the end of the window. For such scenarios, processing tuples intermittently (in batches) instead of eagerly…

Databases · Computer Science 2026-05-19 Saranya Chandrasekaran, S. Sudarshan

Introducing Schema Inference as a Scalable SQL Function [Extended Version]

This paper introduces a novel approach to schema inference as an on-demand function integrated directly within a DBMS, targeting NoSQL databases where schema flexibility can create challenges. Unlike previous methods relying on external…

Databases · Computer Science 2024-11-21 Calvin Dani, Shiva Jahangiri, Thomas Hütter

Re-enabling high-speed caching for LSM-trees

LSM-tree has been widely used in cloud computing systems by Google, Facebook, and Amazon, to achieve high performance for write-intensive workloads. However, in LSM-tree, random key-value queries can experience long latency and low…

Data Structures and Algorithms · Computer Science 2016-06-08 Lei Guo, Dejun Teng, Rubao Lee, Feng Chen, Siyuan Ma, Xiaodong Zhang

RESYSTANCE: Unleashing Hidden Performance of Compaction in LSM-trees via eBPF

The development of high-speed storage devices such as NVMe SSDs has shifted the primary I/O bottleneck from hardware to software. Modern database systems also rely on kernel-based I/O paths, where frequent system call invocations and…

Databases · Computer Science 2026-03-06 Hongsu Byun, Seungjae Lee, Honghyeon Yoo, Myoungjoon Kim, Sungyong Park

Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty

Log-Structured Merge trees (LSM trees) are increasingly used as the storage engines behind several data systems, frequently deployed in the cloud. Similar to other database architectures, LSM trees take into account information about the…

Databases · Computer Science 2021-11-04 Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, Manos Athanassoulis

AFrame: Extending DataFrames for Large-Scale Modern Data Analysis (Extended Version)

Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity…

Databases · Computer Science 2019-08-20 Phanwadee Sinthong, Michael J. Carey

Rethinking LSM-tree based Key-Value Stores: A Survey

LSM-tree is a widely adopted data structure in modern key-value store systems that optimizes write performance in write-heavy applications by using append writes to achieve sequential writes. However, the unpredictability of LSM-tree…

Databases · Computer Science 2025-07-15 Yina Lv, Qiao Li, Quanqing Xu, Congming Gao, Chuanhui Yang, Xiaoli Wang, Chun Jason Xue

Towards Accurate and Efficient Document Analytics with Large Language Models

Unstructured data formats account for over 80% of the data currently stored, and extracting value from such formats remains a considerable challenge. In particular, current approaches for managing unstructured documents do not support…

Databases · Computer Science 2024-05-09 Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeigham, Aditya G. Parameswaran, Eugene Wu

The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve…

Databases · Computer Science 2021-02-05 Ciprian-Octavian Truică, Elena-Simona Apostol, Jérôme Darmont, Torben Bach Pedersen

SynchroStore: A Cost-Based Fine-Grained Incremental Compaction for Hybrid Workloads

This study proposes a novel storage engine, SynchroStore, designed to address the inefficiency of update operations in columnar storage systems based on Log-Structured Merge Trees (LSM-Trees) under hybrid workload scenarios. While columnar…

Databases · Computer Science 2025-03-25 Yinan Zhang, Huiqi Hu, Xuan Zhou