Related papers: Upscaledb: Efficient Integer-Key Compression in a …

Decoding billions of integers per second through vectorization

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science 2021-02-02 Daniel Lemire, Leonid Boytsov

SIMD Compression and the Intersection of Sorted Integers

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…

Information Retrieval · Computer Science 2020-04-22 Daniel Lemire, Leonid Boytsov, Nathan Kurz

A General SIMD-based Approach to Accelerating Compression Algorithms

Compression algorithms are important for data-oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets provide us with an opportunity for achieving better compression performance.…

Information Retrieval · Computer Science 2015-04-15 Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen

Learned Data Compression: Challenges and Opportunities for the Future

Compressing integer keys is a fundamental operation among multiple communities, such as database management (DB), information retrieval (IR), and high-performance computing (HPC). Recent advances in learned indexes have inspired the…

Databases · Computer Science 2024-12-17 Qiyu Liu, Siyuan Han, Jianwei Liao, Jin Li, Jingshu Peng, Jun Du, Lei Chen

Optimizations and Heuristics to improve Compression in Columnar Database Systems

In-memory columnar databases have become mainstream over the last decade and have vastly improved the fast processing of large volumes of data through multi-core parallelism and in-memory compression thereby eliminating the usual…

Databases · Computer Science 2016-09-27 Jayanth Jayanth

Compressed Key Sort and Fast Index Reconstruction

In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…

Databases · Computer Science 2020-09-25 Yongsik Kwon, Cheol Ryu, Sang Kyun Cha, Arthur H. Lee, Kunsoo Park, Bongki Moon

On Optimally Partitioning Variable-Byte Codes

The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings

Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…

Data Structures and Algorithms · Computer Science 2025-10-23 Rajat De, Dominik Kempa

Instance-Optimized String Fingerprints

Recent research found that cloud data warehouses are text-heavy. However, their capabilities for efficiently processing string columns remain limited, relying primarily on techniques like dictionary encoding and prefix-based partition…

Databases · Computer Science 2025-07-15 Mihail Stoian, Johannes Thürauf, Andreas Zimmerer, Alexander van Renen, Andreas Kipf

Compression Aware Physical Database Design

Modern RDBMSs support the ability to compress data using methods such as null suppression and dictionary encoding. Data compression offers the promise of significantly reducing storage requirements and improving I/O performance for decision…

Databases · Computer Science 2011-09-06 Hideaki Kimura, Vivek Narasayya, Manoj Syamala

BS-tree: A gapped data-parallel B-tree

We propose BS-tree, an in-memory implementation of the B+-tree that adopts the structure of the disk-based index (i.e., a balanced, multiway tree), setting the node size to a memory block that can be processed fast and in parallel using…

Databases · Computer Science 2025-11-14 Dimitrios Tsitsigkos, Achilleas Michalopoulos, Nikos Mamoulis, Manolis Terrovitis

On the Scalability of Multidimensional Databases

It is commonly accepted in the practice of on-line analytical processing of databases that the multidimensional database organization is less scalable than the relational one. It is easy to see that the size of the multidimensional…

Databases · Computer Science 2011-04-27 István Szépkúti

ZipCache: A DRAM/SSD Cache with Built-in Transparent Compression

As a core component in modern data centers, key-value cache provides high-throughput and low-latency services for high-speed data processing. The effectiveness of a key-value cache relies on its ability to accommodate the needed data.…

Databases · Computer Science 2024-12-13 Rui Xie, Linsen Ma, Alex Zhong, Feng Chen, Tong Zhang

The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems

Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes…

Information Retrieval · Computer Science 2025-05-19 Petr Kasalický, Martin Spišák, Vojtěch Vančura, Daniel Bohuněk, Rodrigo Alves, Pavel Kordík

Revisiting Data Compression in Column-Stores

Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing…

Databases · Computer Science 2021-05-20 Alexander Slesarev, Evgeniy Klyuchikov, Kirill Smirnov, George Chernishev

Embedding Compression in Recommender Systems: A Survey

To alleviate the problem of information explosion, recommender systems are widely deployed to provide personalized information filtering services. Usually, embedding tables are employed in recommender systems to transform high-dimensional…

Information Retrieval · Computer Science 2024-08-07 Shiwei Li, Huifeng Guo, Xing Tang, Ruiming Tang, Lu Hou, Ruixuan Li, Rui Zhang

The Fast Fibonacci Decompression Algorithm

Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…

Performance · Computer Science 2007-12-19 R. Baca, V. Snasel, J. Platos, M. Kratky, E. El-Qawasmeh

Overview and Prospects of Using Integer Surrogate Keys for Data Warehouse Performance Optimization

The aim of this paper is to examine and demonstrate how integer-based datetime labels (integer surrogate keys for time) can optimize data-warehouse and time-series performance, proposing practical formats and algorithms and validating their…

Databases · Computer Science 2025-11-19 Sviatoslav Stumpf, Vladislav Povyshev

A Soft SIMD Based Energy Efficient Computing Microarchitecture

The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy…

Hardware Architecture · Computer Science 2022-12-20 Pengbo Yu, Alexandre Levisse, Mohit Gupta, Evenblij Timon, Giovanni Ansaloni, Francky Catthoor, David Atienza

Converting an Integer to a Decimal String in Under Two Nanoseconds

Converting binary integers to variable-length decimal strings is a fundamental operation in computing. Conventional fast approaches rely on recursive division and small lookup tables. We propose a SIMD-based algorithm that leverages integer…

Data Structures and Algorithms · Computer Science 2026-05-07 Jaël Champagne Gareau, Daniel Lemire