Related papers: On Optimally Partitioning Variable-Byte Codes

Vectorized VByte Decoding

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a…

Information Retrieval · Computer Science 2017-01-17 Jeff Plaisance , Nathan Kurz , Daniel Lemire

Decoding billions of integers per second through vectorization

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science 2021-02-02 Daniel Lemire , Leonid Boytsov

Techniques for Inverted Index Compression

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri , Rossano Venturini

Re-Pair Compression of Inverted Lists

Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…

Information Retrieval · Computer Science 2009-11-18 Francisco Claude , Antonio Farina , Gonzalo Navarro

On Slicing Sorted Integer Sequences

Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning…

Information Retrieval · Computer Science 2019-07-23 Giulio Ermanno Pibiri

Factorization-based Lossless Compression of Inverted Indices

Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements…

Information Retrieval · Computer Science 2015-03-19 George Beskales , Marcus Fontoura , Maxim Gurevich , Sergei Vassilvitskii , Vanja Josifovski

Compressing integer lists with Contextual Arithmetic Trits

Inverted indexes allow to query large databases without needing to search in the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…

Databases · Computer Science 2025-05-06 Yann Barsamian , André Chailloux

Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions

Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte…

Databases · Computer Science 2017-01-18 Daniel Lemire , Christoph Rupp

Time-universal data compression and prediction

Suppose there is a large file which should be transmitted (or stored) and there are several (say, m) admissible data-compressors. It seems natural to try all the compressors and then choose the best, i.e. the one that gives the shortest…

Information Theory · Computer Science 2018-09-11 Boris Ryabko

Optimal Gradient Compression for Distributed and Federated Learning

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent…

Machine Learning · Computer Science 2020-10-08 Alyazeed Albasyoni , Mher Safaryan , Laurent Condat , Peter Richtárik

Optimal Compression of Unit Norm Vectors in the High Distortion Regime

Motivated by the need for communication-efficient distributed learning, we investigate the method for compressing a unit norm vector into the minimum number of bits, while still allowing for some acceptable level of distortion in recovery.…

Information Theory · Computer Science 2024-02-06 Heng Zhu , Avishek Ghosh , Arya Mazumdar

On optimally partitioning a text to improve its compression

In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once.…

Data Structures and Algorithms · Computer Science 2009-06-26 Paolo Ferragina , Igor Nitto , Rossano Venturini

Stream VByte: Faster Byte-Oriented Integer Compression

Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google's Varint-GB). They are…

Information Retrieval · Computer Science 2017-10-11 Daniel Lemire , Nathan Kurz , Christoph Rupp

Forward Index Compression for Learned Sparse Retrieval

Text retrieval using learned sparse representations of queries and documents has, over the years, evolved into a highly effective approach to search. It is thanks to recent advances in approximate nearest neighbor search-with the emergence…

Information Retrieval · Computer Science 2026-02-06 Sebastian Bruch , Martino Fontana , Franco Maria Nardini , Cosimo Rulli , Rossano Venturini

Optimal Compression for Two-Field Entries in Fixed-Width Memories

Data compression is a well-studied (and well-solved) problem in the setup of long coding blocks. But important emerging applications need to compress data to memory words of small fixed widths. This new setup is the subject of this paper.…

Information Theory · Computer Science 2017-01-12 Ori Rottenstreich , Yuval Cassuto

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information from large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. Index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun , Md. Hanif , Md. Rakib Uddin , Tanvir Ahmed , Md. Mofizul Islam

Rate-Distortion Optimization for Transformer Inference

Transformers achieve superior performance on many tasks, but impose heavy compute and memory requirements during inference. This inference can be made more efficient by partitioning the process across multiple devices, which, in turn,…

Machine Learning · Computer Science 2026-04-21 Anderson de Andrade , Alon Harell , Ivan V. Bajić

Efficient Querying from Weighted Binary Codes

Binary codes are widely used to represent the data due to their small storage and efficient computation. However, there exists an ambiguity problem that lots of binary codes share the same Hamming distance to a query. To alleviate the…

Computer Vision and Pattern Recognition · Computer Science 2020-06-12 Zhenyu Weng , Yuesheng Zhu

Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings

Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…

Data Structures and Algorithms · Computer Science 2025-10-23 Rajat De , Dominik Kempa

Tri de la table de faits et compression des index bitmaps avec alignement sur les mots

Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are…

Databases · Computer Science 2008-08-15 Kamel Aouiche , Daniel Lemire , Owen Kaser