Related papers: Upscaledb: Efficient Integer-Key Compression in a …
In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…
Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…
Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance.…
Compressing integer keys is a fundamental operation among multiple communities, such as database management (DB), information retrieval (IR), and high-performance computing (HPC). Recent advances in \emph{learned indexes} have inspired the…
In-memory columnar databases have become mainstream over the last decade and have vastly improved the fast processing of large volumes of data through multi-core parallelism and in-memory compression thereby eliminating the usual…
In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…
The ubiquitous Variable-Byte encoding is one of the fastest compressed representation for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…
Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…
Recent research found that cloud data warehouses are text-heavy. However, their capabilities for efficiently processing string columns remain limited, relying primarily on techniques like dictionary encoding and prefix-based partition…
Modern RDBMSs support the ability to compress data using methods such as null suppression and dictionary encoding. Data compression offers the promise of significantly reducing storage requirements and improving I/O performance for decision…
We propose BS-tree, an in-memory implementation of the B+-tree that adopts the structure of the disk-based index (i.e., a balanced, multiway tree), setting the node size to a memory block that can be processed fast and in parallel using…
It is commonly accepted in the practice of on-line analytical processing of databases that the multidimensional database organization is less scalable than the relational one. It is easy to see that the size of the multidimensional…
As a core component in modern data centers, key-value cache provides high-throughput and low-latency services for high-speed data processing. The effectiveness of a key-value cache relies on its ability of accommodating the needed data.…
Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes…
Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing…
To alleviate the problem of information explosion, recommender systems are widely deployed to provide personalized information filtering services. Usually, embedding tables are employed in recommender systems to transform high-dimensional…
Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…
The aim of this paper is to examine and demonstrate how integer-based datetime labels (integer surrogate keys for time) can optimize data-warehouse and time-series performance, proposing practical formats and algorithms and validating their…
The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy…
Converting binary integers to variable-length decimal strings is a fundamental operation in computing. Conventional fast approaches rely on recursive division and small lookup tables. We propose a SIMD-based algorithm that leverages integer…