Related papers: Sorting improves word-aligned bitmap indexes

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…

Databases · Computer Science 2009-01-19 Owen Kaser, Daniel Lemire, Kamel Aouiche

Tri de la table de faits et compression des index bitmaps avec alignement sur les mots (Sorting the fact table and compressing word-aligned bitmap indexes)

Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are…

Databases · Computer Science 2008-08-15 Kamel Aouiche, Daniel Lemire, Owen Kaser

Better bitmap performance with Roaring bitmaps

Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes.…

Databases · Computer Science 2016-04-12 Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin

Reordering Columns for Smaller Indexes

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…

Databases · Computer Science 2015-03-13 Daniel Lemire, Owen Kaser

Consistently faster and smaller compressed bitmaps with Roaring

Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance…

Databases · Computer Science 2018-03-05 Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser

CONCISE: Compressed 'n' Composable Integer Set

Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. However, bitmaps usually use a large storage space, thus requiring…

Data Structures and Algorithms · Computer Science 2015-03-14 Alessandro Colantonio, Roberto Di Pietro

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to…

Databases · Computer Science 2014-02-04 Daniel Lemire, Owen Kaser, Eduardo Gutarra

Faster Radix Sort via Virtual Memory and Write-Combining

Sorting algorithms are the deciding factor for the performance of common operations such as removal of duplicates or database sort-merge joins. This work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We present a…

Data Structures and Algorithms · Computer Science 2010-09-07 Jan Wassenberg, Peter Sanders

Hash sort: A linear time complexity multiple-dimensional sort algorithm

Sorting and hashing are two completely different concepts in computer science, and appear mutually exclusive to one another. Hashing is a search method using the data as a key to map to the location within memory, and is used for rapid…

Data Structures and Algorithms · Computer Science 2007-05-23 William F. Gilreath

Communication-Efficient String Sorting

There has been surprisingly little work on algorithms for sorting strings on distributed-memory parallel machines. We develop efficient algorithms for this problem based on the multi-way merging principle. These algorithms inspect only…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-24 Timo Bingmann, Peter Sanders, Matthias Schimek

Engineering Rank/Select Data Structures for Large-Alphabet Strings

Large-alphabet strings are common in scenarios such as information retrieval and natural-language processing. The efficient storage and processing of such strings usually introduces several challenges that are not witnessed in…

Data Structures and Algorithms · Computer Science 2024-05-03 Diego Arroyuelo, Gabriel Carmona, Héctor Larrañaga, Francisco Riveros, Carlos Eugenio Rojas-Morales, Erick Sepúlveda

Compressed bitmap indexes: beyond unions and intersections

Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND, OR, NOT) that are ideally suited for…

Databases · Computer Science 2016-01-11 Owen Kaser, Daniel Lemire

Image Compression Using Proposed Enhanced Run Length Encoding Algorithm

In this paper, we present a proposed enhancement to image compression using the RLE algorithm. The proposal decreases the size of the compressed image, whereas the original method is used primarily for compressing binary images…

Multimedia · Computer Science 2018-04-03 Ali H. Husseen Al-nuaimi, Shyamaa Shakir Al-juboori, R. J. Mohammed

Robust and Efficient Sorting with Offset-Value Coding

Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have…

Databases · Computer Science 2022-09-20 Thanh Do, Goetz Graefe

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun, Md. Hanif, Md. Rakib Uddin, Tanvir Ahmed, Md. Mofizul Islam

Threshold and Symmetric Functions over Bitmaps

Bitmap indexes are routinely used to speed up simple aggregate queries in databases. Set operations such as intersections, unions and complements can be represented as logical operations (AND, OR, NOT). However, less is known about the…

Databases · Computer Science 2016-11-16 Owen Kaser, Daniel Lemire

Improving Run Length Encoding by Preprocessing

The Run Length Encoding (RLE) compression method is a long-standing, simple lossless compression scheme that is easy to implement and achieves good compression on input data containing repeating consecutive symbols. In its pure form…

Data Structures and Algorithms · Computer Science 2021-04-01 Sven Fiergolla, Petra Wolf

Efficient sorting, duplicate removal, grouping, and aggregation

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…

Databases · Computer Science 2022-09-27 Thanh Do, Goetz Graefe, Jeffrey Naughton

Compressed Key Sort and Fast Index Reconstruction

In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…

Databases · Computer Science 2020-09-25 Yongsik Kwon, Cheol Ryu, Sang Kyun Cha, Arthur H. Lee, Kunsoo Park, Bongki Moon

Engineering Faster Sorters for Small Sets of Items

Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. The more sophisticated and fast sorting algorithms become asymptotically, the less efficient they are for small sets of items…

Data Structures and Algorithms · Computer Science 2019-08-23 Jasper Marianczuk