Related papers: CONCISE: Compressed 'n' Composable Integer Set
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…
Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes.…
Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance…
Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND,OR,NOT) that are ideally suited for…
Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are…
In scientific fields such as quantum computing, physics, chemistry, and machine learning, high dimensional data are typically represented using sparse tensors. Tensor contraction is a popular operation on tensors to exploit meaning or alter…
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory…
We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $n\Ho(s) + o(n)(\Ho(s){+}1)$ bits, where $\Ho(s)$ is the zero-order entropy of $s$. This structure supports the queries \access, \rank\ and \select,…
Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. Over fifty years of usage have produced many variants and implementations to overcome some of their…
Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing…
Embedding image features into a binary Hamming space can improve both the speed and accuracy of large-scale query-by-example image retrieval systems. Supervised hashing aims to map the original features to compact binary codes in a manner…
The biggest cost of computing with large matrices in any modern computer is related to memory latency and bandwidth. The average latency of modern RAM reads is 150 times greater than a clock step of the processor. Throughput is a little…
Embedding tables are used by machine learning systems to work with categorical features. In modern Recommendation Systems, these tables can be very large, necessitating the development of new methods for fitting them in memory, even during…
We live in a data-driven era that involves the generation, collection and processing of a massive amount of data. This data often contains valuable intellectual property and sensitive user information that must be safeguarded. There is a…
Binary embeddings provide efficient and powerful ways to perform operations on large scale data. However binary embedding typically requires long codes in order to preserve the discriminative power of the input space. Thus binary coding…
Binary Convolutional Neural Networks (CNNs) can significantly reduce the number of arithmetic operations and the size of memory storage, which makes the deployment of CNNs on mobile or embedded systems more promising. However, the accuracy…
Rank and select queries on bitmaps are essential building bricks of many compressed data structures, including text indexes, membership and range supporting spatial data structures, compressed graphs, and more. Theoretically considered yet…
Compressive sensing (CS) has been widely used for the data gathering in wireless sensor networks for the purpose of reducing the communication overhead recent years. In this paper, we first show that with simple modification, 1-bit…
Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…