Related papers: Robust and Efficient Sorting with Offset-Value Cod…
Recent work shows how offset-value coding speeds up database query execution, not only sorting but also duplicate removal and grouping (aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already…
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…
We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix…
Sorting is a fundamental operation in various applications and a traditional research topic in computer science. Improving the performance of sorting operations can have a significant impact on many application domains. For high-performance…
Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…
The internet contains large amounts of low-quality content, yet users expect web search engines to deliver high-quality, relevant results. The abundant presence of low-quality pages can negatively impact retrieval and crawling processes by…
In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…
It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution.…
Sorting is a fundamental operation in computer science and is a bottleneck in many important fields. Sorting is critical to database applications, online search and indexing,biomedical computing, and many other applications. The explosive…
Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, taking aside the memory requirements. In this paper, we revisit these approaches by considering, additionally, the memory…
Sorting and hashing are two completely different concepts in computer science, and appear mutually exclusive to one another. Hashing is a search method using the data as a key to map to the location within memory, and is used for rapid…
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to…
In the age of big data, sorting is an indispensable operation for DBMSes and similar systems. Having data sorted can help produce query plans with significantly lower run times. It also can provide other benefits like having non-blocking…
Order-preserving pattern matching was introduced recently but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same…
The quest for efficient sorting is ongoing, and we will explore a graph-based stable sorting strategy, in particular employing comparison graphs. We use the topological sort to map the comparison graph to a linear domain, and we can…
Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…
The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…
Sorting is a fundamental and well studied problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…
Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning…
Finding desired information from large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. Index is the main constituent of an IR…