Related papers: Improving Seek Time for Column Store Using MMH Alg…
Memristive in-memory sorting has been proposed recently to improve hardware sorting efficiency. Using iterative in-memory min computations, data movements between memory and external processing units can be eliminated for improved latency…
In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query…
Fast nearest neighbor searching is becoming an increasingly important tool in solving many large-scale problems. Recently a number of approaches to learning data-dependent hash functions have been developed. In this work, we propose a…
The storage manager, as a key component of the database system, is responsible for organizing, reading, and delivering data to the execution engine for processing. According to the data serving mechanism, existing storage managers are…
Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing…
Hashing methods aim to learn a set of hash functions which map the original features to compact binary codes with similarity preserving in the Hamming space. Hashing has proven a valuable tool for large-scale information retrieval. We…
The increase in the rate of data is much higher than the increase in the speed of computers, which results in a heavy emphasis on search algorithms in research literature. Searching an item in ordered list is an efficient operation in data…
The problem of fast items retrieval from a fixed collection is often encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on…
This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related…
Recursive queries and recursive derived tables constitute an important part of the SQL standard. Their efficient processing is important for many real-life applications that rely on graph or hierarchy traversal. Position-enabled…
Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming…
Metaheuristic search methods have proven to be essential tools for tackling complex optimization challenges, but their full potential is often constrained by conventional algorithmic frameworks. In this paper, we introduce a novel approach…
Services and applications based on the Memento Aggregator can suffer from slow response times due to the federated search across web archives performed by the Memento infrastructure. In an effort to decrease the response times, we…
Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned and performant systems have evolved that provide users with the possibility of answering ad hoc queries over large datasets in an…
With the SAP HANA database, SAP offers a high-performance in-memory hybrid-store database. Hybrid-store databases---that is, databases supporting row- and column-oriented data management---are getting more and more prominent. While the…
A standard design pattern found in many concurrent data structures, such as hash tables or ordered containers, is alternation of parallelizable sections that incur no data conflicts and critical sections that must run sequentially and are…
Scalable ordered maps must ensure that range queries, which operate over many consecutive keys, provide intuitive semantics (e.g., linearizability) without degrading the performance of concurrent insertions and removals. These goals are…
Users of MapReduce often run into performance problems when they scale up their workloads. Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs. However,…
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory…
Consistent hashing is used in distributed systems and networking applications to spread data evenly and efficiently across a cluster of nodes. In this paper, we present MementoHash, a novel consistent hashing algorithm that eliminates known…