Related papers: SIMD-Optimized Search Over Sorted Data
The increase in the rate of data is much higher than the increase in the speed of computers, which results in a heavy emphasis on search algorithms in research literature. Searching an item in ordered list is an efficient operation in data…
To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index has resulted in significant I/O traffic due to limited access locality and inefficient cache…
Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now includes also Web Applications, e.g, Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very…
With the advancement of machine learning and deep learning, vector search becomes instrumental to many information retrieval systems, to search and find best matches to user queries based on their semantic similarities.These online services…
Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…
Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…
The introduction of Single Instruction Multiple Data (SIMD) instructions in mainstream CPUs has enabled modern database engines to leverage data parallelism by performing more computation with a single instruction, resulting in a reduced…
Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search into and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage,…
Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…
Given an array $X$ of $N+1$ strictly ordered floating point numbers and a floating point number $z$ in the interval $[X_0,X_N)$, a common problem is to find the index $i$ of the interval $[X_{i},X_{i+1})$ containing $z$. This problem arises…
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks…
We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX [6], stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). PDX accelerates exact and…
Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte…
Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance.…
Efficient indexing and searching of high dimensional data has been an area of active research due to the growing exploitation of high dimensional data and the vulnerability of traditional search methods to the curse of dimensionality. This…
This work presents a highly optimized implementation of PAC-DB, a recent and promising database privacy model. We prove that our SIMD-PAC-DB can compute the same privatized answer with just a single query, instead of the 128 stochastic…
Compared to other techniques, particle swarm optimization is more frequently utilized because of its ease of use and low variability. However, it is complicated to find the best possible solution in the search space in large-scale…
In data lakes, information on the same subject is often fragmented across multiple tables. Table union search aims to find the top-k tables that can be unioned with a query table to extend it with more rows, without relying on metadata or…
Query workloads and database schemas in OLAP applications are becoming increasingly complex. Moreover, the queries and the schemas have to continually \textit{evolve} to address business requirements. During such repetitive transitions, the…
We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses…