Related papers: SIMD-Optimized Search Over Sorted Data

Efficient hybrid search algorithm on ordered datasets

The increase in the rate of data is much higher than the increase in the speed of computers, which results in a heavy emphasis on search algorithms in research literature. Searching for an item in an ordered list is an efficient operation in data…

Data Structures and Algorithms · Computer Science 2017-08-04 Adnan Saher Mohammed, Şahin Emrah Amrahov, Fatih V. Çelebi

Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration

To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index has resulted in significant I/O traffic due to limited access locality and inefficient cache…

Hardware Architecture · Computer Science 2024-08-05 Yun-Chih Chen, Yuan-Hao Chang, Tei-Wei Kuo

Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines

Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now includes also Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very…

Machine Learning · Computer Science 2020-07-31 Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory

With the advancement of machine learning and deep learning, vector search becomes instrumental to many information retrieval systems, to search and find best matches to user queries based on their semantic similarities. These online services…

Computer Vision and Pattern Recognition · Computer Science 2018-09-13 Minjia Zhang, Yuxiong He

Vector operations for accelerating expensive Bayesian computations -- a tutorial guide

Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…

Computation · Statistics 2021-05-10 David J. Warne, Scott A. Sisson, Christopher Drovandi

Scanning HTML at Tens of Gigabytes per Second on ARM Processors

Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…

Data Structures and Algorithms · Computer Science 2025-06-05 Daniel Lemire

SIMD-ified R-tree Query Processing and Optimization

The introduction of Single Instruction Multiple Data (SIMD) instructions in mainstream CPUs has enabled modern database engines to leverage data parallelism by performing more computation with a single instruction, resulting in a reduced…

Databases · Computer Science 2023-12-27 Yeasir Rayhan, Walid G. Aref

Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform

Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search into and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage,…

Data Structures and Algorithms · Computer Science 2022-09-20 Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo
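
The two-stage idea in the abstract above (a model predicts an interval, then a binary search finalizes the lookup) can be illustrated with a minimal sketch. This is not the authors' code or the benchmarking platform's API: the names `fit_linear_model` and `lookup` are hypothetical, the model is a simple straight line through the first and last key, and the keys are assumed to be a non-empty, strictly increasing list of numbers.

```python
from bisect import bisect_left

def fit_linear_model(keys):
    """Fit a line mapping key -> position and record its worst-case error.

    Assumption: keys is non-empty and strictly increasing.
    """
    n = len(keys)
    first, last = keys[0], keys[-1]
    slope = (n - 1) / (last - first) if last != first else 0.0

    def predict(key):
        # Round to the nearest index and clamp into the valid range.
        return min(n - 1, max(0, int(round(slope * (key - first)))))

    # Largest gap between the predicted and the true position of a stored key.
    max_err = max(abs(predict(k) - i) for i, k in enumerate(keys))
    return predict, max_err

def lookup(keys, model, query):
    """Return the index of query in keys, or -1 if it is absent."""
    predict, max_err = model
    p = predict(query)
    # The model is monotone, so the insertion point of any query lies
    # within max_err + 1 positions of the prediction.
    lo = max(0, p - max_err - 1)
    hi = min(len(keys), p + max_err + 1)
    pos = bisect_left(keys, query, lo, hi)  # standard binary search in the window
    return pos if pos < len(keys) and keys[pos] == query else -1

keys = [2, 3, 5, 8, 13, 21, 34, 55]
model = fit_linear_model(keys)
print(lookup(keys, model, 13))  # index of 13 in keys
print(lookup(keys, model, 4))   # -1, since 4 is not stored
```

The final stage here uses Python's standard `bisect_left`; the paper's comparison of standard and uniform binary search concerns how that last step is implemented, which this sketch does not attempt to reproduce.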

SIMD Compression and the Intersection of Sorted Integers

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…

Information Retrieval · Computer Science 2020-04-22 Daniel Lemire, Leonid Boytsov, Nathan Kurz

Fast and Vectorizable Alternative to Binary Search in O(1) Applicable to a Wide Domain of Sorted Arrays of Floating Point Numbers

Given an array $X$ of $N+1$ strictly ordered floating point numbers and a floating point number $z$ in the interval $[X_0,X_N)$, a common problem is to find the index $i$ of the interval $[X_{i},X_{i+1})$ containing $z$. This problem arises…

Data Structures and Algorithms · Computer Science 2017-12-01 Fabio Cannizzo
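
For reference, the interval-lookup problem stated in the abstract above can be written down with an ordinary binary search. The sketch below is only the conventional O(log N) baseline, not the O(1) method the paper proposes; the function name `find_interval` is hypothetical, and `X` is assumed to be a strictly increasing list of floats.

```python
from bisect import bisect_right

def find_interval(X, z):
    """Return i such that X[i] <= z < X[i + 1].

    Assumptions: X is strictly increasing and z lies in [X[0], X[-1]).
    """
    if not (X[0] <= z < X[-1]):
        raise ValueError("z must lie in [X[0], X[-1])")
    # bisect_right counts the elements <= z; the interval starts one before that.
    return bisect_right(X, z) - 1

X = [0.0, 1.0, 2.5, 4.0]
print(find_interval(X, 1.0))  # 1, because X[1] <= 1.0 < X[2]
print(find_interval(X, 3.9))  # 2, because X[2] <= 3.9 < X[3]
```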

Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks

Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks…

Social and Information Networks · Computer Science 2016-11-01 Jingbo Shang, Meng Qu, Jialu Liu, Lance M. Kaplan, Jiawei Han, Jian Peng

PDX: A Data Layout for Vector Similarity Search

We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX [6], stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). PDX accelerates exact and…

Databases · Computer Science 2025-03-07 Leonardo Kuffo, Elena Krippner, Peter Boncz

Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions

Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte…

Databases · Computer Science 2017-01-18 Daniel Lemire, Christoph Rupp

A General SIMD-based Approach to Accelerating Compression Algorithms

Compression algorithms are important for data-oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets provide us with an opportunity for achieving better compression performance.…

Information Retrieval · Computer Science 2015-04-15 Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen

Efficient indexing and searching of high dimensional data has been an area of active research due to the growing exploitation of high dimensional data and the vulnerability of traditional search methods to the curse of dimensionality. This…

Information Retrieval · Computer Science 2015-05-13 Yu Zhong

SIMD-PAC-DB: Pretty Performant PAC Privacy

This work presents a highly optimized implementation of PAC-DB, a recent and promising database privacy model. We prove that our SIMD-PAC-DB can compute the same privatized answer with just a single query, instead of the 128 stochastic…

Databases · Computer Science 2026-03-20 Ilaria Battiston, Dandan Yuan, Xiaochen Zhu, Peter Boncz

Improved discrete particle swarm optimization using Bee Algorithm and multi-parent crossover method (Case study: Allocation problem and benchmark functions)

Compared to other techniques, particle swarm optimization is more frequently utilized because of its ease of use and low variability. However, it is complicated to find the best possible solution in the search space in large-scale…

Neural and Evolutionary Computing · Computer Science 2024-03-19 Hamed Zibaei, Mohammad Saadi Mesgari

Efficient and Effective Table-Centric Table Union Search in Data Lakes

In data lakes, information on the same subject is often fragmented across multiple tables. Table union search aims to find the top-k tables that can be unioned with a query table to extend it with more rows, without relying on metadata or…

Databases · Computer Science 2026-03-19 Yongkang Sun, Zhihao Ding, Huiqiang Wang, Reynold Cheng, Jieming Shi

Optimizing Index Deployment Order for Evolving OLAP (Extended Version)

Query workloads and database schemas in OLAP applications are becoming increasingly complex. Moreover, the queries and the schemas have to continually evolve to address business requirements. During such repetitive transitions, the…

Databases · Computer Science 2015-03-19 Hideaki Kimura, Carleton Coffrin, Alexander Rasin, Stanley B. Zdonik

Faster Exact Search using Document Clustering

We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses…

Information Retrieval · Computer Science 2014-11-06 Jonathan Dimond, Peter Sanders