David Simcha

Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality

Learning and constructing large-scale graphs has attracted attention in recent decades, resulting in a rich literature that introduced various systems, tools, and algorithms. Grale is one such tool, designed for offline…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-15 · Filipe Miguel Gonçalves de Almeida, CJ Carey, Hendrik Fichtenberger, Jonathan Halcrow, Silvio Lattanzi, André Linhares, Tao Meng, Ashkan Norouzi-Fard, Nikos Parotsidis, Bryan Perozzi, David Simcha

SOAR: Improved Indexing for Approximate Nearest Neighbor Search

This paper introduces SOAR: Spilling with Orthogonality-Amplified Residuals, a novel data indexing technique for approximate nearest neighbor (ANN) search. SOAR extends previous approaches to ANN search, such as spill trees, that…

Machine Learning · Computer Science · 2024-04-02 · Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, Sanjiv Kumar

Scaling Hierarchical Agglomerative Clustering to Billion-sized Datasets

Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods. However, HAC is notoriously hard to scale to large data sets as the underlying complexity is at least quadratic in the number of…

Machine Learning · Computer Science · 2021-05-26 · Baris Sumengen, Anand Rajagopalan, Gui Citovsky, David Simcha, Olivier Bachem, Pradipta Mitra, Sam Blasiak, Mason Liang, Sanjiv Kumar

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

Quantization-based techniques are the current state of the art for scaling maximum inner product search to massive databases. Traditional approaches to quantization aim to minimize the reconstruction error of the database points. Based on…

Machine Learning · Computer Science · 2020-12-08 · Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar

Local Orthogonal Decomposition for Maximum Inner Product Search

Inverted file and asymmetric distance computation (IVFADC) have been successfully applied to approximate nearest neighbor search and subsequently maximum inner product search. In such a framework, vector quantization is used for coarse…

Machine Learning · Computer Science · 2019-03-26 · Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha

Efficient Inner Product Approximation in Hybrid Spaces

Many emerging use cases of data mining and machine learning operate on large datasets with data from heterogeneous sources, specifically with both sparse and dense components. For example, dense deep neural network embedding vectors are…

Machine Learning · Computer Science · 2019-03-22 · Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar

Quantization based Fast Inner Product Search

We propose a quantization-based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization…

Artificial Intelligence · Computer Science · 2015-09-07 · Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha