Related papers: Almost Linear Time Consistent Mode Estimation and …

A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering

In this paper we target the class of modal clustering methods where clusters are defined in terms of the local modes of the probability density function which generates the data. The most well-known modal clustering method is the k-means…

Machine Learning · Computer Science 2022-03-04 Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag, Christophe Cérin

Fast Locality Sensitive Hashing with Theoretical Guarantee

Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to data dimensions, and thus often the performance bottleneck when dimensionality is high…

Machine Learning · Computer Science 2023-09-28 Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo, Ming Du

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH)…

Machine Learning · Computer Science 2013-01-17 Boyi Xie, Shuheng Zheng

Density Sensitive Hashing

Nearest neighbors search is a fundamental problem in various research fields like machine learning, data mining and pattern recognition. Recently, hashing-based approaches, e.g., Locality Sensitive Hashing (LSH), are proved to be effective…

Information Retrieval · Computer Science 2012-05-15 Yue Lin, Deng Cai, Cheng Li

Range-efficient consistent sampling and locality-sensitive hashing for polygons

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly…

Computational Geometry · Computer Science 2017-09-25 Joachim Gudmundsson, Rasmus Pagh

Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors

Similarity search in high-dimensional spaces is an important task for many multimedia applications. Due to the notorious curse of dimensionality, approximate nearest neighbor techniques are preferred over exact searching techniques since…

Databases · Computer Science 2020-10-16 Omid Jafari, Parth Nagarkar, Jonathan Montaño

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based indexing approaches are known to suffer from the notorious curse of dimensionality for…

Databases · Computer Science 2021-02-16 Omid Jafari, Parth Nagarkar

Bayesian Locality Sensitive Hashing for Fast Similarity Search

Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-sensitive hashing…

Databases · Computer Science 2012-03-29 Venu Satuluri, Srinivasan Parthasarathy

DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search

Locality-sensitive hashing (LSH) is a well-known solution for approximate nearest neighbor (ANN) search in high-dimensional spaces due to its robust theoretical guarantee on query accuracy. Traditional LSH-based methods mainly focus on…

Databases · Computer Science 2026-02-11 Jiuqi Wei, Botao Peng, Xiaodong Lee, Themis Palpanas

On the Consistency of Quick Shift

Quick Shift is a popular mode-seeking and clustering algorithm. We present finite sample statistical consistency guarantees for Quick Shift on mode and cluster recovery under mild distributional assumptions. We then apply our results to…

Machine Learning · Statistics 2017-12-27 Heinrich Jiang

Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density

Mean shift clustering finds the modes of the data probability density by identifying the zero points of the density gradient. Since it does not require to fix the number of clusters in advance, the mean shift has been a popular clustering…

Machine Learning · Statistics 2014-04-22 Hiroaki Sasaki, Aapo Hyvärinen, Masashi Sugiyama

SHADE: Deep Density-based Clustering

Detecting arbitrarily shaped clusters in high-dimensional noisy data is challenging for current clustering methods. We introduce SHADE (Structure-preserving High-dimensional Analysis with Density-based Exploration), the first deep…

Machine Learning · Computer Science 2024-10-10 Anna Beer, Pascal Weber, Lukas Miklautz, Collin Leiber, Walid Durani, Christian Böhm, Claudia Plant

qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces

Similarity search queries in high-dimensional spaces are an important type of queries in many domains such as image processing, machine learning, etc. Since exact similarity search indexing techniques suffer from the well-known curse of…

Databases · Computer Science 2019-07-30 Omid Jafari, John Ossorgin, Parth Nagarkar

Towards a Model for LSH

As data volumes continue to grow, clustering and outlier detection algorithms are becoming increasingly time-consuming. Classical index structures for neighbor search are no longer sustainable due to the "curse of dimensionality". Instead,…

Databases · Computer Science 2021-05-12 Li Wang

A review of mean-shift algorithms for clustering

A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found…

Machine Learning · Computer Science 2015-03-03 Miguel Á. Carreira-Perpiñán

A Survey on Locality Sensitive Hashing Algorithms and their Applications

Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular techniques for finding approximate nearest neighbor searches in…

Databases · Computer Science 2021-02-18 Omid Jafari, Preeti Maurya, Parth Nagarkar, Khandker Mushfiqul Islam, Chidambaram Crushev

PDET-LSH: Scalable In-Memory Indexing for High-Dimensional Approximate Nearest Neighbor Search with Quality Guarantees

Locality-sensitive hashing (LSH) is a well-known solution for approximate nearest neighbor (ANN) search with theoretical guarantees. Traditional LSH-based methods mainly focus on improving the efficiency and accuracy of query phase by…

Databases · Computer Science 2026-03-27 Jiuqi Wei, Xiaodong Lee, Botao Peng, Quanqing Xu, Chuanhui Yang, Themis Palpanas

Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm

Mean shift is a simple iterative procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode…

Machine Learning · Computer Science 2021-05-11 Saptarshi Chakraborty, Debolina Paul, Swagatam Das

Locality Sensitive Hashing for Set-Queries, Motivated by Group Recommendations

Locality Sensitive Hashing (LSH) is an effective method to index a set of points such that we can efficiently find the nearest neighbors of a query point. We extend this method to our novel Set-query LSH (SLSH), such that it can find the…

Data Structures and Algorithms · Computer Science 2020-04-23 Haim Kaplan, Jay Tenenbaum

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing…

Machine Learning · Computer Science 2019-10-29 Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab S. Mirrokni