Related papers: Geometric Covering using Random Fields

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH)…

Machine Learning · Computer Science 2013-01-17 Boyi Xie, Shuheng Zheng

CoveringLSH: Locality-sensitive Hashing without False Negatives

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius $r$. The construction is…

Data Structures and Algorithms · Computer Science 2016-01-08 Rasmus Pagh

Local Density Estimation in High Dimensions

An important question that arises in the study of high dimensional vector representations learned from data is: given a set $\mathcal{D}$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of…

Data Structures and Algorithms · Computer Science 2018-09-21 Xian Wu, Moses Charikar, Vishnu Natchu

Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Let $V$ be any vector space of multivariate degree-$d$ homogeneous polynomials with co-dimension at most $k$, and $S$ be the set of points where all polynomials in $V$ nearly vanish. We establish a qualitatively optimal upper bound on…

Machine Learning · Computer Science 2020-12-15 Ilias Diakonikolas, Daniel M. Kane

Kernel K-means clustering of distributional data

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing…

Machine Learning · Statistics 2025-09-23 Amparo Baíllo, Jose R. Berrendero, Martín Sánchez-Signorini

Fast Landmark Subspace Clustering

Kernel methods obtain superb performance in terms of accuracy for various machine learning tasks since they can effectively extract nonlinear relations. However, their time complexity can be rather large especially for clustering tasks. In…

Machine Learning · Statistics 2015-10-29 Xu Wang, Gilad Lerman

Helly-Type Theorems in Property Testing

Helly's theorem is a fundamental result in discrete geometry, describing the ways in which convex sets intersect with each other. If $S$ is a set of $n$ points in $R^d$, we say that $S$ is $(k,G)$-clusterable if it can be partitioned into…

Computational Geometry · Computer Science 2013-12-17 Sourav Chakraborty, Rameshwar Pratap, Sasanka Roy, Shubhangi Saraf

High Dimensional Clustering with $r$-nets

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known…

Computational Geometry · Computer Science 2018-11-07 Georgia Avarikioti, Alain Ryser, Yuyi Wang, Roger Wattenhofer

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied to a variety of domains, it comes with limitations. The reason is that there…

Statistics Theory · Mathematics 2018-05-24 Kwangjun Ahn, Kangwook Lee, Changho Suh

Hashing-Based-Estimators for Kernel Density in High Dimensions

Given a set of points $P\subset \mathbb{R}^{d}$ and a kernel $k$, the Kernel Density Estimate at a point $x\in\mathbb{R}^{d}$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data…

Data Structures and Algorithms · Computer Science 2018-09-03 Moses Charikar, Paris Siminelakis

Clustering to Given Connectivities

We define a general variant of the graph clustering problem where the criterion of density for the clusters is (high) connectivity. In Clustering to Given Connectivities, we are given an $n$-vertex graph $G$, an integer $k$, and a…

Data Structures and Algorithms · Computer Science 2018-04-23 Petr A. Golovach, Dimitrios M. Thilikos

A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd's $k$-means, without requiring a predefined cluster count. It starts with each data point as its centroid and iteratively merges…

Machine Learning · Statistics 2026-05-15 Shubhayan Pan, Kushal Bose, Debolina Paul, Saptarshi Chakraborty, Swagatam Das

Range-efficient consistent sampling and locality-sensitive hashing for polygons

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly…

Computational Geometry · Computer Science 2017-09-25 Joachim Gudmundsson, Rasmus Pagh

Clustering by the Probability Distributions from Extreme Value Theory

Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique…

Machine Learning · Computer Science 2022-02-22 Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu

A Probabilistic $\ell_1$ Method for Clustering High Dimensional Data

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the…

Statistics Theory · Mathematics 2016-04-26 Tsvetan Asamov, Adi Ben-Israel

Approximating Spectral Clustering via Sampling: a Review

Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key of…

Machine Learning · Computer Science 2019-01-30 Nicolas Tremblay, Andreas Loukas

Spatial Random Sampling: A Structure-Preserving Data Sketching Tool

Random column sampling is not guaranteed to yield data sketches that preserve the underlying structures of the data and may not sample sufficiently from less-populated data clusters. Also, adaptive sampling can often provide accurate low…

Machine Learning · Computer Science 2017-10-11 Mostafa Rahmani, George Atia

Density Sensitive Hashing

Nearest neighbor search is a fundamental problem in various research fields like machine learning, data mining and pattern recognition. Recently, hashing-based approaches, e.g., Locality Sensitive Hashing (LSH), have proved to be effective…

Information Retrieval · Computer Science 2012-05-15 Yue Lin, Deng Cai, Cheng Li

Reverse Nearest Neighbors Search in High Dimensions using Locality-Sensitive Hashing

We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We…

Computational Geometry · Computer Science 2010-11-24 David Arthur, Steve Y. Oudot

Clustering for high-dimension, low-sample size data using distance vectors

In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness, but the "values" of distance that…

Machine Learning · Statistics 2013-12-30 Yoshikazu Terada