Related papers: Geometric Covering using Random Fields

200 papers

Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH)…

Machine Learning · Computer Science 2013-01-17 Boyi Xie, Shuheng Zheng

We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius $r$. The construction is…

Data Structures and Algorithms · Computer Science 2016-01-08 Rasmus Pagh

An important question that arises in the study of high dimensional vector representations learned from data is: given a set $\mathcal{D}$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of…

Data Structures and Algorithms · Computer Science 2018-09-21 Xian Wu, Moses Charikar, Vishnu Natchu

Let $V$ be any vector space of multivariate degree-$d$ homogeneous polynomials with co-dimension at most $k$, and $S$ be the set of points where all polynomials in $V$ nearly vanish. We establish a qualitatively optimal upper bound on…

Machine Learning · Computer Science 2020-12-15 Ilias Diakonikolas, Daniel M. Kane

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing…

Machine Learning · Statistics 2025-09-23 Amparo Baíllo, Jose R. Berrendero, Martín Sánchez-Signorini

Kernel methods obtain superb performance in terms of accuracy for various machine learning tasks since they can effectively extract nonlinear relations. However, their time complexity can be rather large especially for clustering tasks. In…

Machine Learning · Statistics 2015-10-29 Xu Wang, Gilad Lerman

Helly's theorem is a fundamental result in discrete geometry, describing the ways in which convex sets intersect with each other. If $S$ is a set of $n$ points in $R^d$, we say that $S$ is $(k,G)$-clusterable if it can be partitioned into…

Computational Geometry · Computer Science 2013-12-17 Sourav Chakraborty, Rameshwar Pratap, Sasanka Roy, Shubhangi Saraf

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known…

Computational Geometry · Computer Science 2018-11-07 Georgia Avarikioti, Alain Ryser, Yuyi Wang, Roger Wattenhofer

Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied to a variety of domains, it comes with limitations. The reason is that there…

Statistics Theory · Mathematics 2018-05-24 Kwangjun Ahn, Kangwook Lee, Changho Suh

Given a set of points $P\subset \mathbb{R}^{d}$ and a kernel $k$, the Kernel Density Estimate at a point $x\in\mathbb{R}^{d}$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data…

Data Structures and Algorithms · Computer Science 2018-09-03 Moses Charikar, Paris Siminelakis

We define a general variant of the graph clustering problem where the criterion of density for the clusters is (high) connectivity. In Clustering to Given Connectivities, we are given an $n$-vertex graph $G$, an integer $k$, and a…

Data Structures and Algorithms · Computer Science 2018-04-23 Petr A. Golovach, Dimitrios M. Thilikos

Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd's $k$-means, without requiring a predefined cluster count. It starts with each data point as its centroid and iteratively merges…

Machine Learning · Statistics 2026-05-15 Shubhayan Pan, Kushal Bose, Debolina Paul, Saptarshi Chakraborty, Swagatam Das

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly…

Computational Geometry · Computer Science 2017-09-25 Joachim Gudmundsson, Rasmus Pagh

Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique…

Machine Learning · Computer Science 2022-02-22 Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the…

Statistics Theory · Mathematics 2016-04-26 Tsvetan Asamov, Adi Ben-Israel

Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key of…

Machine Learning · Computer Science 2019-01-30 Nicolas Tremblay, Andreas Loukas

Random column sampling is not guaranteed to yield data sketches that preserve the underlying structures of the data and may not sample sufficiently from less-populated data clusters. Also, adaptive sampling can often provide accurate low…

Machine Learning · Computer Science 2017-10-11 Mostafa Rahmani, George Atia

Nearest neighbor search is a fundamental problem in various research fields such as machine learning, data mining, and pattern recognition. Recently, hashing-based approaches, e.g., Locality Sensitive Hashing (LSH), have proved to be effective…

Information Retrieval · Computer Science 2012-05-15 Yue Lin, Deng Cai, Cheng Li

We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We…

Computational Geometry · Computer Science 2010-11-24 David Arthur, Steve Y. Oudot

In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness, but the "values" of distance that…

Machine Learning · Statistics 2013-12-30 Yoshikazu Terada