Related papers: Cluster Diffusing Shuffles

200 papers

In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering.…

Computation · Statistics 2020-05-15 Hanyu Song, Yingjian Wang, David B. Dunson

Many high dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing.…

Computer Vision and Pattern Recognition · Computer Science 2018-04-10 Wen-Yan Lin, Siying Liu, Jian-Huang Lai, Yasuyuki Matsushita

We address the problem of un-supervised soft-clustering called micro-clustering. The aim of the problem is to enumerate all groups composed of records strongly related to each other, while standard clustering methods separate records at…

Data Structures and Algorithms · Computer Science 2016-06-07 Takeaki Uno, Hiroki Maegawa, Takanobu Nakahara, Yukinobu Hamuro, Ryo Yoshinaka, Makoto Tatsuta

The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the…

Machine Learning · Computer Science 2024-01-30 Ye Zhu, Kai Ming Ting, Mark Carman, Maia Angelova

Spectral clustering is one of the most prominent clustering approaches. The distance-based similarity is the most widely used method for spectral clustering. However, people have already noticed that this is not suitable for multi-scale…

Machine Learning · Computer Science 2020-09-11 Hengrui Wang, Yubo Zhang, Mingzhi Chen, Tong Yang

Data mining and knowledge discovery are two important growing research fields in the last two decades due to the abundance of data collected from various sources. The exponentially growing volumes of generated data urge the development of…

Computer Science and Game Theory · Computer Science 2020-07-13 Dalila Kessira, Mohand-Tahar Kechadi

Frequently, randomly organized data is needed to avoid an anomalous operation of other algorithms and computational processes. An analogy is that a deck of cards is ordered within the pack, but before a game of poker or solitaire the deck…

Data Structures and Algorithms · Computer Science 2008-11-24 William F. Gilreath

We present molecular dynamics (MD) simulations results for dense fluids of ultrasoft, fully-penetrable particles. These are a binary mixture and a polydisperse system of particles interacting via the generalized exponential model, which is…

Soft Condensed Matter · Physics 2012-12-24 Daniele Coslovich, Marco Bernabei, Angel J. Moreno

Common clustering algorithms require multiple scans of all the data to achieve convergence, and this is prohibitive when large databases, with data arriving in streams, must be processed. Some algorithms to extend the popular K-means method…

Applications · Statistics 2017-12-22 Giacomo Aletti, Alessandra Micheletti

We propose a simple, projection-based algorithm for clustering mixtures of discrete (Bernoulli) distributions. Unlike previous approaches that rely on coordinate-specific "combinatorial projections," our algorithm is rotationally…

Data Structures and Algorithms · Computer Science 2026-04-28 Pradipta Mitra

A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found…

Machine Learning · Computer Science 2015-03-03 Miguel Á. Carreira-Perpiñán

Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query…

Computer Vision and Pattern Recognition · Computer Science 2010-09-27 Romain Tavenard, Laurent Amsaleg, Hervé Jégou

We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i…

Machine Learning · Computer Science 2023-12-20 Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas

Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph. However, existing techniques that construct the affinity graph based on pairwise instances can…

Machine Learning · Computer Science 2025-01-07 Jifei Luo, Hantao Yao, Changsheng Xu

In this paper, we present a cluster algorithm for the simulation of hard spheres and related systems. In this algorithm, a copy of the configuration is rotated with respect to a randomly chosen pivot point. The two systems are then…

Statistical Mechanics · Physics 2008-02-03 Christophe Dress, Werner Krauth

A computational theory for clustering and a semi-supervised clustering algorithm is presented. Clustering is defined to be the obtainment of groupings of data such that each group contains no anomalies with respect to a chosen grouping…

Machine Learning · Computer Science 2025-07-17 Nassir Mohammad

Cluster analysis which focuses on the grouping and categorization of similar elements is widely used in various fields of research. Inspired by the phenomenon of atomic fission, a novel density-based clustering algorithm is proposed in this…

Machine Learning · Computer Science 2020-04-28 Shizhan Lu

One important tool is the optimal clustering of data into useful categories. Dividing similar objects into a smaller number of clusters is of importance in many applications. These include search engines, monitoring of academic performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-21 Gavriel Yarmish, Philip Listowsky, Simon Dexter

The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments in which users obtain data from one of $K$ different distributions. In the proposed setup, the grouping of users (based on…

Machine Learning · Computer Science 2023-10-24 Aleksandar Armacki, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

The discrete distribution clustering algorithm, namely D2-clustering, has demonstrated its usefulness in image classification and annotation where each object is represented by a bag of weighed vectors. The high computational complexity of…

Machine Learning · Computer Science 2013-02-07 Yu Zhang, James Z. Wang, Jia Li