Related papers: A Simple Algorithm for Clustering Discrete Distrib…

200 papers

The discrete distribution is often used to describe complex instances in machine learning, such as images, sequences, and documents. Traditionally, clustering of discrete distributions (D2C) has been approached using Wasserstein barycenter…

Machine Learning · Computer Science 2024-08-19 Zixiao Wang , Dong Qiao , Jicong Fan

There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between…

Data Structures and Algorithms · Computer Science 2010-04-13 Amit Kumar , Ravindran Kannan

Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused…

Machine Learning · Computer Science 2018-03-05 Dan Kushnir , Shirin Jalali , Iraj Saniee

In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers…

Machine Learning · Computer Science 2024-05-31 Mohamed Seif , Yanxi Chen

Nowadays, huge amounts of data are naturally collected in distributed sites due to different facts and moving these data through the network for extracting useful knowledge is almost unfeasible for either technical reasons or policies.…

Databases · Computer Science 2017-03-30 Lamine M. Aouad , Nhien-An Le-Khac , Tahar Kechadi

We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i…

Machine Learning · Computer Science 2023-12-20 Ilias Diakonikolas , Daniel M. Kane , Jasper C. H. Lee , Thanasis Pittas

As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Luhong Diao , Jinying Gao , Manman Deng

Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. Despite significant research progress, the…

Machine Learning · Computer Science 2021-12-30 Edith Cohen , Haim Kaplan , Yishay Mansour , Uri Stemmer , Eliad Tsfadia

Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli-Rademacher distribution using a…

Machine Learning · Computer Science 2025-07-01 Martin Eppert , Satyaki Mukherjee , Debarghya Ghoshdastidar

Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…

Methodology · Statistics 2025-12-18 Matteo Mori , Laura Anderlucci

Mixture distributions arise in many application areas, for example as marginal distributions or convolutions of distributions. We present a method of constructing an easily tractable discrete mixture distribution as an approximation to a…

Computation · Statistics 2017-02-20 Christian Röver , Tim Friede

In this paper, we present a cluster algorithm for the simulation of hard spheres and related systems. In this algorithm, a copy of the configuration is rotated with respect to a randomly chosen pivot point. The two systems are then…

Statistical Mechanics · Physics 2008-02-03 Christophe Dress , Werner Krauth

Distributed data mining techniques, and mainly distributed clustering, have been widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache , M-Tahar Kechadi

We give an efficient algorithm for robustly clustering a mixture of two arbitrary Gaussians, a central open problem in the theory of computationally efficient robust estimation, assuming only that the means of the component Gaussians…

Data Structures and Algorithms · Computer Science 2020-06-02 He Jia , Santosh Vempala

The Gaussian mixture model is very useful in many practical problems. Nevertheless, it cannot be directly generalized to non-Euclidean spaces. To overcome this problem, we present a spherical Gaussian-based clustering approach for partitioning…

Machine Learning · Computer Science 2017-05-08 Marek Śmieja , Jacek Tabor

Efficient extraction of useful knowledge from these data is still a challenge, mainly when the data is distributed, heterogeneous and of different quality depending on its corresponding local infrastructure. To reduce the overhead cost,…

Databases · Computer Science 2017-04-17 Nhien-An Le-Khac , M-Tahar Kechadi

Clustering algorithms that view each data object as a single sample drawn from a certain distribution, for example a Gaussian distribution, have been a hot topic for decades. Many clustering algorithms, such as k-means and spectral…

Machine Learning · Computer Science 2019-10-25 Xiang Wang , Tie Liu

We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it…

Machine Learning · Statistics 2021-11-02 Ingo Steinwart , Bharath K. Sriperumbudur , Philipp Thomann

The discrete distribution clustering algorithm, namely D2-clustering, has demonstrated its usefulness in image classification and annotation where each object is represented by a bag of weighted vectors. The high computational complexity of…

Machine Learning · Computer Science 2013-02-07 Yu Zhang , James Z. Wang , Jia Li

In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering.…

Computation · Statistics 2020-05-15 Hanyu Song , Yingjian Wang , David B. Dunson