Related papers: Learning Balanced Mixtures of Discrete Distributio…

200 papers

In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as…

Machine Learning · Statistics 2017-11-17 Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-10 Kevin Aydin, MohammadHossein Bateni, Vahab Mirrokni

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population…

Statistics Theory · Mathematics 2023-01-05 Shuheng Zhou

This paper studies the problem of estimation from relative measurements in a graph, in which a vector indexed over the nodes has to be reconstructed from pairwise measurements of differences between its components associated to nodes…

Systems and Control · Computer Science 2018-07-27 Chiara Ravazzi, Nelson P. K. Chan, Paolo Frasca

We give an algorithm for learning a mixture of unstructured distributions. This problem arises in various unsupervised learning scenarios, for example in learning topic models from a corpus of documents spanning several topics.…

Machine Learning · Computer Science 2013-09-19 Yuval Rabani, Leonard Schulman, Chaitanya Swamy

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions in $\mathbb{R}^p$. We consider semidefinite programming relaxations of an integer quadratic program that is…

Machine Learning · Statistics 2025-03-19 Shuheng Zhou

We study the problem of computing approximate minimum edge cuts by distributed algorithms. We use a standard synchronous message passing model where in each round, $O(\log n)$ bits can be transmitted over each edge (a.k.a. the CONGEST…

Data Structures and Algorithms · Computer Science 2013-11-21 Mohsen Ghaffari, Fabian Kuhn

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu, Vincent Y. F. Tan

We consider the problem of determining the top-$k$ largest measurements from a dataset distributed among a network of $n$ agents with noisy communication links. We show that this scenario can be cast as a distributed convex optimization…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-02 Xu Zhang, Marcos Vasconcelos

We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of…

Data Structures and Algorithms · Computer Science 2017-11-01 Oded Regev, Aravindan Vijayaraghavan

We study the problem of approximating the total variation distance between two mixtures of product distributions over an $n$-dimensional discrete domain. Given two mixtures $\mathbb{P}$ and $\mathbb{Q}$ with $k_1$ and $k_2$ product…

Data Structures and Algorithms · Computer Science 2026-05-06 Weiming Feng, Yucheng Fu, Minji Yang, Anqi Zhang

We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters…

Data Structures and Algorithms · Computer Science 2025-12-10 Gunjan Kumar, Yash Pote, Jonathan Scarlett

We study the problem of edge partitioning, where the goal is to partition the edge set of a graph into several parts. The replication factor of a vertex $v$ is the number of parts that contain edges incident to $v$. The goal is to minimize…

Discrete Mathematics · Computer Science 2026-05-08 Alexander Yakunin, Andrey Kupavskii, Alexander Sushin, Stanislav Moiseev

We consider the problem of sampling from data defined on the nodes of a weighted graph, where the edge weights capture the data correlation structure. As shown recently, using spectral graph theory one can define a cut-off frequency for the…

Information Theory · Computer Science 2014-11-13 Ilan Shomorony, A. Salman Avestimehr

We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, $\vartheta$, is a probability distribution over probability distributions $p$,…

Machine Learning · Computer Science 2015-04-13 Jian Li, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy

Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the…

Data Structures and Algorithms · Computer Science 2019-02-19 Dmitrii Avdiukhin, Sergey Pupyrev, Grigory Yaroslavtsev

The $K$-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of $n$ data points with labels and a query point $p$, we want to assign a label to $p$ based on the labels…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-25 Reza Fathi, Anisur Rahaman Molla, Gopal Pandurangan

We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample…

Machine Learning · Computer Science 2025-03-05 Khashayar Gatmiry, Jonathan Kelner, Holden Lee

We consider the problem of spherical Gaussian Mixture models with $k \geq 3$ components when the components are well separated. A fundamental previous result established that separation of $\Omega(\sqrt{\log k})$ is necessary and sufficient…

Machine Learning · Computer Science 2020-06-22 Jeongyeol Kwon, Constantine Caramanis

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. In particular, we design and analyze two computationally efficient algorithms to partition data…

Statistics Theory · Mathematics 2024-03-20 Shuheng Zhou