Related papers: Learning Balanced Mixtures of Discrete Distributio…
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as…
Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the…
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population…
This paper studies the problem of estimation from relative measurements in a graph, in which a vector indexed over the nodes has to be reconstructed from pairwise measurements of differences between its components associated with nodes…
We give an algorithm for learning a mixture of {\em unstructured} distributions. This problem arises in various unsupervised learning scenarios, for example in learning {\em topic models} from a corpus of documents spanning several topics.…
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions in $\mathbb{R}^p$. We consider semidefinite programming relaxations of an integer quadratic program that is…
We study the problem of computing approximate minimum edge cuts by distributed algorithms. We use a standard synchronous message passing model where in each round, $O(\log n)$ bits can be transmitted over each edge (a.k.a. the CONGEST…
The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the {\it correct target clustering} of the samples…
We consider the problem of determining the top-$k$ largest measurements from a dataset distributed among a network of $n$ agents with noisy communication links. We show that this scenario can be cast as a distributed convex optimization…
We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of…
We study the problem of approximating the total variation distance between two mixtures of product distributions over an $n$-dimensional discrete domain. Given two mixtures $\mathbb{P}$ and $\mathbb{Q}$ with $k_1$ and $k_2$ product…
We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters…
We study the problem of edge partitioning, where the goal is to partition the edge set of a graph into several parts. The replication factor of a vertex $v$ is the number of parts that contain edges incident to $v$. The goal is to minimize…
We consider the problem of sampling from data defined on the nodes of a weighted graph, where the edge weights capture the data correlation structure. As shown recently, using spectral graph theory one can define a cut-off frequency for the…
We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, $\vartheta$, is a probability distribution over probability distributions $p$,…
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the…
$K$-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of $n$ data points with labels and a query point $p$, we want to assign a label to $p$ based on the labels…
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample…
We consider the problem of spherical Gaussian Mixture models with $k \geq 3$ components when the components are well separated. A fundamental previous result established that separation of $\Omega(\sqrt{\log k})$ is necessary and sufficient…
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. In particular, we design and analyze two computationally efficient algorithms to partition data…