Related papers: Separating populations with wide data: A spectral …

200 papers

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population…

Statistics Theory · Mathematics 2023-01-05 Shuheng Zhou

We study the problem of partitioning a small sample of $n$ individuals from a mixture of $k$ product distributions over a Boolean cube $\{0, 1\}^K$ according to their distributions. Each distribution is described by a vector of allele…

Machine Learning · Computer Science 2008-02-21 Shuheng Zhou

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. In particular, we design and analyze two computationally efficient algorithms to partition data…

Statistics Theory · Mathematics 2024-03-20 Shuheng Zhou

Clustering is widely used in different fields such as biology, psychology, and economics. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with…

Databases · Computer Science 2019-07-03 Trupti M. Kodinariya , Dr. Prashant R. Makwana

Spectral clustering is sensitive to how graphs are constructed from data particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data…

Machine Learning · Statistics 2013-09-11 Jing Qian , Venkatesh Saligrama

Let $G$ be a finite group generated by $k$ elements. The well-known product replacement algorithm provides an effective method for sampling generating sets of $G$. We study a refinement of this algorithm that is designed to output…

Group Theory · Mathematics 2025-12-23 Michał Marcinkowski , Piotr Mizerka

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called…

Methodology · Statistics 2022-10-31 Tianqi Liu , Yu Lu , Biqing Zhu , Hongyu Zhao

There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between…

Data Structures and Algorithms · Computer Science 2010-04-13 Amit Kumar , Ravindran Kannan

Spectral clustering has become a popular technique due to its high performance in many contexts. It comprises three main steps: create a similarity graph between N objects to cluster, compute the first k eigenvectors of its Laplacian matrix…

Data Structures and Algorithms · Computer Science 2016-05-24 Nicolas Tremblay , Gilles Puy , Remi Gribonval , Pierre Vandergheynst

Segmentation of a colour image composed of different kinds of texture regions can be a hard problem, namely to compute for an exact texture fields and a decision of the optimum number of segmentation areas in an image when it contains…

Artificial Intelligence · Computer Science 2007-05-23 Vitorino Ramos , Fernando Muge

In this paper we study variants of the widely used spectral clustering that partitions a graph into k clusters by (1) embedding the vertices of a graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and…

Data Structures and Algorithms · Computer Science 2017-02-01 Richard Peng , He Sun , Luca Zanetti

In this work, a graph partitioning problem in a fixed number of connected components is considered. Given an undirected graph with costs on the edges, the problem consists of partitioning the set of nodes into a fixed number of subsets with…

Optimization and Control · Mathematics 2024-11-12 Mishelle Cordero , Andrés Miniguano-Trujillo , Diego Recalde , Ramiro Torres , Polo Vaca

Clustering in image analysis is a central technique that allows to classify elements of an image. We describe a simple clustering technique that uses the method of similarity matrices. We expand upon recent results in spectral analysis for…

Statistics Theory · Mathematics 2022-03-23 Denis Gaidashev , Ralf Pihlström , Martin Ryner

Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. Despite significant research progress, the…

Machine Learning · Computer Science 2021-12-30 Edith Cohen , Haim Kaplan , Yishay Mansour , Uri Stemmer , Eliad Tsfadia

The problem of estimating a proportion of objects with particular attribute in a finite population is considered. This paper shows an example of the application of estimation fraction using new proposed sample allocation in a population…

Applications · Statistics 2019-03-19 Dominik Sieradzki , Wojciech Zieliński

We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters…

Data Structures and Algorithms · Computer Science 2025-12-10 Gunjan Kumar , Yash Pote , Jonathan Scarlett

We introduce a new method for performing clustering with the aim of fitting clusters with different scatters and weights. It is designed by allowing to handle a proportion $\alpha$ of contaminating data to guarantee the robustness of the…

Statistics Theory · Mathematics 2008-12-18 Luis A. García-Escudero , Alfonso Gordaliza , Carlos Matrán , Agustin Mayo-Iscar

In classification problems, the purpose of feature selection is to identify a small, highly discriminative subset of the original feature set. In many applications, the dataset may have thousands of features and only a few dozens of samples…

Machine Learning · Computer Science 2020-08-28 Ludmila I. Kuncheva , Clare E. Matthews , Álvar Arnaiz-González , Juan J. Rodríguez

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that…

Machine Learning · Computer Science 2014-08-12 Konstantin Voevodski , Maria-Florina Balcan , Heiko Roglin , Shang-Hua Teng , Yu Xia