Related papers: Clustering with Statistical Error Control
Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…
Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…
In empirical work it is common to estimate parameters of models and report associated standard errors that account for "clustering" of units, where clusters are defined by factors such as geography. Clustering adjustments are typically…
When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…
Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…
We introduce a new method for performing clustering with the aim of fitting clusters with different scatters and weights. It is designed by allowing to handle a proportion $\alpha$ of contaminating data to guarantee the robustness of the…
Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key of…
A main task in data analysis is to organize data points into coherent groups or clusters. The stochastic block model is a probabilistic model for the cluster structure. This model prescribes different probabilities for the presence of edges…
Clustering analysis of functional data, which comprises observations that evolve continuously over time or space, has gained increasing attention across various scientific disciplines. Practical applications often involve functional data…
While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…
Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…
Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…
Spectral clustering is a popular unsupervised learning technique which is able to partition unlabelled data into disjoint clusters of distinct shapes. However, the data under consideration are often experimental data, implying that the data…
With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…
Not long ago primary census data became available to publicity. It opened qualitatively new perspectives not only for researchers in demography and sociology, but also for those people, who somehow face processes occurring in society. In…
The traditional binary classification framework constructs classifiers which may have good accuracy, but whose false positive and false negative error rates are not under users' control. In many cases, one of the errors is more severe and…
We introduce a novel statistical significance-based approach for clustering hierarchical data using semi-parametric linear mixed-effects models designed for responses with laws in the exponential family (e.g., Poisson and Bernoulli). Within…
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The…
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized…
A computational theory for clustering and a semi-supervised clustering algorithm is presented. Clustering is defined to be the obtainment of groupings of data such that each group contains no anomalies with respect to a chosen grouping…