Related papers: Statistical Inference for Cluster Trees

Tree Index: A New Cluster Evaluation Technique

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…

Machine Learning · Computer Science 2020-03-25 A. H. Beg, Md Zahidul Islam, Vladimir Estivill-Castro

Consensus Tree Estimation with False Discovery Rate Control via Partially Ordered Sets

Connected acyclic graphs (trees) are data objects that hierarchically organize categories. Collections of trees arise in a diverse variety of fields, including evolutionary biology, public health, machine learning, social sciences and…

Methodology · Statistics 2025-12-01 Maria Alejandra Valdez Cabrera, Amy D Willis, Armeen Taeb

Consistent procedures for cluster tree estimation and pruning

For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$. The set of all high-density clusters forms a hierarchy called the {\it cluster tree} of…

Machine Learning · Statistics 2014-06-09 Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike von Luxburg

Uncertainty Quantification in Ensembles of Honest Regression Trees using Generalized Fiducial Inference

Due to their accuracies, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on…

Methodology · Statistics 2019-11-15 Suofei Wu, Jan Hannig, Thomas C. M. Lee

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

Pruning nearest neighbor cluster trees

Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is…

Machine Learning · Statistics 2011-05-06 Samory Kpotufe, Ulrike von Luxburg

Clustering as a measure of the local topology of networks

Usual formulations of the clustering coefficient can be shown to be insufficient in the task of describing the local topology of very simple networks. Motivated by this, we review some alternatives in order to present an extension, the…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Alexandre H. Abdo, A. P. S. de Moura

Composite empirical likelihood for multisample clustered data

In many applications, data cluster. Failing to take the cluster structure into consideration generally leads to underestimated variances of point estimators and inflated type I errors in hypothesis tests. Many circumstance-dependent…

Methodology · Statistics 2025-07-21 Jiahua Chen, Pengfei Li, Yukun Liu, James V. Zidek

Inference for Clustering: Conformal Sets for Cluster Labels

While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…

Methodology · Statistics 2026-04-13 YoonHaeng Hur, Anirban Nath, Genevera Allen

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson, M. Ackerman, N. C. Brownstein

Axiomatic opportunities and obstacles for inferring a species tree from gene trees

The reconstruction of a central tendency 'species tree' from a large number of conflicting gene trees is a central problem in systematic biology. Moreover, it becomes particularly problematic when taxon coverage is patchy, so that not all…

Populations and Evolution · Quantitative Biology 2014-05-27 Mike Steel, Joel D. Velasco

Consistency of Graphical Model-based Clustering: Robust Clustering using Bayesian Spanning Forest

Mixture model-based frameworks are very popular for statistical inference in clustering. While convenient for producing probabilistic estimates of cluster assignments and uncertainty, they are prone to misspecification, which can lead to…

Statistics Theory · Mathematics 2026-05-15 Yu Zheng, Leo L. Duan, Arkaprava Roy

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider…

Methodology · Statistics 2020-06-24 Susanna Makela, Yajuan Si, Andrew Gelman

Inference for Dependent Data with Learned Clusters

This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized…

Statistics Theory · Mathematics 2022-11-16 Jianfei Cao, Christian Hansen, Damian Kozbur, Lucciano Villacorta

Stability of Density-Based Clustering

High density clusters can be characterized by the connected components of a level set $L(\lambda) = \{x:\ p(x)>\lambda\}$ of the underlying probability density function $p$ generating the data, at some appropriate level $\lambda\geq 0$. The…

Machine Learning · Statistics 2010-11-15 Alessandro Rinaldo, Aarti Singh, Rebecca Nugent, Larry Wasserman

An expressive dissimilarity measure for relational clustering using neighbourhood trees

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between…

Machine Learning · Statistics 2017-09-29 Sebastijan Dumancic, Hendrik Blockeel

Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping

Interpretable machine learning has emerged as central in leveraging artificial intelligence within high-stakes domains such as healthcare, where understanding the rationale behind model predictions is as critical as achieving high…

Machine Learning · Computer Science 2024-04-30 Christel Sirocchi, Martin Urschler, Bastian Pfeifer

Uncertainty in phylogenetic tree estimates

Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy and medicine. Although trees are estimated, their uncertainties are discarded by mathematicians working in tree space. Here we explicitly…

Methodology · Statistics 2017-10-16 Amy D. Willis, Rayna C. Bell

Clustering function: a measure of social influence

A commonly used characteristic of statistical dependence of adjacency relations in real networks, the clustering coefficient, evaluates chances that two neighbours of a given vertex are adjacent. An extension is obtained by considering…

Applications · Statistics 2013-04-29 Mindaugas Bloznelis, Valentas Kurauskas

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine learning ensemble methods. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but…

Machine Learning · Statistics 2015-09-11 Lucas Mentch, Giles Hooker