Related papers: Statistical Inference for Cluster Trees

200 papers

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…

Machine Learning · Computer Science 2020-03-25 A. H. Beg , Md Zahidul Islam , Vladimir Estivill-Castro

Connected acyclic graphs (trees) are data objects that hierarchically organize categories. Collections of trees arise in a diverse variety of fields, including evolutionary biology, public health, machine learning, social sciences and…

Methodology · Statistics 2025-12-01 Maria Alejandra Valdez Cabrera , Amy D Willis , Armeen Taeb

For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$. The set of all high-density clusters forms a hierarchy called the {\it cluster tree} of…

Machine Learning · Statistics 2014-06-09 Kamalika Chaudhuri , Sanjoy Dasgupta , Samory Kpotufe , Ulrike von Luxburg

Due to their accuracies, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on…

Methodology · Statistics 2019-11-15 Suofei Wu , Jan Hannig , Thomas C. M. Lee

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius , Kai Puolamäki , Henrik Boström , Panagiotis Papapetrou

Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is…

Machine Learning · Statistics 2011-05-06 Samory Kpotufe , Ulrike von Luxburg

Usual formulations of the clustering coefficient can be shown to be insufficient in the task of describing the local topology of very simple networks. Motivated by this, we review some alternatives in order to present an extension, the…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Alexandre H. Abdo , A. P. S. de Moura

In many applications, data cluster. Failing to take the cluster structure into consideration generally leads to underestimated variances of point estimators and inflated type I errors in hypothesis tests. Many circumstance-dependent…

Methodology · Statistics 2025-07-21 Jiahua Chen , Pengfei Li , Yukun Liu , James V. Zidek

While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…

Methodology · Statistics 2026-04-13 YoonHaeng Hur , Anirban Nath , Genevera Allen

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson , M. Ackerman , N. C. Brownstein

The reconstruction of a central tendency `species tree' from a large number of conflicting gene trees is a central problem in systematic biology. Moreover, it becomes particularly problematic when taxon coverage is patchy, so that not all…

Populations and Evolution · Quantitative Biology 2014-05-27 Mike Steel , Joel D. Velasco

Mixture model-based frameworks are very popular for statistical inference in clustering. While convenient for producing probabilistic estimates of cluster assignments and uncertainty, they are prone to misspecification, which can lead to…

Statistics Theory · Mathematics 2026-05-15 Yu Zheng , Leo L. Duan , Arkaprava Roy

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider…

Methodology · Statistics 2020-06-24 Susanna Makela , Yajuan Si , Andrew Gelman

This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized…

Statistics Theory · Mathematics 2022-11-16 Jianfei Cao , Christian Hansen , Damian Kozbur , Lucciano Villacorta

High density clusters can be characterized by the connected components of a level set $L(\lambda) = \{x:\ p(x)>\lambda\}$ of the underlying probability density function $p$ generating the data, at some appropriate level $\lambda\geq 0$. The…

Machine Learning · Statistics 2010-11-15 Alessandro Rinaldo , Aarti Singh , Rebecca Nugent , Larry Wasserman

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between…

Machine Learning · Statistics 2017-09-29 Sebastijan Dumancic , Hendrik Blockeel

Interpretable machine learning has emerged as central in leveraging artificial intelligence within high-stakes domains such as healthcare, where understanding the rationale behind model predictions is as critical as achieving high…

Machine Learning · Computer Science 2024-04-30 Christel Sirocchi , Martin Urschler , Bastian Pfeifer

Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy and medicine. Although trees are estimated, their uncertainties are discarded by mathematicians working in tree space. Here we explicitly…

Methodology · Statistics 2017-10-16 Amy D. Willis , Rayna C. Bell

A commonly used characteristic of statistical dependence of adjacency relations in real networks, the clustering coefficient, evaluates chances that two neighbours of a given vertex are adjacent. An extension is obtained by considering…

Applications · Statistics 2013-04-29 Mindaugas Bloznelis , Valentas Kurauskas

This work develops formal statistical inference procedures for machine learning ensemble methods. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but…

Machine Learning · Statistics 2015-09-11 Lucas Mentch , Giles Hooker