Related papers: Inference for Dependent Data with Learned Clusters

Post-clustering difference testing: valid inference and practical considerations

Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P Hejblum

Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes

This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By…

Econometrics · Economics 2024-04-11 Federico Bugni, Ivan Canay, Azeem Shaikh, Max Tabord-Meehan

Selective inference for clustering with unknown variance

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test, and to test these hypotheses, that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for…

Methodology · Statistics 2023-07-24 Youngjoo Yun, Rina Foygel Barber

Model-based clustering for conditionally correlated categorical data

An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence assumption" of variables. This model consists in grouping the variables into inter-independent and…

Computation · Statistics 2015-10-01 Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle

Inference in Cluster Randomized Trials with Matched Pairs

This paper studies inference in cluster randomized trials where treatment status is determined according to a "matched pairs" design. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the…

Econometrics · Economics 2025-08-14 Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan

Post-clustering Inference under Dependence

Recent work by Gao et al. (JASA 2022) has laid the foundations for post-clustering inference, establishing a theoretical framework allowing to test for differences between means of estimated clusters. Additionally, they studied the…

Methodology · Statistics 2025-08-15 Javier González-Delgado, Mathis Deronzier, Juan Cortés, Pierre Neuvial

Unsupervised Deep Discriminant Analysis Based Clustering

This work presents an unsupervised deep discriminant analysis for clustering. The method is based on deep neural networks and aims to minimize the intra-cluster discrepancy and maximize the inter-cluster discrepancy in an unsupervised…

Machine Learning · Computer Science 2022-06-13 Jinyu Cai, Wenzhong Guo, Jicong Fan

Panel Data with Unknown Clusters

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…

Econometrics · Economics 2022-01-14 Yong Cai

High-Dimensional Inference for Cluster-Based Graphical Models

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial…

Machine Learning · Statistics 2020-06-09 Carson Eisenach, Florentina Bunea, Yang Ning, Claudiu Dinicu

Informed Random Partition Models with Temporal Dependence

Model-based clustering is a powerful tool that is often used to discover hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that permit…

Methodology · Statistics 2025-06-24 Sally Paganin, Garritt L. Page, Fernando Andrés Quintana

Information based clustering

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim, Gurinder Singh Atwal, Gasper Tkacik, William Bialek

Inference for Clustering: Conformal Sets for Cluster Labels

While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…

Methodology · Statistics 2026-04-13 YoonHaeng Hur, Anirban Nath, Genevera Allen

Inference of global clusters from locally distributed data

We consider the problem of analyzing the heterogeneity of clustering distributions for multiple groups of observed data, each of which is indexed by a covariate value, and inferring global clusters arising from observations aggregated over…

Methodology · Statistics 2012-12-06 XuanLong Nguyen

Some Impossibility Results for Inference With Cluster Dependence with Large Clusters

This paper focuses on a setting with observations having a cluster dependence structure and presents two main impossibility results. First, we show that when there is only one large cluster, i.e., the researcher does not have any knowledge…

Econometrics · Economics 2023-06-07 Denis Kojevnikov, Kyungchul Song

Multiscale Fisher's Independence Test for Multivariate Dependence

Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample…

Methodology · Statistics 2021-07-08 Shai Gorsky, Li Ma

Cluster-robust inference with a single treated cluster using the t-test

This paper considers inference when there is a single treated cluster and a fixed number of control clusters, a setting that is common in empirical work, especially in difference-in-differences designs. We use the t-statistic and develop…

Econometrics · Economics 2025-11-11 Chun Pong Lau, Xinran Li

Graphical model-based clustering of categorical data

Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel…

Methodology · Statistics 2026-01-22 Laura Ferrini, Federico Castelletti

Independence clustering (without a matrix)

The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters…

Machine Learning · Computer Science 2017-03-21 Daniil Ryabko

Ordering-Free Inference from Locally Dependent Data

This paper focuses on a data-rich environment where the data set has a very large cross-sectional dimension, is likely to exhibit local dependence, and yet is hard to determine the dependence ordering. Such a situation arises, for example,…

Methodology · Statistics 2018-07-03 Kyungchul Song

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…

Methodology · Statistics 2014-07-11 Eric Bair