Related papers: Inference for Dependent Data with Learned Clusters

200 papers

Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P Hejblum

This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By…

Econometrics · Economics 2024-04-11 Federico Bugni, Ivan Canay, Azeem Shaikh, Max Tabord-Meehan

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test, and to test these hypotheses, that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for…

Methodology · Statistics 2023-07-24 Youngjoo Yun, Rina Foygel Barber

An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence assumption" of variables. This model consists in grouping the variables into inter-independent and…

Computation · Statistics 2015-10-01 Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle

This paper studies inference in cluster randomized trials where treatment status is determined according to a "matched pairs" design. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the…

Econometrics · Economics 2025-08-14 Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan

Recent work by Gao et al. (JASA 2022) has laid the foundations for post-clustering inference, establishing a theoretical framework allowing to test for differences between means of estimated clusters. Additionally, they studied the…

Methodology · Statistics 2025-08-15 Javier González-Delgado, Mathis Deronzier, Juan Cortés, Pierre Neuvial

This work presents an unsupervised deep discriminant analysis for clustering. The method is based on deep neural networks and aims to minimize the intra-cluster discrepancy and maximize the inter-cluster discrepancy in an unsupervised…

Machine Learning · Computer Science 2022-06-13 Jinyu Cai, Wenzhong Guo, Jicong Fan

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…

Econometrics · Economics 2022-01-14 Yong Cai

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial…

Machine Learning · Statistics 2020-06-09 Carson Eisenach, Florentina Bunea, Yang Ning, Claudiu Dinicu

Model-based clustering is a powerful tool that is often used to discover hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that permit…

Methodology · Statistics 2025-06-24 Sally Paganin, Garritt L. Page, Fernando Andrés Quintana

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim, Gurinder Singh Atwal, Gasper Tkacik, William Bialek

While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…

Methodology · Statistics 2026-04-13 YoonHaeng Hur, Anirban Nath, Genevera Allen

We consider the problem of analyzing the heterogeneity of clustering distributions for multiple groups of observed data, each of which is indexed by a covariate value, and inferring global clusters arising from observations aggregated over…

Methodology · Statistics 2012-12-06 XuanLong Nguyen

This paper focuses on a setting with observations having a cluster dependence structure and presents two main impossibility results. First, we show that when there is only one large cluster, i.e., the researcher does not have any knowledge…

Econometrics · Economics 2023-06-07 Denis Kojevnikov, Kyungchul Song

Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample…

Methodology · Statistics 2021-07-08 Shai Gorsky, Li Ma

This paper considers inference when there is a single treated cluster and a fixed number of control clusters, a setting that is common in empirical work, especially in difference-in-differences designs. We use the t-statistic and develop…

Econometrics · Economics 2025-11-11 Chun Pong Lau, Xinran Li

Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel…

Methodology · Statistics 2026-01-22 Laura Ferrini, Federico Castelletti

The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters…

Machine Learning · Computer Science 2017-03-21 Daniil Ryabko

This paper focuses on a data-rich environment where the data set has a very large cross-sectional dimension, is likely to exhibit local dependence, and yet is hard to determine the dependence ordering. Such a situation arises, for example,…

Methodology · Statistics 2018-07-03 Kyungchul Song

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…

Methodology · Statistics 2014-07-11 Eric Bair