Related papers: Panel Data with Unknown Clusters

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

Spectral Clustering with Variance Information for Group Structure Estimation in Panel Data

Consider a panel data setting where repeated observations on individuals are available. Often it is reasonable to assume that there exist groups of individuals that share similar effects of observed characteristics, but the grouping is…

Methodology · Statistics 2024-02-09 Lu Yu, Jiaying Gu, Stanislav Volgushev

Testing Clustered Equal Predictive Ability with Unknown Clusters

This paper proposes a selective inference procedure for testing equal predictive ability in panel data settings with unknown heterogeneity. The framework allows predictive performance to vary across unobserved clusters and accounts for the…

Econometrics · Economics 2025-07-29 Oguzhan Akgun, Alain Pirotte, Giovanni Urga, Zhenlin Yang

Inference for Dependent Data with Learned Clusters

This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized…

Statistics Theory · Mathematics 2022-11-16 Jianfei Cao, Christian Hansen, Damian Kozbur, Lucciano Villacorta

Large covariance matrix estimation with factor-assisted variable clustering

This paper studies the covariance matrix estimation for high-dimensional time series within a new framework that combines low-rank factor and latent variable-specific cluster structures. The popular methods based on assuming the sparse…

Methodology · Statistics 2025-02-25 Dong Li, Xinghao Qiao, Cheng Yu

Clustering with Statistical Error Control

This paper presents a clustering approach that allows for rigorous statistical error control similar to a statistical test. We develop estimators for both the unknown number of clusters and the clusters themselves. The estimators depend on…

Statistics Theory · Mathematics 2017-07-13 Michael Vogt, Matthias Schmid

Inference for Clustering: Conformal Sets for Cluster Labels

While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…

Methodology · Statistics 2026-04-13 YoonHaeng Hur, Anirban Nath, Genevera Allen

Cluster randomized trials designed to support generalizable inferences

Background: When planning a cluster randomized trial, evaluators often have access to an enumerated cohort representing the target population of clusters. Practicalities of conducting the trial, such as the need to oversample clusters with…

Methodology · Statistics 2024-09-19 Sarah E. Robertson, Jon A. Steingrimsson, Issa J. Dahabreh

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Inference in Cluster Randomized Trials with Matched Pairs

This paper studies inference in cluster randomized trials where treatment status is determined according to a "matched pairs" design. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the…

Econometrics · Economics 2025-08-14 Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan

Estimating the number of clusters using cross-validation

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…

Methodology · Statistics 2017-02-10 Wei Fu, Patrick O. Perry

Reclustering: A New Method to Test the Appropriate Level of Clustering

When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…

Methodology · Statistics 2025-11-12 Kentaro Fukumoto

Post-clustering difference testing: valid inference and practical considerations

Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P Hejblum

A Parameter-free Affinity Based Clustering

Several methods have been proposed to estimate the number of clusters in a dataset; the basic idea behind all of them has been to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster…

Computer Vision and Pattern Recognition · Computer Science 2016-01-12 Bhaskar Mukhoty, Ruchir Gupta, Y. N. Singh

Composite empirical likelihood for multisample clustered data

In many applications, data cluster. Failing to take the cluster structure into consideration generally leads to underestimated variances of point estimators and inflated type I errors in hypothesis tests. Many circumstance-dependent…

Methodology · Statistics 2025-07-21 Jiahua Chen, Pengfei Li, Yukun Liu, James V. Zidek

Clustering For Point Pattern Data

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited…

Machine Learning · Computer Science 2017-02-09 Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, aiming to construct well-established clusters whose elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

Evaluation of the number of clusters in a data set using $p$-values from Multiple Tests of Hypotheses

This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether there exist any groups in a set of given data and, if so, how many groups there are in total. It is a cluster accuracy index…

Methodology · Statistics 2026-05-21 Soumita Modak

Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes

This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By…

Econometrics · Economics 2024-04-11 Federico Bugni, Ivan Canay, Azeem Shaikh, Max Tabord-Meehan

When Can We Trust Cluster-Robust Inference?

It is common when using cross-section or panel data to assign each observation to a cluster and allow for arbitrary patterns of heteroskedasticity and correlation within clusters. For regression models, there are many ways to make…

Econometrics · Economics 2026-04-03 James G. MacKinnon