Related papers: Panel Data with Unknown Clusters
Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…
Consider a panel data setting where repeated observations on individuals are available. Often it is reasonable to assume that there exist groups of individuals that share similar effects of observed characteristics, but the grouping is…
This paper proposes a selective inference procedure for testing equal predictive ability in panel data settings with unknown heterogeneity. The framework allows predictive performance to vary across unobserved clusters and accounts for the…
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized…
This paper studies the covariance matrix estimation for high-dimensional time series within a new framework that combines low-rank factor and latent variable-specific cluster structures. The popular methods based on assuming the sparse…
This paper presents a clustering approach that allows for rigorous statistical error control similar to a statistical test. We develop estimators for both the unknown number of clusters and the clusters themselves. The estimators depend on…
While clustering is ubiquitously used across science and industry, uncertainty in cluster assignments is rarely quantified with rigorous guarantees. We propose a novel conformal inference framework for clustering that returns confidence…
Background: When planning a cluster randomized trial, evaluators often have access to an enumerated cohort representing the target population of clusters. Practicalities of conducting the trial, such as the need to oversample clusters with…
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…
This paper studies inference in cluster randomized trials where treatment status is determined according to a "matched pairs" design. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the…
Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…
When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…
Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…
Several methods have been proposed to estimate the number of clusters in a dataset; the basic ideal behind all of them has been to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster…
In many applications, data cluster. Failing to take the cluster structure into consideration generally leads to underestimated variances of point estimators and inflated type I errors in hypothesis tests. Many circumstance-dependent…
Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited…
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The…
This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether there exist any groups in a set of given data, and if so then, how many groups are prevailing in total. It is a cluster accuracy index…
This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By…
It is common when using cross-section or panel data to assign each observation to a cluster and allow for arbitrary patterns of heteroskedasticity and correlation within clusters. For regression models, there are many ways to make…