Related papers: Conditional Selective Inference for the Selected G…

Selective inference for k-means clustering

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we…

Methodology · Statistics 2022-03-30 Yiqun T. Chen, Daniela M. Witten

Selective Inference for Hierarchical Clustering

Classical tests for a difference in means control the type I error rate when the groups are defined a priori. However, when the groups are instead defined via clustering, then applying a classical test yields an extremely inflated type I…

Methodology · Statistics 2022-11-01 Lucy L. Gao, Jacob Bien, Daniela Witten

Testing Clustered Equal Predictive Ability with Unknown Clusters

This paper proposes a selective inference procedure for testing equal predictive ability in panel data settings with unknown heterogeneity. The framework allows predictive performance to vary across unobserved clusters and accounts for the…

Econometrics · Economics 2025-07-29 Oguzhan Akgun, Alain Pirotte, Giovanni Urga, Zhenlin Yang

Selective inference for multiple pairs of clusters after K-means clustering

If the same data is used for both clustering and for testing a null hypothesis that is formulated in terms of the estimated clusters, then the traditional hypothesis testing framework often fails to control the Type I error. Gao et al.…

Methodology · Statistics 2024-05-28 Youngjoo Yun, Yinqiu He

Testing for a difference in means of a single feature after clustering

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In…

Methodology · Statistics 2023-11-29 Yiqun T. Chen, Lucy L. Gao

Selective inference for clustering with unknown variance

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test and to test these hypotheses, that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for…

Methodology · Statistics 2023-07-24 Youngjoo Yun, Rina Foygel Barber

Selective Randomization Inference for Adaptive Experiments

Adaptive experiments use preliminary analyses of the data to inform the further course of action and are commonly used in many disciplines, including the medical and social sciences. Because the null hypothesis and experimental design are…

Methodology · Statistics 2026-05-26 Tobias Freidling, Qingyuan Zhao, Zijun Gao

Post-clustering difference testing: valid inference and practical considerations

Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P Hejblum

Class Specific Feature Selection for Interval Valued Data Through Interval K-Means Clustering

In this paper, a novel feature selection approach for supervised interval valued features is proposed. The proposed approach takes care of selecting the class specific features through interval K-Means clustering. The kernel of K-Means…

Computer Vision and Pattern Recognition · Computer Science 2017-06-01 D. S. Guru, N. Vinay Kumar

Blocked Clusterwise Regression

A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type. Such models have been shown to allow estimation and inference…

Econometrics · Economics 2020-01-31 Max Cytrynbaum

Selective inference using randomized group lasso estimators for general models

Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method includes the use of exponential family distributions, as well as quasi-likelihood modeling for…

Methodology · Statistics 2024-03-28 Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens

Robust Inference Methods for Latent Group Panel Models under Possible Group Non-Separation

This paper presents robust inference methods for general linear hypotheses in linear panel data models with latent group structure in the coefficients. We employ a selective conditional inference approach, deriving the conditional…

Econometrics · Economics 2025-11-25 Oguzhan Akgun, Ryo Okui

Panel Data with Unknown Clusters

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…

Econometrics · Economics 2022-01-14 Yong Cai

Selective Inference via Marginal Screening for High Dimensional Classification

Post-selection inference is a statistical technique for determining salient variables after model or variable selection. Recently, selective inference, a kind of post-selection inference framework, has garnered attention in the…

Methodology · Statistics 2019-06-28 Yuta Umezu, Ichiro Takeuchi

Estimating the number of clusters using cross-validation

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…

Methodology · Statistics 2017-02-10 Wei Fu, Patrick O. Perry

Inference conditional on selection: a review

In this article, we review selective inference, a set of techniques for inference when the statistical question asked is a function of the data. This setting often arises in contemporary scientific workflows, where hypotheses and parameters…

Methodology · Statistics 2026-04-14 Anna Neufeld, Ronan Perry, Daniela Witten

Selective Inference for Group-Sparse Linear Models

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of…

Methodology · Statistics 2016-07-28 Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

A Causal Direction Test for Heterogeneous Populations

A probabilistic expert system emulates the decision-making ability of a human expert through a directional graphical model. The first step in building such systems is to understand the data generation mechanism. To this end, one may try to…

Methodology · Statistics 2021-09-29 Vahid Partovi Nia, Xinlin Li, Masoud Asgharian, Shoubo Hu, Zhitang Chen, Yanhui Geng

Selective Inference in Propensity Score Analysis

Selective inference (post-selection inference) is a methodology that has attracted much attention in recent years in the fields of statistics and machine learning. Naive inference based on data that are also used for model selection tends…

Methodology · Statistics 2021-11-25 Yoshiyuki Ninomiya, Yuta Umezu, Ichiro Takeuchi

Inference for Forecasting Accuracy: Pooled versus Individual Estimators in High-dimensional Panel Data

Panels with large time $(T)$ and cross-sectional $(N)$ dimensions are a key data structure in social sciences and other fields. A central question in panel data analysis is whether to pool data across individuals or to estimate separate…

Methodology · Statistics 2025-12-18 Tim Kutta, Martin Schumann, Holger Dette