Mitigating dimensionality effects with robust graph constructions for testing
Abstract
Dimensionality effects pose major challenges in high-dimensional and non-Euclidean data analysis. Graph-based two-sample tests and change-point detection are particularly attractive in this context, as they make minimal distributional assumptions and perform well across a wide range of scenarios. These methods rely on similarity graphs constructed from data, with -nearest neighbor graphs and -minimum spanning trees among the most effective and widely used. However, in high-dimensional and non-Euclidean regimes such graphs often produce hubs -- nodes with disproportionately high degrees -- to which graph-based methods are especially sensitive. To mitigate these dimensionality effects, we propose a robust graph construction that is far less prone to hub formation. Incorporating this construction substantially improves the power of graph-based methods across diverse settings. We further establish a theoretical foundation by proving its consistency under fixed alternatives in both low- and high-dimensional regimes. The effectiveness of the approach is demonstrated through real-world applications, including comparisons of correlation matrices for brain regions, gene expression profiles of T cells, and temporal changes in New York City taxi travel patterns.
Cite
@article{arxiv.2307.15205,
title = {Mitigating dimensionality effects with robust graph constructions for testing},
author = {Yejiong Zhu and Hao Chen},
journal= {arXiv preprint arXiv:2307.15205},
year = {2025}
}