Kernel Two-Sample Testing via Directional Components Analysis

Methodology 2025-08-21 v2 Statistics Theory Machine Learning Statistics Theory

Abstract

We propose a kernel-based two-sample test that uses the spectral decomposition of the maximum mean discrepancy (MMD) statistic to identify and exploit well-estimated directional components in a reproducing kernel Hilbert space (RKHS). The approach is motivated by the observation that the estimation quality of these components varies considerably, with leading eigen-directions being estimated more reliably in finite samples. By focusing on these directions and aggregating information across multiple kernels, the proposed test achieves higher power and improved robustness, especially in high-dimensional and unbalanced-sample settings. We further develop a computationally efficient multiplier bootstrap procedure for approximating critical values, which is theoretically justified and substantially faster than permutation-based alternatives. Extensive simulations and empirical studies on microarray datasets show that the method maintains the nominal Type I error rate and delivers higher power than existing MMD-based tests.
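To make the idea of directional components concrete, the following is a minimal sketch, not the authors' procedure. It relies on a standard identity: the biased (V-statistic) MMD² estimate equals w'Kw, where K is the pooled Gram matrix and w assigns weight 1/n to the first sample and -1/m to the second. Eigendecomposing the centered Gram matrix then splits the estimate into one contribution per eigen-direction. The Gaussian kernel, the fixed bandwidth, the function names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_components(x, y, bandwidth=1.0):
    """Split the biased MMD^2 estimate into eigen-direction contributions.

    Hypothetical helper for illustration only; returns the contributions
    (leading eigenvalue first) and the biased MMD^2 estimate itself.
    """
    n, m = len(x), len(y)
    z = np.vstack([x, y])
    k = gaussian_kernel(z, z, bandwidth)
    # Center the pooled Gram matrix (kernel PCA convention). Because the
    # weights below sum to zero, centering leaves w' K w unchanged.
    h = np.eye(n + m) - np.ones((n + m, n + m)) / (n + m)
    kc = h @ k @ h
    # Signed weights: +1/n for the first sample, -1/m for the second.
    w = np.concatenate([np.full(n, 1.0 / n), np.full(m, -1.0 / m)])
    eigvals, eigvecs = np.linalg.eigh(kc)
    order = np.argsort(eigvals)[::-1]  # sort eigenpairs, largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contributions = eigvals * (eigvecs.T @ w) ** 2
    return contributions, w @ k @ w

# Unbalanced samples (60 vs. 30 points) with a mean shift, as an assumed toy setting.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))
y = rng.normal(loc=0.5, size=(30, 5))
contrib, mmd2 = mmd_components(x, y)
top = contrib[:5].sum()  # part of MMD^2 carried by the five leading directions
```

Summing all contributions recovers the full biased estimate, while `top` keeps only the leading directions. How many directions to retain, how to combine several kernels, and how to calibrate critical values with the multiplier bootstrap are specified in the paper and are not reproduced here.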

Cite

@article{arxiv.2508.08564,
  title  = {Kernel Two-Sample Testing via Directional Components Analysis},
  author = {Rui Cui and Yuhao Li and Xiaojun Song},
  journal= {arXiv preprint arXiv:2508.08564},
  year   = {2025}
}

Comments

Corrected some typos in both the manuscript and the code.