Semi-Supervised U-statistics

Ilmun Kim; Larry Wasserman; Sivaraman Balakrishnan; Matey Neykov

Statistics Theory | 2024-03-12 | v2 | Methodology, Machine Learning, Statistics Theory

Authors: Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov

Abstract

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.
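For orientation, the classical U-statistic that the abstract uses as its baseline can be sketched in a few lines. This is only the standard order-2 U-statistic (the average of a symmetric bivariate kernel over all unordered pairs of labeled observations), not the semi-supervised estimator proposed in the paper. The function names `u_statistic` and `variance_kernel` and the toy data are illustrative assumptions, not taken from the source.

```python
from itertools import combinations
from statistics import variance


def u_statistic(sample, kernel):
    """Classical order-2 U-statistic: mean of a symmetric kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def variance_kernel(a, b):
    """Bivariate kernel h(a, b) = (a - b)^2 / 2, whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


# Hypothetical toy data, used only to illustrate the computation.
data = [1.0, 2.0, 4.0, 7.0]
u_var = u_statistic(data, variance_kernel)

# The U-statistic with this kernel coincides with the unbiased sample variance.
print(u_var, variance(data))
```

The variance kernel is chosen here because its U-statistic has a familiar closed form, which makes the pairwise-averaging structure easy to check by hand. Any other symmetric bivariate kernel could be passed in its place.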

Keywords

statistical inference; nonparametric regression; signal detection

Cite

@article{arxiv.2402.18921,
  title  = {Semi-Supervised U-statistics},
  author = {Ilmun Kim and Larry Wasserman and Sivaraman Balakrishnan and Matey Neykov},
  journal= {arXiv preprint arXiv:2402.18921},
  year   = {2024}
}
