A label-efficient two-sample test

Weizhi Li; Gautam Dasarathy; Karthikeyan Natesan Ramamurthy; Visar Berisha

Machine Learning, 2022-07-20, v5

Abstract

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or of two different distributions (the alternative hypothesis). We consider a new setting for this problem in which sample features are easily measured, whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework for performing an effective two-sample test with only a small number of label queries: first, a classifier is trained on samples whose labels are queried uniformly at random, to model the posterior probabilities of the labels; second, a novel query scheme dubbed bimodal query is used to query labels of samples from both classes; and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments on several datasets demonstrate that the proposed test controls the Type I error and has lower Type II error than uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.
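The last two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that "bimodal query" means spending half of the label budget on the points with the highest estimated posterior for each class, which the abstract does not spell out. It also swaps in a simple label-permutation p-value in place of whatever null distribution the paper uses for the FR statistic. The function names `mst_edges`, `fr_cross_edges`, `bimodal_query`, and `permutation_p_value` are hypothetical, and the posteriors are taken as given rather than produced by a trained classifier.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    best_dist = [math.dist(points[0], p) for p in points]
    best_from = [0] * n
    edges = []
    for _ in range(n - 1):
        # Attach the closest point that is not yet in the tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        in_tree[j] = True
        edges.append((best_from[j], j))
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k], best_from[k] = d, j
    return edges


def fr_cross_edges(points, labels):
    """FR statistic: number of MST edges that join points with different labels."""
    return sum(labels[i] != labels[j] for i, j in mst_edges(points))


def bimodal_query(posteriors, budget):
    """ASSUMPTION: query the points most confidently assigned to each class.

    posteriors[i] is an estimate of P(label = 1 | x_i) from some classifier.
    """
    budget = min(budget, len(posteriors))
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i])
    half = budget // 2
    return order[:half] + order[len(order) - (budget - half):]


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Stand-in null distribution: shuffle labels over a fixed MST.

    Few cross-label edges are evidence against the null, so the p-value is
    the share of permutations with a statistic at least as small.
    """
    rng = random.Random(seed)
    edges = mst_edges(points)
    observed = sum(labels[i] != labels[j] for i, j in edges)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(shuffled[i] != shuffled[j] for i, j in edges) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In this sketch, one would call `bimodal_query` on the classifier's posteriors to choose which labels to request, then pass the queried points and their returned labels to `fr_cross_edges` or `permutation_p_value`. Whether a permutation null remains valid after adaptive querying is exactly the kind of question the paper's Type I error analysis addresses, so the p-value here should be read as illustrative only.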

Keywords

group testing; randomized algorithm; software testing

Cite

@article{arxiv.2111.08861,
  title  = {A label-efficient two-sample test},
  author = {Weizhi Li and Gautam Dasarathy and Karthikeyan Natesan Ramamurthy and Visar Berisha},
  journal= {arXiv preprint arXiv:2111.08861},
  year   = {2022}
}

Comments

Accepted to the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022).
