Two-sample Statistics Based on Anisotropic Kernels

Machine Learning 2018-09-03 v3 Machine Learning Applications Computation

Abstract

The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as $\|p-q\| \sqrt{n} \to \infty$, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
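
As a rough illustration of the construction described in the abstract, the following Python sketch builds an asymmetric affinity matrix between reference points and data points using local covariance matrices, then compares the mean affinities of two samples. The function names (`local_covariances`, `anisotropic_affinity`, `two_sample_statistic`), the Gaussian form of the kernel, the k-nearest-neighbor covariance estimate, and the regularization constant are all assumptions made for illustration; the abstract does not specify them, and the paper's exact definitions may differ.

```python
import numpy as np


def local_covariances(refs, pool, k=20):
    """Estimate one covariance matrix per reference point.

    Assumption: the covariance at a reference point is taken from its k
    nearest neighbors (Euclidean) in `pool`.
    """
    covs = []
    for mu in refs:
        dist2 = np.sum((pool - mu) ** 2, axis=1)
        nearest = np.argsort(dist2)[:k]
        covs.append(np.cov(pool[nearest], rowvar=False))
    return np.array(covs)


def anisotropic_affinity(refs, covs, X, reg=1e-6):
    """Asymmetric affinity matrix of shape (n_R, n).

    Entry (r, i) is exp(-0.5 * Mahalanobis^2) between reference point r and
    sample i, measured with the local covariance at r (a hypothetical
    Gaussian choice of kernel).
    """
    d = X.shape[1]
    A = np.empty((refs.shape[0], X.shape[0]))
    for r, (mu, cov) in enumerate(zip(refs, covs)):
        # A small ridge keeps the local covariance invertible.
        prec = np.linalg.inv(cov + reg * np.eye(d))
        diff = X - mu
        maha2 = np.einsum("ij,jk,ik->i", diff, prec, diff)
        A[r] = np.exp(-0.5 * maha2)
    return A


def two_sample_statistic(X, Y, refs, covs):
    """Mean squared difference of the two samples' average affinities,
    taken over the reference points."""
    hx = anisotropic_affinity(refs, covs, X).mean(axis=1)
    hy = anisotropic_affinity(refs, covs, Y).mean(axis=1)
    return float(np.mean((hx - hy) ** 2))


# Usage: n_R = 10 reference points against n = 200 samples per group.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2)) + np.array([1.0, 0.0])
pool = np.vstack([X, Y])
refs = pool[rng.choice(len(pool), size=10, replace=False)]
covs = local_covariances(refs, pool, k=30)
stat = two_sample_statistic(X, Y, refs, covs)
```

Because each affinity matrix has only $n_R$ rows, the cost of the statistic in this sketch grows with $n \cdot n_R$ rather than $n^2$. In practice a significance threshold would still need to be calibrated, for example by a permutation procedure, which is not shown here.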

Cite

@article{arxiv.1709.05006,
  title  = {Two-sample Statistics Based on Anisotropic Kernels},
  author = {Xiuyuan Cheng and Alexander Cloninger and Ronald R. Coifman},
  journal= {arXiv preprint arXiv:1709.05006},
  year   = {2018}
}