Two-sample test based on Self-Organizing Maps

Machine Learning 2022-12-20 v1 Neural and Evolutionary Computing

Abstract

Machine-learning classifiers can be used as a two-sample statistical test. Suppose each sample is assigned a different label and a classifier discriminates between them with better-than-chance accuracy. In that case, we can infer that the two samples originate from different populations. However, many types of models, such as neural networks, behave as a black box for the user: they can reject the hypothesis that both samples originate from the same population, but they offer no insight into how the samples differ. Self-Organizing Maps are a dimensionality reduction technique initially devised as a data visualization tool that displays emergent properties, and they are also useful for classification tasks. Since they can be used as classifiers, they can also be used as a two-sample statistical test. And because their original purpose is visualization, they can also offer insight into how the samples differ.
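The classifier-based test described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: a nearest-centroid rule stands in for the Self-Organizing Map, and the function names (`classifier_two_sample_test`, `binomial_tail`, `centroid`), the 50/50 train/test split, and the exact binomial tail used for the p-value are all assumptions made for illustration. Under the null hypothesis with equally sized samples, held-out accuracy should be close to chance, so a small p-value suggests the samples come from different populations.

```python
import math
import random


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(pt[d] for pt in points) / len(points) for d in range(dim)]


def classifier_two_sample_test(sample_a, sample_b, seed=0):
    """Return (held-out accuracy, one-sided p-value against chance = 0.5).

    Assumes both samples have similar sizes, so that chance accuracy is 0.5.
    """
    rng = random.Random(seed)
    # Label each sample, pool them, and split into train and test halves.
    labelled = [(x, 0) for x in sample_a] + [(x, 1) for x in sample_b]
    rng.shuffle(labelled)
    half = len(labelled) // 2
    train, test = labelled[:half], labelled[half:]

    # Hypothetical stand-in classifier: one centroid per label.
    centroids = [
        centroid([x for x, y in train if y == label]) for label in (0, 1)
    ]

    def predict(x):
        dists = [math.dist(x, c) for c in centroids]
        return dists.index(min(dists))

    # Accuracy on held-out points, compared with a fair-coin guesser.
    correct = sum(predict(x) == y for x, y in test)
    return correct / len(test), binomial_tail(correct, len(test))


# Usage: two 2-D Gaussian samples whose means differ by 1.5 per coordinate.
rng = random.Random(42)
sample_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
sample_b = [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(200)]
accuracy, p_value = classifier_two_sample_test(sample_a, sample_b)
```

A test of this kind only signals that the samples differ. It does not show where, which is the gap the abstract says a Self-Organizing Map can fill through its visualization.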

Cite

@article{arxiv.2212.08960,
  title  = {Two-sample test based on Self-Organizing Maps},
  author = {Alejandro Álvarez-Ayllón and Manuel Palomo-Duarte and Juan-Manuel Dodero},
  journal= {arXiv preprint arXiv:2212.08960},
  year   = {2022}
}

Comments

27 pages