Binary classification with ambiguous training data

Naoya Otani; Yosuke Otsubo; Tetsuya Koike; Masashi Sugiyama

doi:10.1007/s10994-020-05915-2

Machine Learning 2020-11-25 v1

Abstract

In supervised learning, we often encounter ambiguous (A) samples that are difficult to label even for domain experts. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It also differs from 3-class classification with positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with reject option, which trains a classifier and a rejector simultaneously on P and N samples using the 0-1-$c$ loss with rejection cost $c$. More specifically, we propose to train a classifier and a rejector under the 0-1-$c$-$d$ loss using P, N, and A samples, where $d$ is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-$c$-$d$ loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.
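The method builds on the 0-1-$c$ loss of binary classification with reject option. As a minimal sketch, assuming the common formulation of Cortes, DeSalvo and Mohri (2016), a sample is rejected when the rejector output satisfies $r(x) \le 0$, and the loss is $\mathbb{1}[y h(x) \le 0]\,\mathbb{1}[r(x) > 0] + c\,\mathbb{1}[r(x) \le 0]$. The code below evaluates this loss together with one known convex upper bound, their max-hinge (MH) surrogate $\max\{1 + \tfrac{\alpha}{2}(r(x) - y h(x)),\ c(1 - \beta r(x)),\ 0\}$. The function names and the parameters `alpha` and `beta` are illustrative assumptions. This is not the authors' code. The A-sample penalty $d$ of the 0-1-$c$-$d$ loss is not modeled because the abstract does not state its exact form, and the paper's own convex bound may differ.

```python
import numpy as np


def zero_one_c_loss(h, r, y, c):
    """0-1-c loss (Cortes et al. style, assumed here): cost c if rejected (r <= 0),
    otherwise the 0-1 misclassification loss of the classifier output h."""
    h, r, y = (np.asarray(a, dtype=float) for a in (h, r, y))
    misclassified = (y * h <= 0).astype(float)  # 1 if sign(h) disagrees with y
    return np.where(r > 0, misclassified, c)    # accepted -> 0-1 loss, rejected -> c


def mh_surrogate(h, r, y, c, alpha=1.0, beta=1.0):
    """Max-hinge convex surrogate; an upper bound on the 0-1-c loss for alpha, beta > 0."""
    h, r, y = (np.asarray(a, dtype=float) for a in (h, r, y))
    accept_term = 1.0 + 0.5 * alpha * (r - y * h)  # >= 1 when accepted and misclassified
    reject_term = c * (1.0 - beta * r)             # >= c when rejected (r <= 0)
    return np.maximum(np.maximum(accept_term, reject_term), 0.0)


# Toy usage: classifier scores h, rejector scores r, labels y in {-1, +1}.
h = np.array([1.5, -0.3, 0.8, 2.0])
r = np.array([0.5, 0.2, -1.0, -0.1])
y = np.array([1, 1, -1, 1])
c = 0.3

print("0-1-c loss per sample:", zero_one_c_loss(h, r, y, c))
print("MH surrogate per sample:", mh_surrogate(h, r, y, c))
```

The surrogate is a pointwise maximum of functions that are affine in the outputs $(h(x), r(x))$, so it is convex in them. This convexity is what makes this kind of bound tractable to minimize, in the spirit of the convex upper bound mentioned in the abstract.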

Keywords

classification; positive-unlabeled learning; semi-supervised learning

Cite

@article{arxiv.2011.02598,
  title  = {Binary classification with ambiguous training data},
  author = {Naoya Otani and Yosuke Otsubo and Tetsuya Koike and Masashi Sugiyama},
  journal= {arXiv preprint arXiv:2011.02598},
  year   = {2020}
}

Comments

20 pages, 6 figures, accepted at the 12th Asian Conference on Machine Learning (ACML 2020)
