Active Learning from Crowd in Document Screening

Evgeny Krivosheev; Burcu Sayin; Alessandro Bozzon; Zoltán Szlávik

Information Retrieval; 2020-12-07; v1; Computation and Language; Machine Learning

Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. This is a challenging task because the budget is limited and there are countless ways to spend it on the problem. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for selecting unlabeled documents to annotate. Our algorithm decides which machine filter needs more training data and how to choose unlabeled items to annotate in order to minimize the risk of overall classification errors rather than the error of a single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.

Keywords

active learning; crowdsourcing; data annotation

Cite

@article{arxiv.2012.02297,
  title  = {Active Learning from Crowd in Document Screening},
  author = {Evgeny Krivosheev and Burcu Sayin and Alessandro Bozzon and Zoltán Szlávik},
  journal= {arXiv preprint arXiv:2012.02297},
  year   = {2020}
}

Comments

Crowd Science Workshop at NeurIPS 2020
