Weakly-Supervised Temporal Localization via Occurrence Count Learning

Machine Learning | 2019-05-20 | v1 | Subjects: Sound; Audio and Speech Processing; Machine Learning

Abstract

We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model's theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements.
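The idea of training from counts alone can be made concrete with a small sketch. The abstract does not state the loss, so everything here is an assumption made for illustration: the network is assumed to emit one independent event probability per time step, the distribution over the total count is then computed by dynamic programming, and the loss is the negative log-likelihood of the annotated count. The names `count_distribution` and `count_nll` are hypothetical, not the authors' code.

```python
import numpy as np


def count_distribution(probs):
    """Distribution of the total event count.

    Assumption (not from the abstract): each time step t fires
    independently with probability probs[t]. Returns an array pmf
    where pmf[k] is the probability of exactly k events.
    """
    pmf = np.zeros(len(probs) + 1)
    pmf[0] = 1.0  # before any step, the count is 0 with certainty
    for p in probs:
        # Probability mass moved up by one count if this step fires.
        shifted = np.roll(pmf, 1)
        shifted[0] = 0.0
        # Either the step does not fire (count unchanged) or it does.
        pmf = (1.0 - p) * pmf + p * shifted
    return pmf


def count_nll(probs, count, eps=1e-12):
    """Negative log-likelihood of the annotated count (hypothetical loss)."""
    return -np.log(count_distribution(probs)[count] + eps)


if __name__ == "__main__":
    sharp = [0.99, 0.01, 0.99, 0.01]   # two confident, well-localized events
    diffuse = [0.5, 0.5, 0.5, 0.5]     # same expected count, no localization
    print(count_nll(sharp, 2), count_nll(diffuse, 2))
```

Under these assumptions, both `sharp` and `diffuse` have an expected count of 2, yet the sharp prediction receives the lower loss for a label of 2. This is one plausible intuition for how a count-only objective can reward localized predictions; it is not a claim about the authors' exact formulation.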

Cite

@article{arxiv.1905.07293,
  title  = {Weakly-Supervised Temporal Localization via Occurrence Count Learning},
  author = {Julien Schroeter and Kirill Sidorov and David Marshall},
  journal= {arXiv preprint arXiv:1905.07293},
  year   = {2019}
}

Comments

Accepted at ICML 2019