Saliency-Driven Versatile Video Coding for Neural Object Detection

Kristian Fischer; Felix Fleckenstein; Christian Herglotz; André Kaup

doi:10.1109/ICASSP39728.2021.9415048

Computer Vision and Pattern Recognition; Image and Video Processing. 2022-03-14 (v1)

Abstract

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29% of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.
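
The abstract does not spell out the decision criterion or how the encoder is steered, so the following is only a minimal sketch of the general idea: blocks that overlap a detected object are coded at a finer quantization than the background. Everything named here is an assumption for illustration, not the authors' method: the function `ctu_qp_map`, the block size of 128 pixels (the largest coding tree unit size in VVC), the two QP values, and the simple "any overlap" rule. Bounding boxes are assumed to be integer pixel coordinates `(x0, y0, x1, y1)` with exclusive lower-right corners, as a detector such as YOLO might supply after rounding.

```python
import math


def ctu_qp_map(width, height, boxes, ctu_size=128, qp_salient=32, qp_background=42):
    """Build a per-block QP map from detector bounding boxes (illustrative only).

    Blocks touched by at least one box get qp_salient; all others get
    qp_background. The overlap rule and all default values are assumptions.
    """
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    # Start with every block treated as non-salient background.
    qp = [[qp_background] * cols for _ in range(rows)]

    for x0, y0, x1, y1 in boxes:
        # Clip the box to the frame and skip boxes that fall outside it.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x1 <= x0 or y1 <= y0:
            continue
        # Mark every block that the clipped box overlaps as salient.
        for r in range(y0 // ctu_size, (y1 - 1) // ctu_size + 1):
            for c in range(x0 // ctu_size, (x1 - 1) // ctu_size + 1):
                qp[r][c] = qp_salient
    return qp


if __name__ == "__main__":
    # A 512x256 frame with one hypothetical detection.
    for row in ctu_qp_map(512, 256, [(100, 50, 200, 150)]):
        print(row)
```

A map like this would still have to be handed to an actual VVC encoder through whatever per-block QP control that encoder offers; that step is left out because it depends on the encoder used.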

Keywords

saliency detection, object detection, video retrieval

Cite

@article{arxiv.2203.05944,
  title  = {Saliency-Driven Versatile Video Coding for Neural Object Detection},
  author = {Kristian Fischer and Felix Fleckenstein and Christian Herglotz and André Kaup},
  journal= {arXiv preprint arXiv:2203.05944},
  year   = {2022}
}

Comments

5 pages, 3 figures, 2 tables; Originally submitted at IEEE ICASSP 2021
