Task-Specific Context Decoupling for Object Detection

Computer Vision and Pattern Recognition 2023-03-03 v1

Abstract

Classification and localization are the two main sub-tasks in object detection. However, the two tasks have inconsistent preferences for feature context: localization expects more boundary-aware features to accurately regress the bounding box, while classification prefers more semantic context. Existing methods usually leverage disentangled heads to learn a different feature context for each task. However, the heads are still applied to the same input features, which leads to an imperfect balance between classification and localization. In this work, we propose a novel Task-Specific COntext DEcoupling (TSCODE) head which further disentangles the feature encoding for the two tasks. For classification, we generate a spatially coarse but semantically strong feature encoding. For localization, we provide a high-resolution feature map containing more edge information to better regress object boundaries. TSCODE is plug-and-play and can be easily incorporated into existing detection pipelines. Extensive experiments demonstrate that our method consistently improves different detectors by over 1.0 AP with less computational cost. Our code and models will be publicly released.
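
The decoupling idea in the abstract can be illustrated with a toy NumPy sketch. It is not the TSCODE head itself: the abstract does not specify the architecture, so the three-level feature pyramid, the pooling and upsampling operators, the fusion by concatenation or summation, and all function names below are assumptions made for illustration. The sketch only shows that the two tasks can be fed different encodings of the same pyramid, one coarse and one high-resolution.

```python
import numpy as np

def avg_pool2x(x):
    """Halve the spatial resolution of a (C, H, W) map with 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Double the spatial resolution of a (C, H, W) map by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def classification_encoding(p_cur, p_coarse):
    """Hypothetical coarse encoding: pool the current level, stack it with the coarser level."""
    return np.concatenate([avg_pool2x(p_cur), p_coarse], axis=0)

def localization_encoding(p_fine, p_cur):
    """Hypothetical high-resolution encoding: upsample the current level, add the finer level."""
    return p_fine + upsample2x(p_cur)

# Toy pyramid with 8 channels per level (shapes are assumptions, not from the paper).
rng = np.random.default_rng(0)
p_fine = rng.standard_normal((8, 32, 32))
p_cur = rng.standard_normal((8, 16, 16))
p_coarse = rng.standard_normal((8, 8, 8))

cls_feat = classification_encoding(p_cur, p_coarse)  # shape (16, 8, 8)
loc_feat = localization_encoding(p_fine, p_cur)      # shape (8, 32, 32)
print(cls_feat.shape, loc_feat.shape)
```

In this sketch the classification branch sees a map 16 times smaller in area than the one given to the localization branch, which mirrors the coarse-versus-detailed split described above. A real detection head would follow each encoding with learned convolutions.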

Cite

@article{arxiv.2303.01047,
  title  = {Task-Specific Context Decoupling for Object Detection},
  author = {Jiayuan Zhuang and Zheng Qin and Hao Yu and Xucan Chen},
  journal= {arXiv preprint arXiv:2303.01047},
  year   = {2023}
}