Multi-label Classification with Panoptic Context Aggregation Networks

Computer Vision and Pattern Recognition 2025-12-30 v1

Abstract

Context modeling is crucial for visual recognition: it yields highly discriminative image representations by integrating both intrinsic and extrinsic relationships between objects and labels in images. Current approaches are limited by their focus on basic geometric relationships or localized features, and they often neglect cross-scale contextual interactions between objects. This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts through cross-scale feature aggregation in a high-dimensional Hilbert space. Specifically, PanCAN learns multi-order neighborhood relationships at each scale by combining random walks with an attention mechanism. Modules at different scales are cascaded: salient anchors are selected at a finer scale, and their neighborhood features are dynamically fused via attention. Combining multi-order and cross-scale context-aware features in this way enables effective cross-scale modeling and significantly enhances complex scene understanding. Extensive multi-label classification experiments on the NUS-WIDE, PASCAL VOC2007, and MS-COCO benchmarks demonstrate that PanCAN consistently achieves competitive results, outperforming state-of-the-art techniques in both quantitative and qualitative evaluations.
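To make the idea of multi-order neighborhood aggregation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes image cells arranged on a grid with a 4-neighborhood adjacency, uses powers of a row-normalized random-walk transition matrix to gather order-k contexts, and fuses the orders with a simple scaled dot-product attention. The function names (`grid_adjacency`, `multi_order_context`), the choice of adjacency, and the attention form are all illustrative assumptions, and the features stay in plain Euclidean space rather than the Hilbert space mentioned in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def grid_adjacency(height, width):
    """4-neighborhood adjacency for a height x width grid of cells (assumed layout)."""
    n = height * width
    adj = np.zeros((n, n))
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:  # right neighbor
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < height:  # bottom neighbor
                adj[i, i + width] = adj[i + width, i] = 1.0
    return adj


def multi_order_context(features, adjacency, num_orders=3):
    """Fuse order-1..num_orders random-walk contexts with attention.

    features:  (n, d) array, one feature vector per cell.
    adjacency: (n, n) non-negative array; every cell needs at least one neighbor.
    Returns (fused, weights) with shapes (n, d) and (n, num_orders).
    """
    n, d = features.shape
    # Row-normalize the adjacency into a random-walk transition matrix.
    degree = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.maximum(degree, 1e-12)

    contexts = []
    walk = np.eye(n)
    for _ in range(num_orders):
        walk = walk @ transition          # k-step transition probabilities
        contexts.append(walk @ features)  # order-k context for each cell
    contexts = np.stack(contexts, axis=1)  # (n, num_orders, d)

    # Hypothetical attention: each cell scores its own order-k contexts.
    scores = np.einsum("nd,nkd->nk", features, contexts) / np.sqrt(d)
    weights = softmax(scores, axis=1)
    fused = np.einsum("nk,nkd->nd", weights, contexts)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = grid_adjacency(3, 3)
    feats = rng.normal(size=(9, 8))
    fused, weights = multi_order_context(feats, adj, num_orders=3)
    print(fused.shape, weights.shape)
```

In this sketch each row of `weights` indicates how much a cell relies on its 1-step, 2-step, and 3-step neighborhoods. A cross-scale variant in the spirit of the abstract would run the same routine on a finer grid and pass selected anchor cells to a coarser one, but that step is not shown here.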

Cite

@article{arxiv.2512.23486,
  title  = {Multi-label Classification with Panoptic Context Aggregation Networks},
  author = {Mingyuan Jiu and Hailong Zhu and Wenchuan Wei and Hichem Sahbi and Rongrong Ji and Mingliang Xu},
  journal= {arXiv preprint arXiv:2512.23486},
  year   = {2025}
}