Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

Wangyu Wu; Tianhong Dai; Zhenhong Chen; Xiaowei Huang; Jimin Xiao; Fei Ma; Renrong Ouyang

doi:10.1016/j.engappai.2024.109626

Computer Vision and Pattern Recognition; 2024-11-28; v2

Authors: Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Jimin Xiao, Fei Ma, Renrong Ouyang

Abstract

Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. The typical framework uses image-level labels as training data to generate pixel-level pseudo-labels, which are then refined. Recently, methods based on Vision Transformers (ViT) have demonstrated superior capabilities in generating reliable pseudo-labels compared to CNN methods, particularly in recognizing complete object regions. However, current ViT-based approaches have limitations in how they use patch embeddings: they are prone to being dominated by certain abnormal patches. In addition, many multi-stage methods are time-consuming to train and therefore inefficient. In this paper, we introduce a novel ViT-based WSSS method named Adaptive Patch Contrast (APC) that significantly enhances patch embedding learning for improved segmentation effectiveness. APC utilizes an Adaptive-K Pooling (AKP) layer to address the limitations of previous max pooling selection methods. Additionally, we propose Patch Contrastive Learning (PCL) to enhance patch embeddings, thereby further improving the final results. Furthermore, we improve upon the existing multi-stage training framework that does not rely on CAM by transforming it into an end-to-end single-stage training approach, thereby enhancing training efficiency. Experimental results show that our approach is effective and efficient, outperforming other state-of-the-art WSSS methods on the PASCAL VOC 2012 and MS COCO 2014 datasets within a shorter training duration.
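
The abstract names two ideas without giving formulas: pooling patch scores over an adaptively chosen number of patches rather than taking a single maximum, and a contrastive objective over patch embeddings. The following is a minimal, illustrative sketch of those general ideas in plain Python, not the authors' implementation. The threshold rule in `adaptive_k`, the pairwise loss in `patch_contrast_loss`, and all function names and numbers are assumptions introduced here for illustration only.

```python
import math


def max_pool(scores):
    """Image-level score taken from the single highest-scoring patch."""
    return max(scores)


def top_k_mean_pool(scores, k):
    """Image-level score as the mean of the k highest patch scores."""
    k = max(1, min(k, len(scores)))
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def adaptive_k(scores, threshold=0.5):
    """Hypothetical rule (an assumption, not from the paper):
    k is the number of patches scoring above `threshold`, at least 1."""
    return max(1, sum(1 for s in scores if s > threshold))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_contrast_loss(embeddings, labels):
    """Illustrative pairwise contrastive loss (an assumption):
    same-label pairs are penalized by (1 - cos), different-label
    pairs by max(0, cos); the result is averaged over all pairs."""
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim
            else:
                total += max(0.0, sim)
            pairs += 1
    return total / pairs if pairs else 0.0


# Made-up patch scores for one class; 0.99 plays the role of an abnormal patch.
patch_scores = [0.10, 0.12, 0.99, 0.08, 0.55, 0.60, 0.11, 0.09]
k = adaptive_k(patch_scores)
pooled_max = max_pool(patch_scores)
pooled_topk = top_k_mean_pool(patch_scores, k)

# Made-up 2-D patch embeddings with pseudo class labels.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loss_consistent = patch_contrast_loss(embeddings, [0, 0, 1])
loss_inconsistent = patch_contrast_loss(embeddings, [0, 1, 1])
```

With max pooling, the image-level score is determined entirely by the single highest patch, so one abnormal patch fixes the result. Averaging over several top patches dilutes that effect, and in this toy setup the contrastive term is lower when embeddings agree with the patch labels than when they do not.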

Keywords

weakly supervised semantic segmentation; vision transformer; image segmentation

Cite

@article{arxiv.2407.10649,
  title  = {Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation},
  author = {Wangyu Wu and Tianhong Dai and Zhenhong Chen and Xiaowei Huang and Jimin Xiao and Fei Ma and Renrong Ouyang},
  journal= {arXiv preprint arXiv:2407.10649},
  year   = {2024}
}

Comments

Accepted by the EAAI Journal
