Training-Free Dataset Pruning for Instance Segmentation

Computer Vision and Pattern Recognition; Machine Learning · 2025-03-04 (v1)

Abstract

Existing dataset pruning techniques primarily focus on classification tasks, which limits their applicability to more complex and practical tasks such as instance segmentation. Instance segmentation presents three key challenges: pixel-level annotations, instance area variations, and class imbalances, all of which significantly complicate dataset pruning. Directly adapting existing classification-based pruning methods proves ineffective because they rely on a time-consuming model training process. To address this, we propose a novel Training-Free Dataset Pruning (TFDP) method for instance segmentation. Specifically, we leverage shape and class information from image annotations to design a Shape Complexity Score (SCS), and refine it into Scale-Invariant (SI-SCS) and Class-Balanced (CB-SCS) versions to address instance area variations and class imbalances, all without requiring model training. We achieve state-of-the-art results on the VOC 2012, Cityscapes, and COCO datasets and generalize well across CNN and Transformer architectures. Remarkably, our approach accelerates the pruning process by an average of 1349× on COCO compared to the adapted baselines. Source code is available at: https://github.com/he-y/dataset-pruning-for-instance-segmentation
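
The training-free scoring idea can be illustrated with a small sketch that reads only annotations and never trains a model. It is not the authors' implementation. The polygon annotation format, the perimeter / sqrt(area) shape proxy, dividing each instance's score by its class mean as a class-balancing step, and keeping the highest-scoring images are all assumptions made for illustration; the exact SCS, SI-SCS, and CB-SCS definitions are given in the paper and the linked repository. All helper names below are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical annotation format: image id -> list of (class_id, polygon),
# where polygon is a list of (x, y) vertices of one instance mask outline.

def polygon_area(poly):
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(poly):
    """Sum of edge lengths of the closed polygon."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def scale_invariant_complexity(poly):
    """Assumed proxy: perimeter / sqrt(area) is unchanged by uniform rescaling."""
    area = polygon_area(poly)
    return polygon_perimeter(poly) / math.sqrt(area) if area > 0 else 0.0

def image_scores(annotations):
    """Score each image from its annotations only (no model training).

    Each instance complexity is divided by the mean complexity of its class
    (an assumed class-balancing step), then scores are averaged per image.
    """
    per_class = defaultdict(list)
    for instances in annotations.values():
        for cls, poly in instances:
            per_class[cls].append(scale_invariant_complexity(poly))
    class_mean = {c: sum(v) / len(v) for c, v in per_class.items()}

    scores = {}
    for img, instances in annotations.items():
        vals = [scale_invariant_complexity(p) / class_mean[c]
                for c, p in instances if class_mean[c] > 0]
        scores[img] = sum(vals) / len(vals) if vals else 0.0
    return scores

def prune(annotations, keep_ratio):
    """Return ids of the highest-scoring fraction of images (assumed direction)."""
    scores = image_scores(annotations)
    k = max(1, round(keep_ratio * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because perimeter and sqrt(area) both grow linearly under a uniform rescaling, a 10×10 square and a 30×30 square both score 4.0, so only shape, not instance size, changes the raw score; the per-class normalization then keeps frequent classes from dominating the ranking.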

Cite

@article{arxiv.2503.00828,
  title  = {Training-Free Dataset Pruning for Instance Segmentation},
  author = {Yalun Dai and Lingao Xiao and Ivor W. Tsang and Yang He},
  journal= {arXiv preprint arXiv:2503.00828},
  year   = {2025}
}

Comments

Accepted by ICLR 2025