Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

Computer Vision and Pattern Recognition 2025-05-06 v1 Artificial Intelligence Machine Learning

Abstract

Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As the popularity of VFMs grows, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, which limits their direct applicability in this context. One way to tackle this limitation is to employ a task-agnostic feature upsampling module that increases the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Because of its inherently multimodal input, consisting of an image and a set of user-defined clicks, and its dense mask output, IS creates a challenging environment that demands comprehensive visual scene understanding. Our benchmarking experiments show that selecting an appropriate upsampling strategy significantly improves the quality of VFM features. The code is released at https://github.com/havrylovv/iSegProbe
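To make the resolution gap concrete, the sketch below upsamples a low-resolution feature map to image resolution with plain bilinear interpolation. This is only a generic baseline for illustration, not necessarily one of the methods benchmarked in the paper and not code from the iSegProbe repository. The function name `bilinear_upsample`, the 16x16 feature grid, the 8 channels, and the 224x224 target size are all assumptions chosen for the example.

```python
import numpy as np


def bilinear_upsample(feats: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Upsample a (C, h, w) feature map to (C, out_h, out_w) bilinearly.

    Hypothetical helper: the corner samples of the input and output grids
    are aligned, so corner feature vectors are preserved exactly.
    """
    _, h, w = feats.shape
    # Output sample positions expressed in input-grid coordinates.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    # Indices of the neighbouring input rows/columns, clamped at the border.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional offsets, shaped to broadcast over (C, out_h, out_w).
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    # Interpolate along x on the upper and lower rows, then along y.
    top = feats[:, y0][:, :, x0] * (1 - wx) + feats[:, y0][:, :, x1] * wx
    bottom = feats[:, y1][:, :, x0] * (1 - wx) + feats[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


# Assumed setting: a backbone that emits a 16x16 grid of 8-dim features
# for a 224x224 image (values here are random stand-ins).
rng = np.random.default_rng(0)
low_res = rng.standard_normal((8, 16, 16))
high_res = bilinear_upsample(low_res, 224, 224)
```

Bilinear interpolation only smooths between neighbouring feature vectors and cannot recover detail finer than the feature grid, which is the kind of limitation a dedicated upsampling module is meant to address.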

Cite

@article{arxiv.2505.02075,
  title  = {Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation},
  author = {Volodymyr Havrylov and Haiwen Huang and Dan Zhang and Andreas Geiger},
  journal= {arXiv preprint arXiv:2505.02075},
  year   = {2025}
}