Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection

Computer Vision and Pattern Recognition 2026-05-14 v2 Artificial Intelligence

Abstract

We propose Ciliary-DETR (previously named Elastic-DETR), a framework for test-time resolution adjustment analogous to biological accommodation. While multi-scale data augmentation improves robustness to scale variation, modern detectors still rely on fixed inference resolutions, which potentially limits their flexibility and robustness. Playing a role similar to that of the ciliary muscle, a lightweight scale predictor dynamically estimates test-time scale factors across a wide range of input scales. The core challenge is that the optimal input scale is inherently unobservable under standard training setups. To address this, we introduce a parametric formulation of the desired scaling behavior, which yields loss-driven objectives that guide scale optimization. Overall, our method enables flexible and efficient single-pass inference, bridging the gap between training-time robustness and test-time adaptation.

Cite

@article{arxiv.2412.06341,
  title   = {Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection},
  author  = {Daeun Seo and Hoeseok Yang and Sihyeong Park and Hyungshin Kim},
  journal = {arXiv preprint arXiv:2412.06341},
  year    = {2026}
}

Comments

23 pages, 11 figures
