We propose Ciliary-DETR (previously named Elastic-DETR), a framework for test-time resolution adjustment analogous to biological accommodation. While multi-scale data augmentation improves robustness to scale variation, modern detectors rely on fixed inference resolutions, potentially limiting flexibility and robustness. We introduce a lightweight scale predictor that, playing a role similar to that of the ciliary muscle, dynamically estimates test-time scale factors across a wide range of input scales. The core challenge is that the optimal input scale is inherently unobservable under standard training setups. To address this challenge, we introduce a parametric formulation of the desired scaling behavior, leading to loss-driven objectives that guide scale optimization. Overall, our method enables flexible and efficient single-pass inference, bridging the gap between training-time robustness and test-time adaptation.
@article{arxiv.2412.06341,
title = {Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection},
author = {Daeun Seo and Hoeseok Yang and Sihyeong Park and Hyungshin Kim},
journal = {arXiv preprint arXiv:2412.06341},
year = {2026}
}