Task-Driven Fixation Network: An Efficient Architecture with Fixation Selection

Computer Vision and Pattern Recognition 2025-01-06 v1

Abstract

This paper presents a novel neural network architecture with automatic fixation point selection, designed to handle complex tasks with a smaller network and lower computational overhead. The proposed model has three parts: a low-resolution channel that captures coarse global features from the input image; a high-resolution channel that sequentially extracts localized high-resolution features; and a hybrid encoding module that integrates the features from both channels. The hybrid encoding module includes a fixation point generator, which dynamically produces fixation points that direct the high-resolution channel to regions of interest. Because the fixation points are generated in a task-driven manner, regions of interest are selected automatically. This approach avoids exhaustive high-resolution analysis of the entire image while maintaining both task performance and computational efficiency.
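
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the average-pooling downsampler, the patch size, and in particular the "brightest unvisited cell" rule used to pick fixation points are all assumptions made for illustration. In the paper the fixation point generator is task-driven and part of the hybrid encoding module; here a fixed heuristic stands in for it, and simple concatenation stands in for the hybrid encoding.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a 2-D image by an integer factor (input to the low-resolution channel)."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def crop_patch(image, center, size):
    """Cut a size x size high-resolution patch around a fixation point, clamped to the image."""
    h, w = image.shape
    half = size // 2
    top = int(np.clip(center[0] - half, 0, h - size))
    left = int(np.clip(center[1] - half, 0, w - size))
    return image[top:top + size, left:left + size]

def next_fixation(low_res, visited, factor):
    """Hypothetical stand-in for the fixation point generator.

    Picks the brightest low-resolution cell not yet visited and returns the
    centre of that cell in full-resolution pixel coordinates.
    """
    scores = np.where(visited, -np.inf, low_res)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    visited[r, c] = True
    return (int(r) * factor + factor // 2, int(c) * factor + factor // 2)

def run_fixations(image, factor=8, patch=16, steps=3):
    """Combine global low-resolution features with a few high-resolution patches."""
    low_res = downsample(image, factor)
    visited = np.zeros(low_res.shape, dtype=bool)
    fixations, patch_features = [], []
    for _ in range(steps):
        point = next_fixation(low_res, visited, factor)
        fixations.append(point)
        patch_features.append(crop_patch(image, point, patch).ravel())
    # Assumed fusion step: plain concatenation of both feature sets.
    hybrid = np.concatenate([low_res.ravel(), *patch_features])
    return fixations, hybrid

image = np.zeros((64, 64))
image[40:48, 8:16] = 1.0  # a single bright region of interest
fixations, hybrid = run_fixations(image)
```

With these assumed settings, the high-resolution path reads 3 patches of 16 x 16 pixels (768 values) instead of all 4096 pixels of the 64 x 64 image, which is the kind of saving the abstract refers to when it says exhaustive high-resolution analysis is avoided.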

Cite

@article{arxiv.2501.01548,
  title   = {Task-Driven Fixation Network: An Efficient Architecture with Fixation Selection},
  author  = {Shuguang Wang and Yuanjing Wang},
  journal = {arXiv preprint arXiv:2501.01548},
  year    = {2025}
}

Comments

9 pages, 2 figures, 2 tables