Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement

Computer Vision and Pattern Recognition 2026-04-02 v2

Abstract

The success of deep learning in computer vision has been driven by models of increasing scale, from deep Convolutional Neural Networks (CNNs) to large Vision Transformers (ViTs). While effective, these architectures are parameter-intensive and demand significant computational resources, limiting deployment in resource-constrained environments. Inspired by Tiny Recursive Models (TRM), which show that small recursive networks can solve complex reasoning tasks through iterative state refinement, we introduce the Vision Tiny Recursion Model (ViTRM): a parameter-efficient architecture that replaces the $L$-layer ViT encoder with a single tiny $k$-layer block ($k=3$) applied recursively $N$ times. Despite using up to $6\times$ and $84\times$ fewer parameters than CNN-based models and ViT, respectively, ViTRM maintains competitive performance on CIFAR-10 and CIFAR-100. This demonstrates that recursive computation is a viable, parameter-efficient alternative to architectural depth in vision.
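The parameter argument in the abstract can be illustrated with a rough count. The following is a minimal sketch, not the authors' implementation: it assumes a standard Transformer encoder layer (four attention projections, a two-layer MLP, two LayerNorms), and the function names and the sizes `d_model=192`, `d_ff=768`, `L=12`, `N=4` are hypothetical values chosen for illustration only. It shows that a shared $k$-layer block applied $N$ times has a parameter count independent of $N$, while its effective depth is $k \cdot N$.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard Transformer encoder layer (assumed layout)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, output projections with biases
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two linear layers with biases
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attention + mlp + layer_norms


def stacked_encoder_params(num_layers: int, d_model: int, d_ff: int) -> int:
    """A conventional encoder: every one of the L layers has its own weights."""
    return num_layers * encoder_layer_params(d_model, d_ff)


def recursive_encoder_params(block_layers: int, d_model: int, d_ff: int,
                             num_recursions: int) -> int:
    """A shared k-layer block: weights are reused, so num_recursions adds no parameters."""
    del num_recursions  # kept in the signature only to make the independence explicit
    return block_layers * encoder_layer_params(d_model, d_ff)


def refine(state: State, block: Callable[[State], State], num_recursions: int) -> State:
    """Apply the same block repeatedly to refine a state (simplified recursion loop)."""
    for _ in range(num_recursions):
        state = block(state)
    return state


# Hypothetical sizes, not taken from the paper.
D_MODEL, D_FF = 192, 768
L, K, N = 12, 3, 4

stacked = stacked_encoder_params(L, D_MODEL, D_FF)
recursive = recursive_encoder_params(K, D_MODEL, D_FF, N)
effective_depth = K * N  # layers traversed per forward pass
```

With these assumed sizes the shared block holds one quarter of the encoder parameters of the 12-layer stack while traversing the same number of layers per forward pass. The reductions reported in the abstract depend on the paper's own configurations and baselines, which this sketch does not reproduce, and the actual recursion may maintain more state than the single-variable loop in `refine`.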

Cite

@article{arxiv.2603.19503,
  title  = {Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement},
  author = {Ange-Clément Akazan and Abdoulaye Koroko and Verlon Roel Mbingui and Choukouriyah Arinloye and Hassan Fifen and Rose Bandolo},
  journal= {arXiv preprint arXiv:2603.19503},
  year   = {2026}
}