Related papers: Learning Adaptive Reasoning Paths for Efficient Vi…

PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments

Visual reasoning in multimodal large language models (MLLMs) has primarily been studied in static, fully observable settings, limiting their effectiveness in real-world environments where information is often incomplete due to occlusion or…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-27 · Weijie Zhou, Xuantang Xiong, Yi Peng, Manli Tao, Chaoyang Zhao, Honghui Dong, Ming Tang, Jinqiao Wang

Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning

Vision-Language Models (VLMs) excel at many multimodal tasks, yet they frequently struggle with tasks requiring precise understanding and handling of fine-grained visual elements. This is mainly due to information loss during image encoding…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Xuchen Li, Xuzhao Li, Jiahui Gao, Renjie Pi, Shiyu Hu, Wentao Zhang

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models

Visual reasoning abilities play a crucial role in understanding complex multimodal data, advancing both domain-specific applications and artificial general intelligence (AGI). Existing methods enhance Vision-Language Models (VLMs) through…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Huajie Tan, Yuheng Ji, Xiaoshuai Hao, Xiansheng Chen, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code

Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. Some strategies have been proposed to mitigate this issue, such as length penalties or routing…

Computation and Language · Computer Science · 2025-10-16 · Jian Xie, Zhendong Chu, Aoxiao Zhong, Kai Zhang, Mingzhe Han, Xing Fan, Jialie Shen, Qingsong Wen

Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning

Current visual reasoning methods mainly focus on exploring specific reasoning modes. Although improvements can be achieved in particular domains, they struggle to develop general reasoning capabilities. Inspired by this, we propose a novel…

Artificial Intelligence · Computer Science · 2026-05-15 · Zejun Li, Yingxiu Zhao, Jiwen Zhang, Siyuan Wang, Yang Yao, Runzhou Zhao, Jun Song, Bo Zheng, Zhongyu Wei

Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices

The abstract visual reasoning (AVR) domain encompasses problems whose solution requires the ability to reason about relations among entities present in a given scene. While humans, generally, solve AVR tasks in a "natural" way, even without…

Artificial Intelligence · Computer Science · 2025-02-24 · Mikołaj Małkiński, Jacek Mańdziuk

Latent Visual Reasoning

Multimodal Large Language Models (MLLMs) have achieved notable gains in various tasks by incorporating Chain-of-Thought (CoT) reasoning in language spaces. Recent work extends this direction by leveraging external tools for visual editing,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Bangzheng Li, Ximeng Sun, Jiang Liu, Ze Wang, Jialian Wu, Xiaodong Yu, Hao Chen, Emad Barsoum, Muhao Chen, Zicheng Liu

ARM: Adaptive Reasoning Model

While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty. This often leads to the "overthinking" problem: excessive and unnecessary…

Computation and Language · Computer Science · 2025-10-14 · Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao

A Review of Emerging Research Directions in Abstract Visual Reasoning

Abstract Visual Reasoning (AVR) problems are commonly used to approximate human intelligence. They test the ability to apply previously gained knowledge, experience, and skills in a completely new setting, which makes them particularly…

Artificial Intelligence · Computer Science · 2023-02-27 · Mikołaj Małkiński, Jacek Mańdziuk

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

Effectively retrieving, reasoning and understanding visually rich information remains a challenge for RAG methods. Traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG…

Computation and Language · Computer Science · 2025-06-04 · Qiuchen Wang, Ruixue Ding, Yu Zeng, Zehui Chen, Lin Chen, Shihang Wang, Pengjun Xie, Fei Huang, Feng Zhao

IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning

Multimodal large language models trained via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning tasks, yet they remain limited in long-horizon multimodal scenarios, often suffering from visual…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-26 · Chenghao Li, Fusheng Hao, Xikai Zhang, Likang Xiao, Yanwei Ren, Fuxiang Wu, Quan Chen, Liu Liu

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

Learning general-purpose reasoning capabilities has long been a challenging problem in AI. Recent research in large language models (LLMs), such as DeepSeek-R1, has shown that reinforcement learning techniques like GRPO can enable…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-28 · Jiaer Xia, Yuhang Zang, Peng Gao, Sharon Li, Kaiyang Zhou

Learning to Think Fast and Slow for Visual Language Models

When faced with complex problems, we tend to engage in slower, more deliberate thinking. In contrast, for simple questions we give quick, intuitive responses. This dual-system thinking approach allows us to allocate cognitive resources…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou

Smart Vision-Language Reasoners

In this article, we investigate vision-language models (VLMs) as reasoners. The ability to form abstractions underlies mathematical reasoning, problem-solving, and other Math AI tasks. Several formalisms have been given to these underlying…

Artificial Intelligence · Computer Science · 2024-07-08 · Denisa Roberts, Lucas Roberts

From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning

Recent advances in vision-language reasoning underscore the importance of thinking with images, where models actively ground their reasoning in visual evidence. Yet, prevailing frameworks treat visual actions as optional tools, boosting…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Changpeng Wang, Haozhe Wang, Xi Chen, Junhan Liu, Taofeng Xue, Chong Peng, Donglian Qi, Fangzhen Lin, Yunfeng Yan

Visual Reasoning Agent: Robust Vision Systems in Remote Sensing via Inference-Time Scaling

Building robust vision systems for high-stakes domains such as remote sensing requires stronger visual reasoning than what single-pass inference typically provides; yet, retraining large models is often computationally expensive and data…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-22 · Chung-En Johnny Yu, Brian Jalaian, Nathaniel D. Bastian

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Reasoning has emerged as a pivotal capability in Large Language Models (LLMs). Through Reinforcement Learning (RL), typically Group Relative Policy Optimization (GRPO), these models are able to solve complex tasks such as mathematics and…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Fabian Waschkowski, Lukas Wesemann, Peter Tu, Jing Zhang

GAMR: A Guided Attention Model for (visual) Reasoning

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which…

Artificial Intelligence · Computer Science · 2023-03-22 · Mohit Vaishnav, Thomas Serre

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this…

Computation and Language · Computer Science · 2026-05-20 · Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou

Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference

Vision-Language Models (VLMs) have demonstrated strong performance on multimodal reasoning tasks, but their deployment remains challenging due to high inference latency and computational cost, particularly when processing high-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-25 · Putu Indah Githa Cahyani, Komang David Dananjaya Suartana, Novanto Yudistira