Related papers: Learning Adaptive Reasoning Paths for Efficient Vi…

200 papers

Visual reasoning in multimodal large language models (MLLMs) has primarily been studied in static, fully observable settings, limiting their effectiveness in real-world environments where information is often incomplete due to occlusion or…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Weijie Zhou, Xuantang Xiong, Yi Peng, Manli Tao, Chaoyang Zhao, Honghui Dong, Ming Tang, Jinqiao Wang

Vision-Language Models (VLMs) excel at many multimodal tasks, yet they frequently struggle with tasks requiring precise understanding and handling of fine-grained visual elements. This is mainly due to information loss during image encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Xuchen Li, Xuzhao Li, Jiahui Gao, Renjie Pi, Shiyu Hu, Wentao Zhang

Visual reasoning abilities play a crucial role in understanding complex multimodal data, advancing both domain-specific applications and artificial general intelligence (AGI). Existing methods enhance Vision-Language Models (VLMs) through…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Huajie Tan, Yuheng Ji, Xiaoshuai Hao, Xiansheng Chen, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. Some strategies have been proposed to mitigate this issue, such as length penalties or routing…

Computation and Language · Computer Science 2025-10-16 Jian Xie, Zhendong Chu, Aoxiao Zhong, Kai Zhang, Mingzhe Han, Xing Fan, Jialie Shen, Qingsong Wen

Current visual reasoning methods mainly focus on exploring specific reasoning modes. Although improvements can be achieved in particular domains, they struggle to develop general reasoning capabilities. Inspired by this, we propose a novel…

Artificial Intelligence · Computer Science 2026-05-15 Zejun Li, Yingxiu Zhao, Jiwen Zhang, Siyuan Wang, Yang Yao, Runzhou Zhao, Jun Song, Bo Zheng, Zhongyu Wei

The abstract visual reasoning (AVR) domain encompasses problems whose solution requires the ability to reason about relations among entities present in a given scene. While humans generally solve AVR tasks in a "natural" way, even without…

Artificial Intelligence · Computer Science 2025-02-24 Mikołaj Małkiński, Jacek Mańdziuk

Multimodal Large Language Models (MLLMs) have achieved notable gains in various tasks by incorporating Chain-of-Thought (CoT) reasoning in language spaces. Recent work extends this direction by leveraging external tools for visual editing,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Bangzheng Li, Ximeng Sun, Jiang Liu, Ze Wang, Jialian Wu, Xiaodong Yu, Hao Chen, Emad Barsoum, Muhao Chen, Zicheng Liu

While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty. This often leads to the "overthinking" problem -- excessive and unnecessary…

Computation and Language · Computer Science 2025-10-14 Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao

Abstract Visual Reasoning (AVR) problems are commonly used to approximate human intelligence. They test the ability to apply previously gained knowledge, experience, and skills in a completely new setting, which makes them particularly…

Artificial Intelligence · Computer Science 2023-02-27 Mikołaj Małkiński, Jacek Mańdziuk

Effectively retrieving, reasoning and understanding visually rich information remains a challenge for RAG methods. Traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG…

Computation and Language · Computer Science 2025-06-04 Qiuchen Wang, Ruixue Ding, Yu Zeng, Zehui Chen, Lin Chen, Shihang Wang, Pengjun Xie, Fei Huang, Feng Zhao

Multimodal large language models trained via reinforcement learning (RL) have demonstrated remarkable capabilities in complex visual reasoning tasks, yet they remain limited in long-horizon multimodal scenarios, often suffering from visual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Chenghao Li, Fusheng Hao, Xikai Zhang, Likang Xiao, Yanwei Ren, Fuxiang Wu, Quan Chen, Liu Liu

Learning general-purpose reasoning capabilities has long been a challenging problem in AI. Recent research in large language models (LLMs), such as DeepSeek-R1, has shown that reinforcement learning techniques like GRPO can enable…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Jiaer Xia, Yuhang Zang, Peng Gao, Sharon Li, Kaiyang Zhou

When faced with complex problems, we tend to engage in slower, more deliberate thinking. In contrast, for simple questions we give quick, intuitive responses. This dual-system thinking approach allows us to allocate cognitive resources…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou

In this article, we investigate vision-language models (VLMs) as reasoners. The ability to form abstractions underlies mathematical reasoning, problem-solving, and other Math AI tasks. Several formalisms have been given to these underlying…

Artificial Intelligence · Computer Science 2024-07-08 Denisa Roberts, Lucas Roberts

Recent advances in vision-language reasoning underscore the importance of thinking with images, where models actively ground their reasoning in visual evidence. Yet, prevailing frameworks treat visual actions as optional tools, boosting…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Changpeng Wang, Haozhe Wang, Xi Chen, Junhan Liu, Taofeng Xue, Chong Peng, Donglian Qi, Fangzhen Lin, Yunfeng Yan

Building robust vision systems for high-stakes domains such as remote sensing requires stronger visual reasoning than what single-pass inference typically provides; yet, retraining large models is often computationally expensive and data…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Chung-En Johnny Yu, Brian Jalaian, Nathaniel D. Bastian

Reasoning has emerged as a pivotal capability in Large Language Models (LLMs). Through Reinforcement Learning (RL), typically Group Relative Policy Optimization (GRPO), these models are able to solve complex tasks such as mathematics and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Fabian Waschkowski, Lukas Wesemann, Peter Tu, Jing Zhang

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which…

Artificial Intelligence · Computer Science 2023-03-22 Mohit Vaishnav, Thomas Serre

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this…

Computation and Language · Computer Science 2026-05-20 Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi, Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou

Vision-Language Models (VLMs) have demonstrated strong performance on multimodal reasoning tasks, but their deployment remains challenging due to high inference latency and computational cost, particularly when processing high-resolution…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Putu Indah Githa Cahyani, Komang David Dananjaya Suartana, Novanto Yudistira