Related papers: Visual Reasoning through Tool-supervised Reinforce…

ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning

Current image-based reinforcement learning (RL) algorithms typically operate on the whole image without performing object-level reasoning. This leads to inefficient goal sampling and ineffective reward functions. In this paper, we improve…

Machine Learning · Computer Science 2020-11-16 Yufei Wang, Gautham Narayan Narasimhan, Xingyu Lin, Brian Okorn, David Held

ToRL: Scaling Tool-Integrated RL

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to…

Computation and Language · Computer Science 2025-04-01 Xuefeng Li, Haoyang Zou, Pengfei Liu

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning

Vision-language models (VLMs) have shown remarkable abilities by integrating large language models with visual inputs. However, they often fail to utilize visual evidence adequately, either depending on linguistic priors in vision-centric…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xiaojun Guo, Runyu Zhou, Yifei Wang, Qi Zhang, Chenheng Zhang, Stefanie Jegelka, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Yisen Wang

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Visual understanding is inherently intention-driven - humans selectively focus on different regions of a scene based on their goals. Recent advances in large multimodal models (LMMs) enable flexible expression of such intentions through…

Computer Vision and Pattern Recognition · Computer Science 2025-04-02 Zhangquan Chen, Xufang Luo, Dongsheng Li

Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design

Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence,…

Machine Learning · Computer Science 2025-11-25 Markus D. Solbach, John K. Tsotsos

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Omar Sharif, Eftekhar Hossain, Patrick Ng

Grounded Reinforcement Learning for Visual Reasoning

While reinforcement learning (RL) over chains of thought has significantly advanced language models in tasks such as mathematics and coding, visual reasoning introduces added complexity by requiring models to direct visual attention,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-18 Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr, Aviral Kumar, Katerina Fragkiadaki

SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated the great potential of enhancing the reasoning abilities in multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-23 Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for multimodal large language models (MLLMs) have mainly focused on improving final answer correctness and strengthening visual grounding. However, a critical…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Jinda Lu, Junkang Wu, Jinghan Li, Kexin Huang, Shuo Yang, Mingzhu Chen, Jiancan Wu, Kuien Liu, Xiang Wang

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend…

Machine Learning · Computer Science 2026-03-06 Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt

ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

Instruction-driven image editing with unified multimodal generative models has advanced rapidly, yet their underlying visual reasoning remains limited, leading to suboptimal performance on reasoning-centric edits. Reinforcement learning…

Computer Vision and Pattern Recognition · Computer Science 2026-02-27 Hengjia Li, Liming Jiang, Qing Yan, Yizhi Song, Hao Kang, Zichuan Liu, Xin Lu, Boxi Wu, Deng Cai

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails when correct solutions are rarely sampled even…

Computation and Language · Computer Science 2026-03-02 Yihe Deng, I-Hung Hsu, Jun Yan, Zifeng Wang, Rujun Han, Gufeng Zhang, Yanfei Chen, Wei Wang, Tomas Pfister, Chen-Yu Lee

SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning

Large language models are increasingly used for complex reasoning tasks where high-quality offline data such as expert-annotated solutions and distilled reasoning traces are often available. However, in environments with sparse rewards,…

Artificial Intelligence · Computer Science 2025-08-11 Yihao Liu, Shuocheng Li, Lang Cao, Yuhang Xie, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Reinforced Visual Perception with Tools

Visual reasoning, a cornerstone of human intelligence, encompasses complex perceptual and logical processes essential for solving diverse visual problems. While advances in computer vision have produced powerful models for various…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna

Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models

The application of reinforcement learning (RL) to enhance the reasoning capabilities of Multimodal Large Language Models (MLLMs) constitutes a rapidly advancing research area. While MLLMs extend Large Language Models (LLMs) to handle…

Artificial Intelligence · Computer Science 2025-05-22 Guanghao Zhou, Panjia Qiu, Cen Chen, Jie Wang, Zheming Yang, Jian Xu, Minghui Qiu

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models

Reinforcement learning (RL) has proven highly effective in eliciting the reasoning capabilities of large language models (LLMs). Inspired by this success, recent studies have explored applying similar techniques to vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2025-10-20 Yan Chen, Long Li, Teng Xi, Long Zeng, Jingdong Wang

ToolRL: Reward is All Tool Learning Needs

Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement…

Machine Learning · Computer Science 2025-04-22 Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family. We introduce TaxonRL, a reinforcement learning…

Computer Vision and Pattern Recognition · Computer Science 2026-03-05 Maximilian von Klinski, Maximilian Schall

Visual Reinforcement Learning with Self-Supervised 3D Representations

A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional…

Machine Learning · Computer Science 2023-03-16 Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Recently, large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). However, leveraging the RL algorithm to empower effective multi-tool collaborative reasoning in LLMs remains an…

Computation and Language · Computer Science 2025-05-23 Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen