Related papers: Visual Reasoning through Tool-supervised Reinforce…

200 papers

Current image-based reinforcement learning (RL) algorithms typically operate on the whole image without performing object-level reasoning. This leads to inefficient goal sampling and ineffective reward functions. In this paper, we improve…

Machine Learning · Computer Science · 2020-11-16 · Yufei Wang, Gautham Narayan Narasimhan, Xingyu Lin, Brian Okorn, David Held

We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to…

Computation and Language · Computer Science · 2025-04-01 · Xuefeng Li, Haoyang Zou, Pengfei Liu

Vision-language models (VLMs) have shown remarkable abilities by integrating large language models with visual inputs. However, they often fail to utilize visual evidence adequately, either depending on linguistic priors in vision-centric…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Xiaojun Guo, Runyu Zhou, Yifei Wang, Qi Zhang, Chenheng Zhang, Stefanie Jegelka, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Yisen Wang

Visual understanding is inherently intention-driven - humans selectively focus on different regions of a scene based on their goals. Recent advances in large multimodal models (LMMs) enable flexible expression of such intentions through…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-02 · Zhangquan Chen, Xufang Luo, Dongsheng Li

Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence,…

Machine Learning · Computer Science · 2025-11-25 · Markus D. Solbach, John K. Tsotsos

Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-05 · Omar Sharif, Eftekhar Hossain, Patrick Ng

While reinforcement learning (RL) over chains of thought has significantly advanced language models in tasks such as mathematics and coding, visual reasoning introduces added complexity by requiring models to direct visual attention,…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-18 · Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr, Aviral Kumar, Katerina Fragkiadaki

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated the great potential of enhancing the reasoning abilities in multimodal large language models (MLLMs). However, the reliance on language-centric priors and expensive…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-23 · Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for multimodal large language models (MLLMs) have mainly focused on improving final answer correctness and strengthening visual grounding. However, a critical…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-30 · Jinda Lu, Junkang Wu, Jinghan Li, Kexin Huang, Shuo Yang, Mingzhu Chen, Jiancan Wu, Kuien Liu, Xiang Wang

Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend…

Machine Learning · Computer Science · 2026-03-06 · Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt

Instruction-driven image editing with unified multimodal generative models has advanced rapidly, yet their underlying visual reasoning remains limited, leading to suboptimal performance on reasoning-centric edits. Reinforcement learning…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Hengjia Li, Liming Jiang, Qing Yan, Yizhi Song, Hao Kang, Zichuan Liu, Xin Lu, Boxi Wu, Deng Cai

Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails when correct solutions are rarely sampled even…

Computation and Language · Computer Science · 2026-03-02 · Yihe Deng, I-Hung Hsu, Jun Yan, Zifeng Wang, Rujun Han, Gufeng Zhang, Yanfei Chen, Wei Wang, Tomas Pfister, Chen-Yu Lee

Large language models are increasingly used for complex reasoning tasks where high-quality offline data such as expert-annotated solutions and distilled reasoning traces are often available. However, in environments with sparse rewards,…

Artificial Intelligence · Computer Science · 2025-08-11 · Yihao Liu, Shuocheng Li, Lang Cao, Yuhang Xie, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Visual reasoning, a cornerstone of human intelligence, encompasses complex perceptual and logical processes essential for solving diverse visual problems. While advances in computer vision have produced powerful models for various…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna

The application of reinforcement learning (RL) to enhance the reasoning capabilities of Multimodal Large Language Models (MLLMs) constitutes a rapidly advancing research area. While MLLMs extend Large Language Models (LLMs) to handle…

Artificial Intelligence · Computer Science · 2025-05-22 · Guanghao Zhou, Panjia Qiu, Cen Chen, Jie Wang, Zheming Yang, Jian Xu, Minghui Qiu

Reinforcement learning (RL) has proven highly effective in eliciting the reasoning capabilities of large language models (LLMs). Inspired by this success, recent studies have explored applying similar techniques to vision-language models…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-20 · Yan Chen, Long Li, Teng Xi, Long Zeng, Jingdong Wang

Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement…

Machine Learning · Computer Science · 2025-04-22 · Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family. We introduce TaxonRL, a reinforcement learning…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-05 · Maximilian von Klinski, Maximilian Schall

A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional…

Machine Learning · Computer Science · 2023-03-16 · Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

Recently, large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). However, leveraging the RL algorithm to empower effective multi-tool collaborative reasoning in LLMs remains an…

Computation and Language · Computer Science · 2025-05-23 · Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen