Related papers: Test-Time Deep Thinking to Explore Implicit Rules

EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning

Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks…

Computation and Language · Computer Science · 2024-03-19 · Kinjal Basu, Keerthiram Murugesan, Subhajit Chaudhury, Murray Campbell, Kartik Talamadupula, Tim Klinger

Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents

Eliciting reasoning has emerged as a powerful technique for improving the performance of large language models (LLMs) on complex tasks by inducing thinking. However, their effectiveness in realistic user-engaged agent scenarios remains…

Computation and Language · Computer Science · 2026-02-10 · Jiatong Li, Changdae Oh, Hyeong Kyu Choi, Jindong Wang, Sharon Li

ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations

This paper introduces ThoughtProbe, a novel inference time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unlike previous works that manipulate the hidden…

Computation and Language · Computer Science · 2025-11-03 · Zijian Wang, Chang Xu

Effectively Controlling Reasoning Models through Thinking Intervention

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging…

Machine Learning · Computer Science · 2025-05-22 · Tong Wu, Chong Xiang, Jiachen T. Wang, G. Edward Suh, Prateek Mittal

Guiding Pretraining in Reinforcement Learning with Large Language Models

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions,…

Machine Learning · Computer Science · 2023-09-18 · Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs

Large language models (LLMs) have advanced general-purpose reasoning, showing strong performance across diverse tasks. However, existing methods often rely on implicit exploration, where the model follows stochastic and unguided reasoning…

Artificial Intelligence · Computer Science · 2025-09-09 · Jiaxiang Chen, Zhuo Wang, Mingxi Zou, Zhucong Li, Zhijian Zhou, Song Wang, Zenglin Xu

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this paradigm, its contributions remain…

Computation and Language · Computer Science · 2026-02-24 · Yinuo Xu, Shuo Lu, Jianjie Cheng, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He, Jian Liang

Thinker: Learning to Think Fast and Slow

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs…

Computation and Language · Computer Science · 2025-10-17 · Stephen Chung, Wenyu Du, Jie Fu

ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay

Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs for exploration due to their strong perceptual and…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Gengyuan Zhang, Mingcong Ding, Jingpei Wu, Ruotong Liao, Volker Tresp

Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning

Recent advances in Large Language Models (LLMs) demonstrate that chain-of-thought prompting and deep reasoning substantially enhance performance on complex tasks, and multi-agent systems can further improve accuracy by enabling model…

Artificial Intelligence · Computer Science · 2025-10-16 · Zehui Ling, Deshu Chen, Yichi Zhang, Yuchen Liu, Xigui Li, Xin Guo, Yuan Cheng

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to…

Artificial Intelligence · Computer Science · 2018-02-27 · Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang

TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement…

Computation and Language · Computer Science · 2026-01-30 · Huiyuan Lai, Malvina Nissim

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

The development of autonomous agents for complex, long-horizon tasks is a central goal in AI. However, dominant training paradigms face a critical limitation: reinforcement learning (RL) methods that optimize solely for final task success…

Machine Learning · Computer Science · 2025-07-31 · Zijing Zhang, Ziyang Chen, Mingxiao Li, Zhaopeng Tu, Xiaolong Li

Rethinking the Design of Reinforcement Learning-Based Deep Research Agents

Large language models (LLMs) augmented with external tools are increasingly deployed as deep research agents that gather, reason over, and synthesize web information to answer complex queries. Although recent open-source systems achieve…

Artificial Intelligence · Computer Science · 2026-02-24 · Yi Wan, Jiuqi Wang, Liam Li, Jinsong Liu, Ruihao Zhu, Zheqing Zhu

LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification

Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token…

Computation and Language · Computer Science · 2026-05-11 · Xuan Li, Yining Wang, Yuchen Liu, Guanjun Liu, Delai Qiu, Shengping Liu, Jiaen Liang, Wei Huang, Jun Yu, Junnan Zhu

Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ undifferentiated exploration strategies,…

Artificial Intelligence · Computer Science · 2026-05-13 · Xingyuan Hua, Sheng Yue, Ju Ren

Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods

There is intense interest in investigating how inference time compute (ITC) (e.g. repeated sampling, refinements, etc) can improve large language model (LLM) capabilities. At the same time, recent breakthroughs in reasoning models, such as…

Artificial Intelligence · Computer Science · 2025-04-22 · Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and…

Machine Learning · Computer Science · 2025-04-28 · Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Wenjun Zeng, Glen Berseth

Implicit Reasoning in Large Language Models: A Comprehensive Survey

Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Reasoning with LLMs is central to solving multi-step problems and complex decision-making. To support efficient reasoning, recent studies…

Computation and Language · Computer Science · 2025-09-03 · Jindong Li, Yali Fu, Li Fan, Jiahong Liu, Yao Shu, Chengwei Qin, Menglin Yang, Irwin King, Rex Ying

Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

Large Language Models (LLMs) were shown to struggle with long-term planning, which may be caused by the limited way in which they explore the space of possible solutions. We propose an architecture where a Reinforcement Learning (RL) Agent…

Machine Learning · Computer Science · 2024-10-18 · Yoav Alon, Cristina David