Related papers: Efficient Exploration for LLMs

200 papers

Efficient exploration is a well-known problem in deep reinforcement learning, and this problem is exacerbated in multi-agent reinforcement learning due to the intrinsic complexities of such algorithms. There are several approaches to…

Artificial Intelligence · Computer Science 2025-07-11 Ashish Kumar

A burgeoning area within reinforcement learning (RL) is the design of sequential decision-making agents centered around large language models (LLMs). While autonomous decision-making agents powered by modern LLMs could facilitate numerous…

Machine Learning · Computer Science 2026-02-10 Dilip Arumugam, Thomas L. Griffiths

We develop an online learning algorithm that dramatically improves the data efficiency of reinforcement learning from human feedback (RLHF). Our algorithm incrementally updates reward and language models as choice data is received. The…

Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration…

Machine Learning · Computer Science 2019-06-17 Pranav Shyam, Wojciech Jaśkowski, Faustino Gomez

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions,…

Machine Learning · Computer Science 2023-09-18 Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. A commonly used approach for exploring such environments is to introduce some "intrinsic"…

Machine Learning · Computer Science 2020-07-16 Neale Ratzlaff, Qinxun Bai, Li Fuxin, Wei Xu

Planning in complex environments requires an agent to efficiently query a world model to find a feasible sequence of actions from start to goal. Recent work has shown that Large Language Models (LLMs), with their rich prior knowledge and…

Artificial Intelligence · Computer Science 2024-12-10 Gonzalo Gonzalez-Pumariega, Wayne Chen, Kushal Kedia, Sanjiban Choudhury

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought…

Artificial Intelligence · Computer Science 2024-10-04 Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Cheems Wang, Chang Liu, Xiangyang Ji

Exploration is a crucial skill for in-context reinforcement learning in unknown environments. However, it remains unclear if large language models can effectively explore a partially hidden state space. This work isolates exploration as the…

Machine Learning · Computer Science 2025-08-26 Tim Grams, Patrick Betz, Sascha Marton, Stefan Lüdtke, Christian Bartelt

Designing protocols enhancing cooperation for multi-agent systems remains a grand challenge. Cheap talk, defined as costless, non-binding communication before formal action, serves as a pivotal solution. However, existing theoretical…

Multiagent Systems · Computer Science 2026-03-03 Zhao Song, Chen Shen, Zhen Wang, The Anh Han

Language model alignment (or, reinforcement learning) techniques that leverage active exploration -- deliberately encouraging the model to produce diverse, informative responses -- offer the promise of super-human capabilities. However,…

Machine Learning · Computer Science 2025-03-17 Dylan J. Foster, Zakaria Mhammedi, Dhruv Rohatgi

Recent advances in Large Language Models (LLMs) demonstrate that chain-of-thought prompting and deep reasoning substantially enhance performance on complex tasks, and multi-agent systems can further improve accuracy by enabling model…

Artificial Intelligence · Computer Science 2025-10-16 Zehui Ling, Deshu Chen, Yichi Zhang, Yuchen Liu, Xigui Li, Xin Guo, Yuan Cheng

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under…

Machine Learning · Computer Science 2021-08-02 Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ undifferentiated exploration strategies,…

Artificial Intelligence · Computer Science 2026-05-13 Xingyuan Hua, Sheng Yue, Ju Ren

Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering. However, in long trajectories, agents often trigger excessive and low-quality tool calls, increasing…

Artificial Intelligence · Computer Science 2026-03-25 Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li, Keyang Chen, Yixin Cao, Guangnan Ye, Hongfeng Chai, Zhenfei Yin

Preference-based feedback is important for many applications in machine learning where evaluation of a reward function is not feasible. Notable recent examples arise in preference alignment for large language models, including in…

Automatically extracting effective queries is challenging in information retrieval, especially in toxic content exploration, as such content is likely to be disguised. With the recent achievements in generative Large Language Model (LLM),…

Information Retrieval · Computer Science 2025-02-27 Shaola Ren, Li Ke, Longtao Huang, Dehong Gao, Hui Xue

Preference learning from human feedback has the ability to align generative models with the needs of end-users. Human feedback is costly and time-consuming to obtain, which creates demand for data-efficient query selection methods. This…

Machine Learning · Computer Science 2026-02-18 Guy Schacht, Ziyad Sheebaelhamd, Riccardo De Santi, Mojmír Mutný, Andreas Krause

Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems due to feedback loops and limited signals on user exploration patterns. Large Language Models…

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. While previous work has largely studied the ability of LLMs to solve combined…

Machine Learning · Computer Science 2026-02-18 Keegan Harris, Aleksandrs Slivkins