Related papers: Value-Based Deep RL Scales Predictably

200 papers

Improving data utilization efficiency is critical for scaling reinforcement learning (RL) for long-horizon tasks where generating trajectories is expensive. However, the dominant RL methods for LLMs are largely on-policy: they update each…

Large language models (LLMs) excel in tasks like question answering and dialogue, but complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. Reinforcement learning…

Computation and Language · Computer Science · 2025-12-04 · Joey Hong, Anca Dragan, Sergey Levine

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient…

Machine Learning · Computer Science · 2020-08-04 · Samuele Tosatto, Joao Carvalho, Hany Abdulsamad, Jan Peters

As models grow larger and training them becomes expensive, it becomes increasingly important to scale training recipes not just to larger models and more data, but to do so in a compute-optimal manner that extracts maximal performance per…

Machine Learning · Computer Science · 2025-08-26 · Preston Fu, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar

Reinforcement learning (RL) approaches for Large Language Models (LLMs) frequently use on-policy algorithms, such as PPO or GRPO. However, policy lag from distributed training architectures and differences between the training and inference…

Machine Learning · Computer Science · 2026-03-03 · Daniel Ritter, Owen Oertell, Bradley Guo, Jonathan Chang, Kianté Brantley, Wen Sun

Predicting changes from scaling advanced AI systems is a desirable property for engineers, economists, governments and industry alike, and, while a well-established literature exists on how pretraining performance scales, predictable…

The escalating scale and cost of Large Language Model (LLM) training necessitate accurate pre-training prediction of downstream task performance for comprehensive understanding of scaling properties. This is challenged by: 1) the…

Computation and Language · Computer Science · 2026-03-10 · Chengyin Xu, Kaiyuan Chen, Xiao Li, Ke Shen, Chenggang Li

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are…

Recent advancements in deep reinforcement learning (RL) have demonstrated notable progress in sample efficiency, spanning both model-based and model-free paradigms. Despite the identification and mitigation of specific bottlenecks in prior…

Machine Learning · Computer Science · 2024-04-02 · Yibo Wang, Jiang Zhao

Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount…

Machine Learning · Computer Science · 2025-10-07 · Lianghuan Huang, Sagnik Anupam, Insup Lee, Shuo Li, Osbert Bastani

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is…

For deploying foundation models, practitioners increasingly need prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping…

Machine Learning · Computer Science · 2026-02-18 · Hanlin Zhang, Jikai Jin, Vasilis Syrgkanis, Sham Kakade

Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three types of approaches have been widely adopted: The first relies on a deep neural network…

Machine Learning · Computer Science · 2026-05-19 · Shijin Gong, Kai Ye, Jin Zhu, Xinyu Zhang, Hongyi Zhou, Chengchun Shi

Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods either suffer from high…

Machine Learning · Computer Science · 2021-06-09 · Samuele Tosatto, João Carvalho, Jan Peters

The application of Reinforcement Learning (RL) in real world environments can be expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is avoided since interactions with an environment are prohibited.…

Despite substantial advances in scaling test-time compute, an ongoing debate in the community is how it should be scaled up to enable continued and efficient improvements with scaling. There are largely two approaches: first, distilling…

Machine Learning · Computer Science · 2025-02-19 · Amrith Setlur, Nived Rajaraman, Sergey Levine, Aviral Kumar

Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is…

Machine Learning · Computer Science · 2025-08-01 · Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo

While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct…

Machine Learning · Computer Science · 2025-12-10 · Jakub Krajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from…

Machine Learning · Statistics · 2020-02-21 · Nathan Kallus, Masatoshi Uehara

Large Language Models (LLMs) are distinguished by their architecture, which dictates their parameter size and performance capabilities. Social scientists have increasingly adopted LLMs for text classification tasks, which are difficult to…

Computation and Language · Computer Science · 2024-11-05 · Marcello Carammia, Stefano Maria Iacus, Giuseppe Porro