Related papers: Online Meta-learning by Parallel Algorithm Competi…

Metaoptimization on a Distributed System for Deep Reinforcement Learning

Training intelligent agents through reinforcement learning is a notoriously unstable procedure. Massive parallelization on GPUs and distributed systems has been exploited to generate a large amount of training experiences and consequently…

Machine Learning · Computer Science 2019-02-08 Greg Heinrich, Iuri Frosio

Meta-Gradient Reinforcement Learning

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of…

Machine Learning · Computer Science 2018-05-25 Zhongwen Xu, Hado van Hasselt, David Silver

Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is…

Machine Learning · Computer Science 2019-05-27 Kenny Young, Baoxiang Wang, Matthew E. Taylor

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as the standard paradigm for improving reasoning capability of large language models, while Multi-Token Prediction (MTP) has been a widely adopted module in pretraining.…

Machine Learning · Computer Science 2026-05-28 Zili Wang, Jiajun Chai, Lin Chen, Xiaohan Wang, Shiming Xiang, Guojun Yin

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to…

Artificial Intelligence · Computer Science 2017-08-18 Felix Leibfried, Nate Kushman, Katja Hofmann

Meta-Learning Adversarial Bandit Algorithms

We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online…

Machine Learning · Computer Science 2023-11-02 Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu

Reinforcement Teaching

Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing…

Machine Learning · Computer Science 2025-01-28 Calarina Muslimani, Alex Lewandowski, Dale Schuurmans, Matthew E. Taylor, Jun Luo

Meta-Learning Adversarial Bandits

We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial…

Machine Learning · Computer Science 2022-05-30 Maria-Florina Balcan, Keegan Harris, Mikhail Khodak, Zhiwei Steven Wu

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science 2020-11-03 Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales

Efficient Parallel Methods for Deep Reinforcement Learning

We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm agnostic and can be applied to…

Machine Learning · Computer Science 2017-05-17 Alfredo V. Clemente, Humberto N. Castejón, Arjun Chandra

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment…

Machine Learning · Computer Science 2022-07-13 Cheng Chen, Canzhe Zhao, Shuai Li

Offline Meta-Reinforcement Learning with Advantage Weighting

This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting. Offline meta-RL is analogous to the widely successful supervised learning strategy…

Machine Learning · Computer Science 2021-07-22 Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn

Theoretical Study of Conflict-Avoidant Multi-Objective Reinforcement Learning

Multi-task reinforcement learning (MTRL) has shown great promise in many real-world applications. Existing MTRL algorithms often aim to learn a policy that optimizes individual objective functions simultaneously with a given prior…

Machine Learning · Computer Science 2024-12-24 Yudan Wang, Peiyao Xiao, Hao Ban, Kaiyi Ji, Shaofeng Zou

Online Meta-Learning

A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this…

Machine Learning · Computer Science 2019-07-05 Chelsea Finn, Aravind Rajeswaran, Sham Kakade, Sergey Levine

Offline Meta-Reinforcement Learning with Online Self-Supervision

Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then…

Machine Learning · Computer Science 2022-07-08 Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

Tensor parallelism is an essential technique for distributed training of large neural networks. However, automatically determining an optimal tensor parallel strategy is challenging due to the gigantic search space, which grows…

Machine Learning · Computer Science 2025-08-06 Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Chencan Wu, Yong Li, Xiaokui Xiao, Wei Lin, Jialin Li

Bootstrapped Meta-Learning

Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem. We propose an algorithm that tackles this problem by…

Machine Learning · Computer Science 2022-03-17 Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks without any interactions with the environments, making RL truly practical in…

Machine Learning · Computer Science 2021-05-07 Lanqing Li, Rui Yang, Dijun Luo

How Should We Meta-Learn Reinforcement Learning Algorithms?

The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for…

Machine Learning · Computer Science 2025-09-11 Alexander David Goldie, Zilin Wang, Jaron Cohen, Jakob Nicolaus Foerster, Shimon Whiteson

Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning

Meta-learning for offline reinforcement learning (OMRL) is an understudied problem with tremendous potential impact by enabling RL algorithms in many real-world applications. A popular solution to the problem is to infer task identity as…

Machine Learning · Computer Science 2021-10-18 Lanqing Li, Yuanhao Huang, Mingzhe Chen, Siteng Luo, Dijun Luo, Junzhou Huang