Related papers: Accelerating Reinforcement Learning through GPU At…

Accelerated Methods for Deep Reinforcement Learning

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers,…

Machine Learning · Computer Science 2019-01-14 Adam Stooke, Pieter Abbeel

Accelerating Deep Neuroevolution on Distributed FPGAs for Reinforcement Learning Problems

Reinforcement learning, augmented by the representational power of deep neural networks, has shown promising results on high-dimensional problems, such as game playing and robotic control. However, the sequential nature of these problems…

Neural and Evolutionary Computing · Computer Science 2021-05-10 Alexis Asseman, Nicolas Antoine, Ahmet S. Ozcan

CALE: Continuous Arcade Learning Environment

We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella),…

Machine Learning · Computer Science 2024-11-01 Jesse Farebrother, Pablo Samuel Castro

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to…

Artificial Intelligence · Computer Science 2017-08-18 Felix Leibfried, Nate Kushman, Katja Hofmann

Model-Based Reinforcement Learning for Atari

Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more,…

Machine Learning · Computer Science 2024-04-04 Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

Optimizing Data Collection in Deep Reinforcement Learning

Reinforcement learning (RL) workloads take a notoriously long time to train due to the large number of samples collected at run-time from simulators. Unfortunately, cluster scale-up approaches remain expensive, and commonly used CPU…

Machine Learning · Computer Science 2022-07-19 James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko

Efficient Parallel Methods for Deep Reinforcement Learning

We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm agnostic and can be applied to…

Machine Learning · Computer Science 2017-05-17 Alfredo V. Clemente, Humberto N. Castejón, Arjun Chandra

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We…

Machine Learning · Computer Science 2026-04-28 Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

The Arcade Learning Environment: An Evaluation Platform for General Agents

In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds…

Artificial Intelligence · Computer Science 2013-06-24 Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling

Atari games and Intel processors

The asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, given the fact that deep…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-17 Robert Adamski, Tomasz Grel, Maciej Klimek, Henryk Michalewski

Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing

Soft real-time applications are becoming increasingly complex, posing significant challenges for scheduling offloaded tasks in edge computing environments while meeting task timing constraints. Moreover, the exponential growth of the search…

Machine Learning · Computer Science 2025-06-11 Amin Avan, Akramul Azim, Qusay Mahmoud

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While…

Robotics · Computer Science 2018-10-25 Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX

Reinforcement learning (RL) research requires diverse, challenging environments that are both tractable and scalable. While modern video games may offer rich dynamics, they are computationally expensive and poorly suited for large-scale…

Machine Learning · Computer Science 2025-10-06 Waris Radji, Thomas Michel, Hector Piteau

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models achieve low success…

Artificial Intelligence · Computer Science 2026-02-04 Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum

Playing SNES in the Retro Learning Environment

Mastering a video game requires skill, tactics and strategy. While these attributes may be acquired naturally by human players, teaching them to a computer program is a far more challenging task. In recent years, extensive research was…

Machine Learning · Computer Science 2017-02-08 Nadav Bhonker, Shai Rozenberg, Itay Hubara

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Xinyuan Song, Zekun Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms

Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth. However, as we identify in this work, driving this…

Hardware Architecture · Computer Science 2022-05-05 Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Ni, Tushar Krishna

Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU.…

Machine Learning · Computer Science 2023-07-25 Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal

Deep In-GPU Experience Replay

Experience replay allows a reinforcement learning agent to train on samples from a large amount of the most recent experiences. A simple in-RAM experience replay stores these most recent experiences in a list in RAM, and then copies sampled…

Artificial Intelligence · Computer Science 2018-01-11 Ben Parr

CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control

The parallelism afforded by GPUs presents significant advantages in training controllers through reinforcement learning (RL). However, integrating model-based optimization into this process remains challenging due to the complexity of…

Robotics · Computer Science 2024-08-20 Se Hwan Jeon, Seungwoo Hong, Ho Jae Lee, Charles Khazoom, Sangbae Kim