Related papers: Optimizing Data Collection in Deep Reinforcement L…

200 papers

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers,…

Machine Learning · Computer Science 2019-01-14 Adam Stooke, Pieter Abbeel

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While…

Robotics · Computer Science 2018-10-25 Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently…

Machine Learning · Computer Science 2020-10-16 John T. Halloran, David M. Rocke

Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-13 J. Filipovič, M. Madzin, J. Fousek, L. Matyska

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and…

Machine Learning · Computer Science 2021-03-15 Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Training a Deep Neural Network (DNN) is compute-intensive, often taking days to weeks. Therefore, parallel execution of DNN training on GPUs is now a widely adopted approach to speed up the process.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-29 Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng

As recurrent neural networks become larger and deeper, training times for single networks are rising into weeks or even months. As such there is a significant incentive to improve the performance and scalability of these networks. While…

Machine Learning · Computer Science 2016-04-08 Jeremy Appleyard, Tomas Kocisky, Phil Blunsom

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-18 Neil G. Dickson, Kamran Karimi, Firas Hamze

In atomistic spin dynamics simulations, the time cost of constructing the space- and time-displaced pair correlation function in real space increases quadratically with the number of spins $N$, leading to significant computational effort. The…

Computational Physics · Physics 2023-08-16 Hongwei Chen, Shiyang Chen, Joshua J. Turner, Adrian Feiguin

Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used…

Machine Learning · Computer Science 2022-11-08 Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta

In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouples the rendering process from the computation…

Machine Learning · Computer Science 2025-11-12 Haoxiang You, Yilang Liu, Ian Abraham

The exponential growth in data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational…

Performance · Computer Science 2026-04-03 Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Abdulaziz Tabbakh

With the advent of high-performance computing techniques, the data for analysis has grown significantly. Here, graphics processing unit (GPU) based program kernels are discussed to exploit parallelism in the analysis codes specific to…

Computational Physics · Physics 2018-11-07 Gourav Shrivastav, Manish Agarwal

One of the key challenges arising when compilers vectorize loops for today's SIMD-compatible architectures is to decide if vectorization or interleaving is beneficial. Then, the compiler has to determine how many instructions to pack…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-07 Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows compute accelerators such as GPUs and FPGAs to be employed efficiently for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner, Thomas Parnell, Martin Jaggi

Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-21 Yiqi Zhang, Fangzheng Jiao, Tian Tang, Boyu Tian, Hangyu Wang, Qiaoling Chen, Guoteng Wang, Zhen Jiang, Peng Sun, Ping Zhang, Xiaohe Hu, Ziming Liu, Menghao Zhang, Yanmin Jia, Yang You, Siyuan Feng

GPU architectural simulation is orders of magnitude slower than native execution, necessitating workload sampling for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive…

Performance · Computer Science 2026-03-03 Jiaqi Wang, Jingwei Sun, Jiyu Luo, Han Li, Guangzhong Sun

Machine learning (ML) compilers are an active area of research because they offer the potential to automatically speed up tensor programs. Kernel fusion is often cited as an important optimization performed by ML compilers. However, there…

Machine Learning · Computer Science 2023-01-31 Daniel Snider, Ruofan Liang

The Kernel Polynomial Method (KPM) is one of the fast diagonalization methods used for simulations of quantum systems in condensed matter physics and chemistry. The algorithm is difficult to parallelize on a…

Computational Physics · Physics 2011-05-30 Shixun Zhang, Shinichi Yamagiwa, Masahiko Okumura, Seiji Yunoki