Related papers: RL-Scope: Cross-Stack Profiling for Deep Reinforce…

200 papers

Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet,…

Machine Learning · Computer Science 2022-10-31 Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, Lei Chen

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is…

In the rapidly evolving field of serverless computing, efficient function scheduling and resource scaling are critical for optimizing performance and cost. This paper presents a comprehensive review of the application of Deep Reinforcement…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-23 Amjad Yousef Majid, Eduard Marin

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world…

Machine Learning · Computer Science 2020-12-09 Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as…

Machine Learning · Computer Science 2022-01-06 Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-03 Yan Gu, Zhaoze Liu, Shuhong Dai, Cong Liu, Ying Wang, Shen Wang, Georgios Theodoropoulos, Long Cheng

Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been…

Software Engineering · Computer Science 2023-06-30 Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh

Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a…

Artificial Intelligence · Computer Science 2025-10-27 Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala

The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Various breakthroughs ranging from games to robotics have spurred the interest in designing sophisticated RL algorithms and systems.…

Machine Learning · Computer Science 2022-11-09 Zhihui Xie, Zichuan Lin, Junyou Li, Shuai Li, Deheng Ye

Reinforcement learning (RL) has become essential for unlocking advanced reasoning capabilities in large language models (LLMs). RL workflows involve interleaving rollout and training stages with fundamentally different resource…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-09 Yongji Wu, Xueshen Liu, Haizhong Zheng, Juncheng Gu, Beidi Chen, Z. Morley Mao, Arvind Krishnamurthy, Ion Stoica

This study addresses the challenge of resource scheduling optimization in edge-cloud collaborative computing using deep reinforcement learning (DRL). The proposed DRL-based approach improves task processing efficiency, reduces overall…

Machine Learning · Computer Science 2025-04-30 Yuqing Wang, Xiao Yang

Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced…

Pretraining is a common technique in deep learning for increasing performance and reducing training time, with promising experimental results in deep reinforcement learning (RL). However, pretraining requires a relevant dataset for…

Machine Learning · Computer Science 2021-10-07 Saurav Kadavath, Samuel Paradis, Brian Yao

There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for…

Machine Learning · Computer Science 2020-06-04 Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu

Predictive autoscaling (autoscaling with workload forecasting) is an important mechanism that supports autonomous adjustment of computing resources in accordance with fluctuating workload demands in the Cloud. In recent works, Reinforcement…

Deep reinforcement learning (RL) is a powerful framework to train decision-making models in complex environments. However, RL can be slow as it requires repeated interaction with a simulation of the environment. In particular, there are key…

Machine Learning · Computer Science 2021-10-12 Tian Lan, Sunil Srinivasa, Huan Wang, Stephan Zheng

Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate…

Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limited visibility into application…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-12 Shruti Dongare, Redwan Ibne Seraj Khan, Hadeel Albahar, Nannan Zhao, Diego Melendez Maita, Ali R. Butt

The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively…

Machine Learning · Computer Science 2023-12-12 Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang

Hyperparameter sensitivity in Deep Reinforcement Learning (RL) is often accepted as unavoidable. However, it remains unclear whether it is intrinsic to the RL problem or exacerbated by specific training mechanisms. We investigate this…

Machine Learning · Computer Science 2026-02-06 Jan Malte Töpperwien, Aditya Mohan, Marius Lindauer