Related papers: Device Placement Optimization with Reinforcement L…

A Structure-Aware Framework for Learning Device Placements on Computation Graphs

Computation graphs are Directed Acyclic Graphs (DAGs) where the nodes correspond to mathematical operations and are used widely as abstractions in optimizations of neural networks. The device placement problem aims to identify optimal…

Machine Learning · Computer Science 2025-01-14 Shukai Duan, Heng Ping, Nikos Kanakaris, Xiongye Xiao, Panagiotis Kyriakis, Nesreen K. Ahmed, Peiyu Zhang, Guixiang Ma, Mihai Capota, Shahin Nazarian, Theodore L. Willke, Paul Bogdan

Efficient Algorithms for Device Placement of DNN Graph Operators

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific…

Machine Learning · Computer Science 2020-11-02 Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However,…

Machine Learning · Computer Science 2024-10-28 Tianze Wang, Amir H. Payberah, Desta Haileselassie Hagos, Vladimir Vlassov

GDP: Generalized Device Placement for Dataflow Graphs

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device…

Machine Learning · Computer Science 2019-10-04 Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon

Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs

We present a deep reinforcement learning approach to minimizing the execution cost of neural network computation graphs in an optimizing compiler. Unlike earlier learning-based works that require training the optimizer on the same graph to…

Machine Learning · Computer Science 2020-02-11 Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training

Training neural networks often uses a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-20 Jiawen Liu, Dong Li, Gokcen Kestor, Jeffrey Vetter

Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto…

Machine Learning · Computer Science 2019-06-24 Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to…

Machine Learning · Computer Science 2025-05-30 Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine

The TensorFlow Partitioning and Scheduling Problem: It's the Critical Path!

State-of-the-art data flow systems such as TensorFlow impose iterative calculations on large graphs that need to be partitioned on heterogeneous devices such as CPUs, GPUs, and TPUs. However, partitioning cannot be viewed in isolation.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-07 Ruben Mayer, Christian Mayer, Larissa Laich

NEST: Network- and Memory-Aware Device Placement For Distributed Deep Learning

The growing scale of deep learning demands distributed training frameworks that jointly reason about parallelism, memory, and network topology. Prior works often rely on heuristic or topology-agnostic search, handling communication and…

Machine Learning · Computer Science 2026-05-26 Irene Wang, Vishnu Varma Venkata, Arvind Krishnamurthy, Divya Mahajan

DreamShard: Generalizable Embedding Table Placement for Recommender Systems

We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has…

Machine Learning · Computer Science 2022-10-06 Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu

Dynamic Control Flow in Large-Scale Machine Learning

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-09 Yuan Yu, Martín Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, Tim Harley, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng

Baechi: Fast Device Placement of Machine Learning Graphs

Machine Learning graphs (or models) can be challenging or impossible to train when either devices have limited memory or models are large. To split the model across devices, learning-based approaches are still popular. While these result…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-23 Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta

Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices

The escalating size of Deep Neural Networks (DNNs) has spurred a growing research interest in hosting and serving DNN models across multiple devices. A number of studies have been reported to partition a DNN model across devices, providing…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-27 Beibei Zhang, Hongwei Zhu, Feng Gao, Zhihui Yang, Sean Xiaoyang Wang

Placement Optimization with Deep Reinforcement Learning

Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by…

Artificial Intelligence · Computer Science 2020-03-20 Anna Goldie, Azalia Mirhoseini

Neural Topological Ordering for Computation Graphs

Recent works on machine learning for combinatorial optimization have shown that learning based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal…

Machine Learning · Computer Science 2022-10-11 Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke Van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

Modeling of Deep Neural Network (DNN) Placement and Inference in Edge Computing

With edge computing becoming an increasingly adopted concept in system architectures, its utilization is expected to rise further when it is combined with deep learning (DL) techniques. The idea behind integrating demanding…

Networking and Internet Architecture · Computer Science 2020-03-12 Mounir Bensalem, Jasenka Dizdarević, Admela Jukan

Routing and Placement of Macros using Deep Reinforcement Learning

Chip placement has been one of the most time-consuming tasks in the semiconductor area. Due to this negligence, many projects are pushed back and chip availability in real markets is delayed. An engineer placing macros on a chip also needs to…

Machine Learning · Computer Science 2022-05-20 Mrinal Mathur

Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement

The Convolutional Neural Network (CNN) model, often used for image classification, requires significant training time to obtain high accuracy. To this end, distributed training is performed with the parameter server (PS) architecture using…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-18 Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, Sam H. Noh