Related papers: Device Placement Optimization with Reinforcement L…

200 papers

Computation graphs are Directed Acyclic Graphs (DAGs) where the nodes correspond to mathematical operations and are used widely as abstractions in optimizations of neural networks. The device placement problem aims to identify optimal…

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific…

Machine Learning · Computer Science 2020-11-02 Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However,…

Machine Learning · Computer Science 2024-10-28 Tianze Wang, Amir H. Payberah, Desta Haileselassie Hagos, Vladimir Vlassov

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device…

We present a deep reinforcement learning approach to minimizing the execution cost of neural network computation graphs in an optimizing compiler. Unlike earlier learning-based works that require training the optimizer on the same graph to…

Machine Learning · Computer Science 2020-02-11 Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals

Training neural networks often relies on a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-20 Jiawen Liu, Dong Li, Gokcen Kestor, Jeffrey Vetter

We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto…

Machine Learning · Computer Science 2019-06-24 Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to…

Machine Learning · Computer Science 2025-05-30 Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine

State-of-the-art data flow systems such as TensorFlow impose iterative calculations on large graphs that need to be partitioned on heterogeneous devices such as CPUs, GPUs, and TPUs. However, partitioning cannot be viewed in isolation.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-07 Ruben Mayer, Christian Mayer, Larissa Laich

The growing scale of deep learning demands distributed training frameworks that jointly reason about parallelism, memory, and network topology. Prior works often rely on heuristic or topology-agnostic search, handling communication and…

Machine Learning · Computer Science 2026-05-26 Irene Wang, Vishnu Varma Venkata, Arvind Krishnamurthy, Divya Mahajan

We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has…

Machine Learning · Computer Science 2022-10-06 Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-09 Yuan Yu, Martín Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, Tim Harley, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng

Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or models are large. To split the model across devices, learning-based approaches are still popular. While these result…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-23 Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta

The escalating size of Deep Neural Networks (DNNs) has spurred a growing research interest in hosting and serving DNN models across multiple devices. A number of studies have been reported to partition a DNN model across devices, providing…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-27 Beibei Zhang, Hongwei Zhu, Feng Gao, Zhihui Yang, Sean Xiaoyang Wang

Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by…

Artificial Intelligence · Computer Science 2020-03-20 Anna Goldie, Azalia Mirhoseini

Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal…

As edge computing becomes an increasingly adopted concept in system architectures, its utilization is expected to rise further when combined with deep learning (DL) techniques. The idea behind integrating demanding…

Networking and Internet Architecture · Computer Science 2020-03-12 Mounir Bensalem, Jasenka Dizdarević, Admela Jukan

Chip placement has been one of the most time-consuming tasks in semiconductor design. As a result, many projects are pushed back and chip availability in real markets is delayed. An engineer placing macros on a chip also needs to…

Machine Learning · Computer Science 2022-05-20 Mrinal Mathur

The Convolutional Neural Network (CNN) model, often used for image classification, requires significant training time to obtain high accuracy. To this end, distributed training is performed with the parameter server (PS) architecture using…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-18 Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, Sam H. Noh