Related papers: Supporting Very Large Models using Automatic Dataf…

A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-05-06 · Fareed Qararyah, Mohamed Wahib, Doğa Dikbayır, Mehmet Esat Belviranli, Didem Unat

TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-06-01 · Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

GLASU: A Communication-Efficient Algorithm for Federated Learning with Vertically Distributed Graph Data

Vertical federated learning (VFL) is a distributed learning paradigm, where computing clients collectively train a model based on the partial features of the same set of samples they possess. Current research on VFL focuses on the case when…

Machine Learning · Computer Science · 2023-03-17 · Xinwei Zhang, Mingyi Hong, Jie Chen

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science · 2019-01-10 · Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Buffered Streaming Graph Partitioning

Partitioning graphs into blocks of roughly equal size is widely used when processing large graphs. Currently there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been…

Data Structures and Algorithms · Computer Science · 2021-12-23 · Marcelo Fonseca Faraj, Christian Schulz

TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the…

Machine Learning · Computer Science · 2023-12-07 · Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi

Automated Deep Neural Network Inference Partitioning for Distributed Embedded Systems

Distributed systems can be found in various applications, e.g., in robotics or autonomous driving, to achieve higher flexibility and robustness. Thereby, data flow centric applications such as Deep Neural Network (DNN) inference benefit…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-14 · Fabian Kreß, El Mahdi El Annabi, Tim Hotfilter, Julian Hoefer, Tanja Harbaum, Juergen Becker

The TensorFlow Partitioning and Scheduling Problem: It's the Critical Path!

State-of-the-art data flow systems such as TensorFlow impose iterative calculations on large graphs that need to be partitioned on heterogeneous devices such as CPUs, GPUs, and TPUs. However, partitioning can not be viewed in isolation.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-07 · Ruben Mayer, Christian Mayer, Larissa Laich

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science · 2020-07-10 · Rezaul Chowdhury, Francesco Silvestri, Flavio Vella

TFLMS: Large Model Support in TensorFlow by Graph Rewriting

While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit with the memory limitation of accelerators for training. We propose an approach to tackle this problem by rewriting the…

Machine Learning · Computer Science · 2019-10-03 · Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya

Towards Efficient Visual Simplification of Computational Graphs in Deep Neural Networks

A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly…

Human-Computer Interaction · Computer Science · 2023-01-02 · Rusheng Pan, Zhiyong Wang, Yating Wei, Han Gao, Gongchang Ou, Caleb Chen Cao, Jingli Xu, Tong Xu, Wei Chen

Graph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning

Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-05-19 · Mu Li, Dave G. Andersen, Alexander J. Smola

Partitioning Graphs for the Cloud using Reinforcement Learning

In this paper, we propose Revolver, a parallel graph partitioning algorithm capable of partitioning large-scale graphs on a single shared-memory machine. Revolver employs an asynchronous processing framework, which leverages reinforcement…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-07-18 · Mohammad Hasanzadeh Mofrad, Rami Melhem, Mohammad Hammoud

Efficient Algorithms for Device Placement of DNN Graph Operators

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific…

Machine Learning · Computer Science · 2020-11-02 · Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

Tera-Scale Multilevel Graph Partitioning

We present TeraPart, a memory-efficient multilevel graph partitioning method that is designed to scale to extremely large graphs. In balanced graph partitioning, the goal is to divide the vertices into $k$ blocks with balanced size while…

Data Structures and Algorithms · Computer Science · 2024-10-28 · Daniel Salwasser, Daniel Seemaier, Lars Gottesbüren, Peter Sanders

Graph Partition Neural Networks for Semi-Supervised Classification

We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally…

Machine Learning · Computer Science · 2018-03-19 · Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander L. Gaunt, Raquel Urtasun, Richard Zemel

Communication-Free Distributed GNN Training with Vertex Cut

Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and…

Machine Learning · Computer Science · 2023-08-08 · Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec

Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization…

Image and Video Processing · Electrical Eng. & Systems · 2024-04-19 · Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch

Machine Learning-based Selection of Graph Partitioning Strategy Using the Characteristics of Graph Data and Algorithm

Analyzing large graph data is an essential part of many modern applications, such as social networks. Due to its large computational complexity, distributed processing is frequently employed. This requires graph data to be divided across…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-09-12 · YoungJoon Park, DongKyu Lee, Tien-Cuong Bui

Mesh-TensorFlow: Deep Learning for Supercomputers

Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting…

Machine Learning · Computer Science · 2018-11-07 · Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman