Related papers: Supporting Very Large Models using Automatic Dataf…

200 papers

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-06 Fareed Qararyah, Mohamed Wahib, Doğa Dikbayır, Mehmet Esat Belviranli, Didem Unat

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Vertical federated learning (VFL) is a distributed learning paradigm, where computing clients collectively train a model based on the partial features of the same set of samples they possess. Current research on VFL focuses on the case when…

Machine Learning · Computer Science 2023-03-17 Xinwei Zhang, Mingyi Hong, Jie Chen

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Partitioning graphs into blocks of roughly equal size is widely used when processing large graphs. Currently there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been…

Data Structures and Algorithms · Computer Science 2021-12-23 Marcelo Fonseca Faraj, Christian Schulz

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the…

Distributed systems can be found in various applications, e.g., in robotics or autonomous driving, to achieve higher flexibility and robustness. Thereby, data flow centric applications such as Deep Neural Network (DNN) inference benefit…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-14 Fabian Kreß, El Mahdi El Annabi, Tim Hotfilter, Julian Hoefer, Tanja Harbaum, Juergen Becker

State-of-the-art data flow systems such as TensorFlow impose iterative calculations on large graphs that need to be partitioned on heterogeneous devices such as CPUs, GPUs, and TPUs. However, partitioning can not be viewed in isolation.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-07 Ruben Mayer, Christian Mayer, Larissa Laich

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury, Francesco Silvestri, Flavio Vella

While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit with the memory limitation of accelerators for training. We propose an approach to tackle this problem by rewriting the…

Machine Learning · Computer Science 2019-10-03 Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya

A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly…

Human-Computer Interaction · Computer Science 2023-01-02 Rusheng Pan, Zhiyong Wang, Yating Wei, Han Gao, Gongchang Ou, Caleb Chen Cao, Jingli Xu, Tong Xu, Wei Chen

Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-19 Mu Li, Dave G. Andersen, Alexander J. Smola

In this paper, we propose Revolver, a parallel graph partitioning algorithm capable of partitioning large-scale graphs on a single shared-memory machine. Revolver employs an asynchronous processing framework, which leverages reinforcement…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-18 Mohammad Hasanzadeh Mofrad, Rami Melhem, Mohammad Hammoud

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific…

Machine Learning · Computer Science 2020-11-02 Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

We present TeraPart, a memory-efficient multilevel graph partitioning method that is designed to scale to extremely large graphs. In balanced graph partitioning, the goal is to divide the vertices into $k$ blocks with balanced size while…

Data Structures and Algorithms · Computer Science 2024-10-28 Daniel Salwasser, Daniel Seemaier, Lars Gottesbüren, Peter Sanders

We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally…

Machine Learning · Computer Science 2018-03-19 Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander L. Gaunt, Raquel Urtasun, Richard Zemel

Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and…

Machine Learning · Computer Science 2023-08-08 Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization…

Image and Video Processing · Electrical Eng. & Systems 2024-04-19 Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch

Analyzing large graph data is an essential part of many modern applications, such as social networks. Due to its large computational complexity, distributed processing is frequently employed. This requires graph data to be divided across…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 YoungJoon Park, DongKyu Lee, Tien-Cuong Bui

Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting…
