Related papers: Accelerating Distributed Deep Learning using Lossl…

Homomorphic Parameter Compression for Distributed Deep Learning Training

Distributed training of deep neural networks has received significant research interest, and its major approaches include implementations on multiple GPUs and clusters. Parallelization can dramatically improve the efficiency of training…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-29 Jaehee Jang , Byungook Na , Sungroh Yoon

THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

Deep neural networks (DNNs) are the de facto standard for essential use cases, such as image classification, computer vision, and natural language processing. As DNNs and datasets get larger, they require distributed training on…

Machine Learning · Computer Science 2024-03-07 Minghao Li , Ran Ben Basat , Shay Vargaftik , ChonLam Lao , Kevin Xu , Michael Mitzenmacher , Minlan Yu

A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression

Deep neural networks (DNNs) are becoming increasingly deeper, wider, and non-linear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-24 Sian Jin , Guanpeng Li , Shuaiwen Leon Song , Dingwen Tao

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to…

Machine Learning · Statistics 2024-03-04 Lingyu Gu , Yongqi Du , Yuan Zhang , Di Xie , Shiliang Pu , Robert C. Qiu , Zhenyu Liao

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still remains considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham , Hyo-Sung Ahn

Reducing Data Motion to Accelerate the Training of Deep Neural Networks

This paper reduces the cost of DNNs training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-07 Sicong Zhuang , Cristiano Malossi , Marc Casas

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression

Training wide and deep neural networks (DNNs) require large amounts of storage resources such as memory because the intermediate activation data must be saved in the memory during forward propagation and then restored for backward…

Artificial Intelligence · Computer Science 2021-11-19 Sian Jin , Chengming Zhang , Xintong Jiang , Yunhe Feng , Hui Guan , Guanpeng Li , Shuaiwen Leon Song , Dingwen Tao

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges,…

Machine Learning · Computer Science 2024-05-13 Xue Geng , Zhe Wang , Chunyun Chen , Qing Xu , Kaixin Xu , Chao Jin , Manas Gupta , Xulei Yang , Zhenghua Chen , Mohamed M. Sabry Aly , Jie Lin , Min Wu , Xiaoli Li

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…

Optimization and Control · Mathematics 2020-07-14 Vyacheslav Kungurtsev , Malcolm Egan , Bapi Chatterjee , Dan Alistarh

DP-Net: Dynamic Programming Guided Deep Neural Network Compression

In this work, we propose an effective scheme (called DP-Net) for compressing the deep neural networks (DNNs). It includes a novel dynamic programming (DP) based algorithm to obtain the optimal solution of weight quantization and an…

Machine Learning · Computer Science 2020-03-24 Dingcheng Yang , Wenjian Yu , Ao Zhou , Haoyuan Mu , Gary Yao , Xiaoyi Wang

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training

In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor…

Hardware Architecture · Computer Science 2024-04-24 Muhammad Adnan , Amar Phanishayee , Janardhan Kulkarni , Prashant J. Nair , Divya Mahajan

Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting

We propose a distributed approach to train deep neural networks (DNNs), which has guaranteed convergence theoretically and great scalability empirically: close to 6 times faster on instance of ImageNet data set when run with 6 machines. The…

Machine Learning · Statistics 2016-10-04 Abhimanu Kumar , Pengtao Xie , Junming Yin , Eric P. Xing

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A…

Machine Learning · Computer Science 2024-10-02 Hao Feng , Boyuan Zhang , Fanjiang Ye , Min Si , Ching-Hsiang Chu , Jiannan Tian , Chunxing Yin , Summer Deng , Yuchen Hao , Pavan Balaji , Tong Geng , Dingwen Tao

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science 2024-03-13 Soo Min Kwon , Zekai Zhang , Dogyoon Song , Laura Balzano , Qing Qu

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

Deep neural networks (DNNs) have achieved significant success in a variety of real world applications, i.e., image classification. However, tons of parameters in the networks restrict the efficiency of neural networks due to the large model…

Machine Learning · Computer Science 2019-08-21 Yuzhe Ma , Ran Chen , Wei Li , Fanhua Shang , Wenjian Yu , Minsik Cho , Bei Yu

Distributed Optimization using Heterogeneous Compute Systems

Hardware compute power has been growing at an unprecedented rate in recent years. The utilization of such advancements plays a key role in producing better results in less time -- both in academia and industry. However, merging the existing…

Machine Learning · Computer Science 2021-10-19 Vineeth S

Channel-wise Feature Decorrelation for Enhanced Learned Image Compression

The emerging Learned Compression (LC) replaces the traditional codec modules with Deep Neural Networks (DNN), which are trained end-to-end for rate-distortion performance. This approach is considered as the future of image/video…

Image and Video Processing · Electrical Eng. & Systems 2024-07-08 Farhad Pakdaman , Moncef Gabbouj

Accelerating Data Loading in Deep Neural Network Training

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability…

Machine Learning · Computer Science 2020-02-20 Chih-Chieh Yang , Guojing Cong

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku , Hiroto Imachi , Takuya Akiba

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed,…

Machine Learning · Computer Science 2023-06-12 Mohammadreza Alimohammadi , Ilia Markov , Elias Frantar , Dan Alistarh