Related papers: COMET: A Novel Memory-Efficient Deep Learning Trai…

A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression

Deep neural networks (DNNs) are becoming increasingly deeper, wider, and non-linear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-24 Sian Jin , Guanpeng Li , Shuaiwen Leon Song , Dingwen Tao

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train. Designing such clusters to maximize both performance and utilization--to amortize their steep cost--is a challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-15 Divya Kiran Kadiyala , Saeed Rashidi , Taekyung Heo , Abhimanyu Rajeshkumar Bambhaniya , Tushar Krishna , Alexandros Daglis

COMET: A Framework for Modeling Compound Operation Dataflows with Explicit Collectives

Modern machine learning accelerators are designed to efficiently execute deep neural networks (DNNs) by optimizing data movement, memory hierarchy, and compute throughput. However, emerging DNN models such as large language models, state…

Hardware Architecture · Computer Science 2025-09-03 Shubham Negi , Manik Singhal , Aayush Ankit , Sudeep Bhoja , Kaushik Roy

COMET: Co-Optimization of a CNN Model using Efficient-Hardware OBC Techniques

Convolutional Neural Networks (CNNs) achieve remarkable accuracy in vision tasks, yet their computational complexity challenges low-power edge deployment. In this work, we present COMET, a framework of CNN models that employ efficient…

Signal Processing · Electrical Eng. & Systems 2026-04-09 Boyang Chen , Mohd Tasleem Khan , George Goussetis , Mathini Sellathurai , Yuan Ding , João F. C. Mota , Jongeun Lee

Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems

Deep neural networks (DNNs) have been quite successful in solving many complex learning problems. However, DNNs tend to have a large number of learning parameters, leading to a large memory and computation requirement. In this paper, we…

Machine Learning · Computer Science 2019-05-21 Sangkyun Lee , Jeonghyun Lee

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A…

Machine Learning · Computer Science 2024-10-02 Hao Feng , Boyuan Zhang , Fanjiang Ye , Min Si , Ching-Hsiang Chu , Jiannan Tian , Chunxing Yin , Summer Deng , Yuchen Hao , Pavan Balaji , Tong Geng , Dingwen Tao

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Haoyu Li , Yuchen Xu , Jiayi Chen , Rohit Dwivedula , Wenfei Wu , Keqiang He , Aditya Akella , Daehyeok Kim

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to…

Machine Learning · Statistics 2024-03-04 Lingyu Gu , Yongqi Du , Yuan Zhang , Di Xie , Shiliang Pu , Robert C. Qiu , Zhenyu Liao

DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression

DNNs have been quickly and broadly exploited to improve the data analysis quality in many complex science and engineering applications. Today's DNNs are becoming deeper and wider because of increasing demand on the analysis quality and more…

Computer Vision and Pattern Recognition · Computer Science 2019-04-24 Sian Jin , Sheng Di , Xin Liang , Jiannan Tian , Dingwen Tao , Franck Cappello

A Novel Sequential Coreset Method for Gradient Descent Algorithms

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational…

Machine Learning · Computer Science 2022-10-11 Jiawei Huang , Ruomin Huang , Wenjie Liu , Nikolaos M. Freris , Hu Ding

Rethinking the Potential of Layer Freezing for Efficient DNN Training

With the growing size of deep neural networks and datasets, the computational costs of training have significantly increased. The layer-freezing technique has recently attracted great attention as a promising method to effectively reduce…

Machine Learning · Computer Science 2025-08-22 Chence Yang , Ci Zhang , Lei Lu , Qitao Tan , Sheng Li , Ao Li , Xulong Tang , Shaoyi Huang , Jinzhen Wang , Guoming Li , Jundong Li , Xiaoming Zhai , Jin Lu , Geng Yuan

Dataset Condensation with Gradient Matching

As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Bo Zhao , Konda Reddy Mopuri , Hakan Bilen

Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking

Deep Neural Networks (DNNs) are increasingly deployed in highly energy-constrained environments such as autonomous drones and wearable devices while at the same time must operate in real-time. Therefore, reducing the energy consumption has…

Machine Learning · Computer Science 2019-06-04 Haichuan Yang , Yuhao Zhu , Ji Liu

Reducing Data Motion to Accelerate the Training of Deep Neural Networks

This paper reduces the cost of DNNs training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-07 Sicong Zhuang , Cristiano Malossi , Marc Casas

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT). However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a…

Machine Learning · Statistics 2019-07-30 Kartikeya Bhardwaj , Chingyi Lin , Anderson Sartor , Radu Marculescu

Condensed Composite Memory Continual Learning

Deep Neural Networks (DNNs) suffer from a rapid decrease in performance when trained on a sequence of tasks where only data of the most recent task is available. This phenomenon, known as catastrophic forgetting, prevents DNNs from…

Machine Learning · Computer Science 2021-04-22 Felix Wiewel , Bin Yang

Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression

Deep convolutional neural networks (CNNs) with a large number of parameters require intensive computational resources, and thus are hard to be deployed in resource-constrained platforms. Decomposition-based methods, therefore, have been…

Computer Vision and Pattern Recognition · Computer Science 2022-10-27 Shaowu Chen , Jiahao Zhou , Weize Sun , Lei Huang

Transform-Based Feature Map Compression for CNN Inference

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) are designed recently. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly…

Image and Video Processing · Electrical Eng. & Systems 2021-06-25 Yubo Shi , Meiqi Wang , Siyi Chen , Jinghe Wei , Zhongfeng Wang

CAT: Compression-Aware Training for bandwidth reduction

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Chaim Baskin , Brian Chmiel , Evgenii Zheltonozhskii , Ron Banner , Alex M. Bronstein , Avi Mendelson

NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning

Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-23 Yisu Wang , Xinjiao Li , Ruilong Wu , Huangxun Chen , Dirk Kutscher