Related papers: COMET: A Novel Memory-Efficient Deep Learning Trai…

200 papers

Deep neural networks (DNNs) are becoming increasingly deeper, wider, and non-linear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-11-24 · Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao

Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train. Designing such clusters to maximize both performance and utilization--to amortize their steep cost--is a challenging…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-03-15 · Divya Kiran Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexandros Daglis

Modern machine learning accelerators are designed to efficiently execute deep neural networks (DNNs) by optimizing data movement, memory hierarchy, and compute throughput. However, emerging DNN models such as large language models, state…

Hardware Architecture · Computer Science · 2025-09-03 · Shubham Negi, Manik Singhal, Aayush Ankit, Sudeep Bhoja, Kaushik Roy

Convolutional Neural Networks (CNNs) achieve remarkable accuracy in vision tasks, yet their computational complexity challenges low-power edge deployment. In this work, we present COMET, a framework of CNN models that employ efficient…

Signal Processing · Electrical Eng. & Systems · 2026-04-09 · Boyang Chen, Mohd Tasleem Khan, George Goussetis, Mathini Sellathurai, Yuan Ding, João F. C. Mota, Jongeun Lee

Deep neural networks (DNNs) have been quite successful in solving many complex learning problems. However, DNNs tend to have a large number of learning parameters, leading to a large memory and computation requirement. In this paper, we…

Machine Learning · Computer Science · 2019-05-21 · Sangkyun Lee, Jeonghyun Lee

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A…

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-13 · Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to…

Machine Learning · Statistics · 2024-03-04 · Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

DNNs have been quickly and broadly exploited to improve the data analysis quality in many complex science and engineering applications. Today's DNNs are becoming deeper and wider because of increasing demand on the analysis quality and more…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-24 · Sian Jin, Sheng Di, Xin Liang, Jiannan Tian, Dingwen Tao, Franck Cappello

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational…

Machine Learning · Computer Science · 2022-10-11 · Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos M. Freris, Hu Ding

With the growing size of deep neural networks and datasets, the computational costs of training have significantly increased. The layer-freezing technique has recently attracted great attention as a promising method to effectively reduce…

Machine Learning · Computer Science · 2025-08-22 · Chence Yang, Ci Zhang, Lei Lu, Qitao Tan, Sheng Li, Ao Li, Xulong Tang, Shaoyi Huang, Jinzhen Wang, Guoming Li, Jundong Li, Xiaoming Zhai, Jin Lu, Geng Yuan

As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-09 · Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

Deep Neural Networks (DNNs) are increasingly deployed in highly energy-constrained environments such as autonomous drones and wearable devices, where they must also operate in real time. Therefore, reducing the energy consumption has…

Machine Learning · Computer Science · 2019-06-04 · Haichuan Yang, Yuhao Zhu, Ji Liu

This paper reduces the cost of DNN training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-07 · Sicong Zhuang, Cristiano Malossi, Marc Casas

Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT). However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a…

Machine Learning · Statistics · 2019-07-30 · Kartikeya Bhardwaj, Chingyi Lin, Anderson Sartor, Radu Marculescu

Deep Neural Networks (DNNs) suffer from a rapid decrease in performance when trained on a sequence of tasks where only data of the most recent task is available. This phenomenon, known as catastrophic forgetting, prevents DNNs from…

Machine Learning · Computer Science · 2021-04-22 · Felix Wiewel, Bin Yang

Deep convolutional neural networks (CNNs) with a large number of parameters require intensive computational resources, and thus are hard to deploy on resource-constrained platforms. Decomposition-based methods, therefore, have been…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-27 · Shaowu Chen, Jiahao Zhou, Weize Sun, Lei Huang

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have recently been designed. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly…

Image and Video Processing · Electrical Eng. & Systems · 2021-06-25 · Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-26 · Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-23 · Yisu Wang, Xinjiao Li, Ruilong Wu, Huangxun Chen, Dirk Kutscher