Related papers: Memory Optimization for Deep Networks

200 papers

Larger deep learning models usually lead to higher model quality with an ever-increasing GPU memory footprint. Although tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, the input…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-07 Jianjin Liao , Mingzhen Li , Qingxiao Sun , Jiwei Hao , Fengwei Yu , Shengdong Chen , Ye Tao , Zicheng Zhang , Hailong Yang , Zhongzhi Luan , Depei Qian

While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints,…

Machine Learning · Computer Science 2026-03-17 Jérémy Morlier , Robin Geens , Stef Cuyckens , Arne Symons , Marian Verhelst , Vincent Gripon , Mathieu Léonardon

The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization…

Computer Vision and Pattern Recognition · Computer Science 2017-07-24 Geoff Pleiss , Danlu Chen , Gao Huang , Tongcheng Li , Laurens van der Maaten , Kilian Q. Weinberger

In modern Deep Learning, it has been a trend to design larger Deep Neural Networks (DNNs) for the execution of more complex tasks and better accuracy. On the other hand, Convolutional Neural Networks (CNNs) have become the standard method…

Machine Learning · Computer Science 2025-02-19 Ding-Yong Hong , Tzu-Hsien Tsai , Ning Wang , Pangfeng Liu , Jan-Jan Wu

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho , Daniel Brand

Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory because the intermediate activation data must be saved in the memory during forward propagation and then restored for backward…

Artificial Intelligence · Computer Science 2021-11-19 Sian Jin , Chengming Zhang , Xintong Jiang , Yunhe Feng , Hui Guan , Guanpeng Li , Shuaiwen Leon Song , Dingwen Tao

Ensembles of deep neural networks significantly improve generalization accuracy. However, training neural network ensembles requires a large amount of computational resources and time. State-of-the-art approaches either train all networks…

Machine Learning · Computer Science 2020-03-10 Abdul Wasay , Brian Hentschel , Yuze Liao , Sanyuan Chen , Stratos Idreos

Recent studies on automatic neural architecture search have demonstrated significant performance, competitive with or even better than hand-crafted neural architectures. However, most of the existing network architectures tend to use…

Machine Learning · Computer Science 2020-06-12 Peiye Liu , Bo Wu , Huadong Ma , Mingoo Seok

We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train an n-layer network, with only the computational cost of an extra…

Machine Learning · Computer Science 2016-04-25 Tianqi Chen , Bing Xu , Chiyuan Zhang , Carlos Guestrin

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

Computationally intensive deep neural networks (DNNs) are well-suited to run on GPUs, but newly developed algorithms usually require the heavily optimized DNN routines to work efficiently, and this problem could be even more difficult for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-12 Yu-Sheng Lin , Wei-Chao Chen , Shao-Yi Chien

The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundamental to enabling advanced scheduling and…

Performance · Computer Science 2025-10-27 Jiabo Shi , Dimitrios Pezaros , Yehia Elkhatib

Training deep learning models, particularly Transformer-based architectures such as Large Language Models (LLMs), demands substantial computational resources and extended training periods. While optimal configuration and infrastructure…

Machine Learning · Computer Science 2024-12-30 Alireza Pourali , Arian Boukani , Hamzeh Khazaei

As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce…

Machine Learning · Computer Science 2023-10-31 Huiyao Shu , Ang Wang , Ziji Shi , Hanyu Zhao , Yong Li , Lu Lu

Deep neural networks have made significant progress in the field of computer vision. Recent studies have shown that depth, width and shortcut connections of neural network architectures play a crucial role in their performance. One of the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Rui-Yang Ju , Ting-Yu Lin , Jen-Shiun Chiang

As deep learning models become popular, there is a strong need to deploy them to diverse device environments. Because it is costly to develop and optimize a neural network for every single environment, there is a line of research to…

Machine Learning · Computer Science 2023-11-20 Jong-Ryul Lee , Yong-Hyuk Moon

Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the…

Hardware Architecture · Computer Science 2024-10-24 Qizhe Wu , Yuchen Gui , Zhichen Zeng , Xiaotian Wang , Huawen Liang , Xi Jin

With the recent growth in demand for large-scale deep neural networks, compute in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However,…

Hardware Architecture · Computer Science 2024-03-19 Souvik Kundu , Anthony Sarah , Vinay Joshi , Om J Omer , Sreenivas Subramoney

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Ji Lin , Wei-Ming Chen , Han Cai , Chuang Gan , Song Han

Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Despite the option of keeping those dependencies in memory until they…

Machine Learning · Computer Science 2022-12-22 Manuela Schuler , Richard Membarth , Philipp Slusallek