Related papers: Memory Optimization for Deep Networks

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU

Larger deep learning models usually lead to higher model quality with an ever-increasing GPU memory footprint. Although tensor checkpointing techniques have been proposed to enable training under a restricted GPU memory budget, the input…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-07 Jianjin Liao, Mingzhen Li, Qingxiao Sun, Jiwei Hao, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Hailong Yang, Zhongzhi Luan, Depei Qian

MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers

While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints,…

Machine Learning · Computer Science 2026-03-17 Jérémy Morlier, Robin Geens, Stef Cuyckens, Arne Symons, Marian Verhelst, Vincent Gripon, Mathieu Léonardon

Memory-Efficient Implementation of DenseNets

The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization…

Computer Vision and Pattern Recognition · Computer Science 2017-07-24 Geoff Pleiss, Danlu Chen, Gao Huang, Tongcheng Li, Laurens van der Maaten, Kilian Q. Weinberger

GPU Memory Usage Optimization for Backward Propagation in Deep Network Training

In modern Deep Learning, it has been a trend to design larger Deep Neural Networks (DNNs) for the execution of more complex tasks and better accuracy. On the other hand, Convolutional Neural Networks (CNNs) have become the standard method…

Machine Learning · Computer Science 2025-02-19 Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho, Daniel Brand

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression

Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory because the intermediate activation data must be saved in memory during forward propagation and then restored for backward…

Artificial Intelligence · Computer Science 2021-11-19 Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao

MotherNets: Rapid Deep Ensemble Learning

Ensembles of deep neural networks significantly improve generalization accuracy. However, training neural network ensembles requires a large amount of computational resources and time. State-of-the-art approaches either train all networks…

Machine Learning · Computer Science 2020-03-10 Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos

MemNet: Memory-Efficiency Guided Neural Architecture Search with Augment-Trim learning

Recent studies on automatic neural architecture search have demonstrated significant performance, competitive with or even better than hand-crafted neural architectures. However, most existing network architectures tend to use…

Machine Learning · Computer Science 2020-06-12 Peiye Liu, Bo Wu, Huadong Ma, Mingoo Seok

Training Deep Nets with Sublinear Memory Cost

We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train an n-layer network, with only the computational cost of an extra…

Machine Learning · Computer Science 2016-04-25 Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures

Computationally intensive deep neural networks (DNNs) are well-suited to run on GPUs, but newly developed algorithms usually require the heavily optimized DNN routines to work efficiently, and this problem could be even more difficult for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-12 Yu-Sheng Lin, Wei-Chao Chen, Shao-Yi Chien

xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads

The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundamental to enabling advanced scheduling and…

Performance · Computer Science 2025-10-27 Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib

PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time

Training deep learning models, particularly Transformer-based architectures such as Large Language Models (LLMs), demands substantial computational resources and extended training periods. While optimal configuration and infrastructure…

Machine Learning · Computer Science 2024-12-30 Alireza Pourali, Arian Boukani, Hamzeh Khazaei

ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout

As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce…

Machine Learning · Computer Science 2023-10-31 Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu

New Pruning Method Based on DenseNet Network for Image Classification

Deep neural networks have made significant progress in the field of computer vision. Recent studies have shown that depth, width and shortcut connections of neural network architectures play a crucial role in their performance. One of the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Rui-Yang Ju, Ting-Yu Lin, Jen-Shiun Chiang

Bespoke: A Block-Level Neural Network Optimization Framework for Low-Cost Deployment

As deep learning models become popular, there is a growing need to deploy them to diverse device environments. Because it is costly to develop and optimize a neural network for every single environment, there is a line of research to…

Machine Learning · Computer Science 2023-11-20 Jong-Ryul Lee, Yong-Hyuk Moon

EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology

Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the…

Hardware Architecture · Computer Science 2024-10-24 Qizhe Wu, Yuchen Gui, Zhichen Zeng, Xiaotian Wang, Huawen Liang, Xi Jin

CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate the bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However,…

Hardware Architecture · Computer Science 2024-03-19 Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments

Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Despite the option of keeping those dependencies in memory until they…

Machine Learning · Computer Science 2022-12-22 Manuela Schuler, Richard Membarth, Philipp Slusallek