Related papers: Diagonal Memory Optimisation for Machine Learning …

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile…

Hardware Architecture · Computer Science 2024-06-12 Size Zheng , Renze Chen , Meng Li , Zihao Ye , Luis Ceze , Yun Liang

GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations

Collocating deep learning training tasks improves GPU utilization but risks resource contention, severe slowdowns, and out-of-memory (OOM) failures. Accurate memory estimation is essential for robust collocation, and GPU utilization…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-29 Ehsan Yousefzadeh-Asl-Miandoab , Reza Karimzadeh , Danyal Yorulmaz , Bulat Ibragimov , Pınar Tözün

Optimizing for In-memory Deep Learning with Emerging Memory Technology

In-memory deep learning computes neural network models where they are stored, thus avoiding long distance communication between memory and computation units, resulting in considerable savings in energy and time. In-memory deep learning has…

Machine Learning · Computer Science 2021-12-02 Zhehui Wang , Tao Luo , Rick Siow Mong Goh , Wei Zhang , Weng-Fai Wong

Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression

Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things,…

Machine Learning · Computer Science 2023-07-14 Mark Deutel , Philipp Woller , Christopher Mutschler , Jürgen Teich

On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine…

Machine Learning · Computer Science 2022-10-06 Li Yang , Abdallah Shami

Neural networks on microcontrollers: saving memory at inference via operator reordering

Designing deep learning models for highly-constrained hardware would allow imbuing many edge devices with intelligence. Microcontrollers (MCUs) are an attractive platform for building smart devices due to their low cost, wide availability,…

Machine Learning · Computer Science 2020-03-04 Edgar Liberis , Nicholas D. Lane

Analysis of memory consumption by neural networks based on hyperparameters

Deep learning models are trained and deployed in multiple domains. Increasing usage of deep learning models alarms the usage of memory consumed while computation by deep learning models. Existing approaches for reducing memory consumption…

Machine Learning · Computer Science 2021-10-25 Mahendran N

Efficient Memory Management for Deep Neural Net Inference

While deep neural net inference was considered a task for servers only, latest advances in technology allow the task of inference to be moved to mobile and embedded devices, desired for various reasons ranging from latency to privacy. These…

Machine Learning · Computer Science 2020-02-18 Yury Pisarchyk , Juhyun Lee

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang , Carole-Jean Wu , Xiaodong Wang , Kim Hazelwood , David Brooks

Memory Augmented Optimizers for Deep Learning

Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter…

Machine Learning · Computer Science 2021-06-22 Paul-Aymeric McRae , Prasanna Parthasarathi , Mahmoud Assran , Sarath Chandar

Learning to Remember More with Less Memorization

Memory-augmented neural networks consisting of a neural controller and an external memory have shown potentials in long-term sequential learning. Current RAM-like memory models maintain memory accessing every timesteps, thus they do not…

Machine Learning · Computer Science 2019-03-21 Hung Le , Truyen Tran , Svetha Venkatesh

Profiling and optimization of multi-card GPU machine learning jobs

The effectiveness and efficiency of machine learning methodologies are crucial, especially with respect to the quality of results and computational cost. This paper discusses different model optimization techniques, providing a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-30 Marcin Lawenda , Kyrylo Khloponin , Krzesimir Samborski , Łukasz Szustak

Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design

The efficiency of Large Language Model~(LLM) inference is often constrained by substantial memory bandwidth and capacity demands. Existing techniques, such as pruning, quantization, and mixture of experts/depth, reduce memory capacity…

Hardware Architecture · Computer Science 2025-04-23 Rui Xie , Asad Ul Haq , Linsen Ma , Yunhua Fang , Zirak Burzin Engineer , Liu Liu , Tong Zhang

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…

Machine Learning · Computer Science 2024-05-06 Sicong Liu , Wentao Zhou , Zimu Zhou , Bin Guo , Minfan Wang , Cheng Fang , Zheng Lin , Zhiwen Yu

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization

Decision making algorithms are used in a multitude of different applications. Conventional approaches for designing decision algorithms employ principled and simplified modelling, based on which one can determine decisions via tractable…

Signal Processing · Electrical Eng. & Systems 2022-06-23 Nir Shlezinger , Yonina C. Eldar , Stephen P. Boyd

Efficient Neural Network Deployment for Microcontroller

Edge computing for neural networks is getting important especially for low power applications and offline devices. TensorFlow Lite and PyTorch Mobile were released for this purpose. But they mainly support mobile devices instead of…

Hardware Architecture · Computer Science 2020-07-06 Hasan Unlu

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers makes significant contributions for ecological AI. Conventional methods successfully enable convolutional neural…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Yinan Liang , Ziwei Wang , Xiuwei Xu , Yansong Tang , Jie Zhou , Jiwen Lu

Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-12 Michael Benington , Leo Phan , Chris Pierre Paul , Evan Shoemaker , Priyanka Ranade , Torstein Collett , Grant Hodgson Perez , Christopher Krieger

Profile-guided memory optimization for deep neural networks

Recent years have seen deep neural networks (DNNs) becoming wider and deeper to achieve better performance in many applications of AI. Such DNNs however require huge amounts of memory to store weights and intermediate results (e.g.,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-27 Taro Sekiyama , Takashi Imamichi , Haruki Imai , Rudy Raymond