Related papers: MCUNetV2: Memory-Efficient Patch-based Inference f…

Split CNN Inference on Networked Microcontrollers

Running deep neural networks on microcontroller units (MCUs) is severely constrained by limited memory resources. While TinyML techniques reduce model size and computation, they often fail in practice due to excessive peak Random Access…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-12 Junyu Lu, Shashwath Suresh, Hao Liu, Qi Hong, Qing Wang

MCUNet: Tiny Deep Learning on IoT Devices

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly…

Computer Vision and Pattern Recognition · Computer Science 2020-11-20 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Jaewook Lee, Yoel Park, Seulki Lee

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile…

Hardware Architecture · Computer Science 2024-06-12 Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML

AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g., 128kB of RAM. However, inference latency must…

Machine Learning · Computer Science 2025-10-20 Zhaolan Huang, Emmanuel Baccelli

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint may vary considerably,…

Machine Learning · Computer Science 2018-04-16 Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

Neural networks on microcontrollers: saving memory at inference via operator reordering

Designing deep learning models for highly-constrained hardware would allow imbuing many edge devices with intelligence. Microcontrollers (MCUs) are an attractive platform for building smart devices due to their low cost, wide availability,…

Machine Learning · Computer Science 2020-03-04 Edgar Liberis, Nicholas D. Lane

Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed…

Machine Learning · Computer Science 2019-05-31 Manuele Rusci, Alessandro Capotondi, Luca Benini

TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams

Examples of embedded intelligence include a wide variety of tiny neural networks used on-board wireless sensors and actuators, which are expected to continuously perform inference on time-series of the data they sense. In order to fit…

Machine Learning · Computer Science 2026-05-28 Zhaolan Huang, Emmanuel Baccelli

On-Sensor Convolutional Neural Networks with Early-Exits

Tiny Machine Learning (TinyML) is a novel research field aiming at integrating Machine Learning (ML) within embedded devices with limited memory, computation, and energy. Recently, a new branch of TinyML has emerged, focusing on integrating…

Machine Learning · Computer Science 2025-06-03 Hazem Hesham Yousef Shalby, Arianna De Vecchi, Alice Scandelli, Pietro Bartoli, Diana Trojaniello, Manuel Roveri, Federica Villa

MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference

A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive…

Machine Learning · Computer Science 2023-07-20 Jackson Farley, Andreas Gerstlauer

RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference

Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices…

Computer Vision and Pattern Recognition · Computer Science 2020-10-26 Oindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma, Prateek Jain

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network…

Machine Learning · Computer Science 2021-04-14 Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough

Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices

This paper presents a comprehensive evaluation of lightweight deep learning models for image classification, emphasizing their suitability for deployment in resource-constrained environments such as low-memory devices. Five state-of-the-art…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Tasnim Shahriar

Accelerating TinyML Inference on Microcontrollers through Approximate Kernels

The rapid growth of microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny…

Machine Learning · Computer Science 2024-09-26 Giorgos Armeniakos, Georgios Mentzos, Dimitrios Soudris

ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches…

Machine Learning · Computer Science 2017-08-10 Sujith Ravi

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

Deploying neural networks on microcontroller units (MCUs) presents substantial challenges due to their constrained computation and memory resources. Previous research has explored patch-based inference as a strategy to conserve memory…

Computer Vision and Pattern Recognition · Computer Science 2024-01-26 Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao

Condensation-Net: Memory-Efficient Network Architecture with Cross-Channel Pooling Layers and Virtual Feature Maps

"Lightweight convolutional neural networks" is an important research topic in the field of embedded vision. To implement image recognition tasks on a resource-limited hardware platform, it is necessary to reduce the memory size and the…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Tse-Wei Chen, Motoki Yoshinaga, Hongxing Gao, Wei Tao, Dongchao Wen, Junjie Liu, Kinya Osa, Masami Kato

Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications

The deployment of Quantized Neural Networks (QNNs) on resource-constrained edge devices, such as microcontrollers (MCUs), introduces fundamental challenges in balancing model performance, computational complexity, and memory constraints.…

Machine Learning · Computer Science 2026-01-08 Hamza A. Abushahla, Dara Varam, Ariel Justine N. Panopio, Mohamed I. AlHajri

Neural Knitworks: Patched Neural Implicit Representation Networks

Coordinate-based Multilayer Perceptron (MLP) networks, despite being capable of learning neural implicit representations, are not performant for internal image synthesis applications. Convolutional Neural Networks (CNNs) are typically used…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Mikolaj Czerkawski, Javier Cardona, Robert Atkinson, Craig Michie, Ivan Andonovic, Carmine Clemente, Christos Tachtatzis