Related papers: Split CNN Inference on Networked Microcontrollers

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile…

Hardware Architecture · Computer Science 2024-06-12 Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

On-Sensor Convolutional Neural Networks with Early-Exits

Tiny Machine Learning (TinyML) is a novel research field aiming at integrating Machine Learning (ML) within embedded devices with limited memory, computation, and energy. Recently, a new branch of TinyML has emerged, focusing on integrating…

Machine Learning · Computer Science 2025-06-03 Hazem Hesham Yousef Shalby, Arianna De Vecchi, Alice Scandelli, Pietro Bartoli, Diana Trojaniello, Manuel Roveri, Federica Villa

Neural networks on microcontrollers: saving memory at inference via operator reordering

Designing deep learning models for highly-constrained hardware would allow imbuing many edge devices with intelligence. Microcontrollers (MCUs) are an attractive platform for building smart devices due to their low cost, wide availability,…

Machine Learning · Computer Science 2020-03-04 Edgar Liberis, Nicholas D. Lane

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML

AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g., 128kB of RAM. However, inference latency must…

Machine Learning · Computer Science 2025-10-20 Zhaolan Huang, Emmanuel Baccelli

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs),…

Hardware Architecture · Computer Science 2026-05-04 Ali Emre Oztas, Mahir Demir, James Garside, Mikel Luján

Accelerating TinyML Inference on Microcontrollers through Approximate Kernels

The rapid growth of microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny…

Machine Learning · Computer Science 2024-09-26 Giorgos Armeniakos, Georgios Mentzos, Dimitrios Soudris

Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications

The deployment of Quantized Neural Networks (QNNs) on resource-constrained edge devices, such as microcontrollers (MCUs), introduces fundamental challenges in balancing model performance, computational complexity, and memory constraints.…

Machine Learning · Computer Science 2026-01-08 Hamza A. Abushahla, Dara Varam, Ariel Justine N. Panopio, Mohamed I. AlHajri

Splitting Convolutional Neural Network Structures for Efficient Inference

For convolutional neural networks (CNNs) that have a large volume of input data, memory management becomes a major concern. Memory cost reduction can be an effective way to deal with these problems that can be realized through different…

Computer Vision and Pattern Recognition · Computer Science 2020-02-11 Emad MalekHosseini, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi, Shahram Shirani

Enabling Large Neural Networks on Tiny Microcontrollers with Swapping

Running neural networks (NNs) on microcontroller units (MCUs) is becoming increasingly important, but is very difficult due to the tiny SRAM size of MCU. Prior work proposes many algorithm-level techniques to reduce NN memory footprints,…

Hardware Architecture · Computer Science 2021-09-02 Hongyu Miao, Felix Xiaozhu Lin

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet…

Machine Learning · Computer Science 2019-05-30 Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

AutoDiCE: Fully Automated Distributed CNN Inference at the Edge

Deep Learning approaches based on Convolutional Neural Networks (CNNs) are extensively utilized and very successful in a wide range of application areas, including image classification and speech recognition. For the execution of trained…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-26 Xiaotian Guo, Andy D. Pimentel, Todor Stefanov

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

The field of Tiny Machine Learning (TinyML) has gained significant attention due to its potential to enable intelligent applications on resource-constrained devices. This review provides an in-depth analysis of the advancements in efficient…

Machine Learning · Statistics 2023-11-21 Minh Tri Lê, Pierre Wolinski, Julyan Arbel

Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution

Quantized CNN inference on ultra-low-power MCUs incurs unnecessary computations in neurons that produce saturated output values. These values are too extreme and are eventually clamped to the boundaries allowed by the neuron. Oftentimes,…

Systems and Control · Electrical Eng. & Systems 2026-02-27 Shiming Li, Luca Mottola, Yuan Yao, Stefanos Kaxiras

Balanced segmentation of CNNs for multi-TPU inference

In this paper, we propose different alternatives for convolutional neural networks (CNNs) segmentation, addressing inference processes on computing architectures composed by multiple Edge TPUs. Specifically, we compare the inference…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Jaewook Lee, Yoel Park, Seulki Lee

Optimizing CNN Model Inference on CPUs

The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver significant gain to a large number of users. To improve the performance of CNN…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-09 Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, Yida Wang

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network…

Machine Learning · Computer Science 2021-04-14 Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough

MCUNet: Tiny Deep Learning on IoT Devices

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly…

Computer Vision and Pattern Recognition · Computer Science 2020-11-20 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han