Related papers: Split CNN Inference on Networked Microcontrollers
Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first…
IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile…
Tiny Machine Learning (TinyML) is a novel research field aiming at integrating Machine Learning (ML) within embedded devices with limited memory, computation, and energy. Recently, a new branch of TinyML has emerged, focusing on integrating…
Designing deep learning models for highly-constrained hardware would allow imbuing many edge devices with intelligence. Microcontrollers (MCUs) are an attractive platform for building smart devices due to their low cost, wide availability,…
AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g., 128kB of RAM. However, inference latency must…
Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs),…
The rapid growth of microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny…
The deployment of Quantized Neural Networks (QNNs) on resource-constrained edge devices, such as microcontrollers (MCUs), introduces fundamental challenges in balancing model performance, computational complexity, and memory constraints.…
For convolutional neural networks (CNNs) that have a large volume of input data, memory management becomes a major concern. Memory cost reduction can be an effective way to deal with these problems that can be realized through different…
Running neural networks (NNs) on microcontroller units (MCUs) is becoming increasingly important, but is very difficult due to the tiny SRAM size of MCU. Prior work proposes many algorithm-level techniques to reduce NN memory footprints,…
The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet…
Deep Learning approaches based on Convolutional Neural Networks (CNNs) are extensively utilized and very successful in a wide range of application areas, including image classification and speech recognition. For the execution of trained…
The field of Tiny Machine Learning (TinyML) has gained significant attention due to its potential to enable intelligent applications on resource-constrained devices. This review provides an in-depth analysis of the advancements in efficient…
Quantized CNN inference on ultra-low-power MCUs incurs unnecessary computations in neurons that produce saturated output values. These values are too extreme and are eventually clamped to the boundaries allowed by the neuron. Often times,…
In this paper, we propose different alternatives for convolutional neural networks (CNNs) segmentation, addressing inference processes on computing architectures composed by multiple Edge TPUs. Specifically, we compare the inference…
In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection,…
The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver significant gain to a large number of users. To improve the performance of CNN…
Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…
Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network…
Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly…