Related papers: Pex: Memory-efficient Microcontroller Deep Learnin…

StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign

In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep…

Hardware Architecture · Computer Science 2025-05-20 Michael Wu, Arnab Raha, Deepak A. Mathaikutty, Martin Langhammer, Engin Tunali, Daksha Sharma

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile…

Hardware Architecture · Computer Science 2024-06-12 Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

Neural networks on microcontrollers: saving memory at inference via operator reordering

Designing deep learning models for highly-constrained hardware would allow imbuing many edge devices with intelligence. Microcontrollers (MCUs) are an attractive platform for building smart devices due to their low cost, wide availability,…

Machine Learning · Computer Science 2020-03-04 Edgar Liberis, Nicholas D. Lane

A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse

Sparse deep learning has reduced computation significantly, but its irregular non-zero data distribution complicates the data flow and hinders data reuse, increasing on-chip SRAM access and thus power consumption of the chip. This paper…

Hardware Architecture · Computer Science 2025-03-26 Kai-Chieh Hsu, Tian-Sheuan Chang

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers makes significant contributions for ecological AI. Conventional methods successfully enable convolutional neural…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Yinan Liang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

Optimizing the flash-RAM energy trade-off in deeply embedded systems

Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the…

Other Computer Science · Computer Science 2021-04-13 James Pallister, Kerstin Eder, Simon Hollis

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

Parameter-Efficient Fine-tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, the fine-tuning performance with PEFT on complex, knowledge-intensive tasks is limited due to the…

Computation and Language · Computer Science 2024-06-10 Jitai Hao, WeiWei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren

Differentiable Network Pruning for Microcontrollers

Embedded and personal IoT devices are powered by microcontroller units (MCUs), whose extreme resource scarcity is a major obstacle for applications relying on on-device deep learning inference. Orders of magnitude less storage, memory and…

Machine Learning · Computer Science 2022-12-09 Edgar Liberis, Nicholas D. Lane

xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads

The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundamental to enabling advanced scheduling and…

Performance · Computer Science 2025-10-27 Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib

Split CNN Inference on Networked Microcontrollers

Running deep neural networks on microcontroller units (MCUs) is severely constrained by limited memory resources. While TinyML techniques reduce model size and computation, they often fail in practice due to excessive peak Random Access…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-12 Junyu Lu, Shashwath Suresh, Hao Liu, Qi Hong, Qing Wang

$\mu$NAS: Constrained Neural Architecture Search for Microcontrollers

IoT devices are powered by microcontroller units (MCUs) which are extremely resource-scarce: a typical MCU may have an underpowered processor and around 64 KB of memory and persistent storage, which is orders of magnitude fewer…

Machine Learning · Computer Science 2020-12-09 Edgar Liberis, Łukasz Dudziak, Nicholas D. Lane

Taking a Look into Execute-Only Memory

The development process of microcontroller firmware often involves multiple parties. In such a scenario, the Intellectual Property (IP) is not protected against adversarial developers who have unrestricted access to the firmware binary.…

Cryptography and Security · Computer Science 2019-09-13 Marc Schink, Johannes Obermaier

Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking Neural Networks

Spiking Neural Networks (SNNs) compute in an event-based manner to achieve a more efficient computation than standard Neural Networks. In SNNs, neuronal outputs (i.e. activations) are not encoded with real-valued activations but with…

Hardware Architecture · Computer Science 2023-08-08 Jan Sommer, M. Akif Özkan, Oliver Keszocze, Jürgen Teich

Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

The acceleration of pruned Deep Neural Networks (DNNs) on edge devices such as Microcontrollers (MCUs) is a challenging task, given the tight area- and power-constraints of these devices. In this work, we propose a three-fold contribution…

Machine Learning · Computer Science 2025-03-20 Francesco Daghero, Daniele Jahier Pagliari, Francesco Conti, Luca Benini, Massimo Poncino, Alessio Burrello

MCUNet: Tiny Deep Learning on IoT Devices

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly…

Computer Vision and Pattern Recognition · Computer Science 2020-11-20 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

APEX: A High-Performance Learned Index on Persistent Memory

The recently released persistent memory (PM) offers high performance, persistence, and is cheaper than DRAM. This opens up new possibilities for indexes that operate and persist data directly on the memory bus. Recent learned indexes…

Databases · Computer Science 2021-12-07 Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, Tianzheng Wang

An Experimental Exploration of In-Memory Computing for Multi-Layer Perceptrons

In modern computer architectures, the performance of many memory-bound workloads (e.g., machine learning, graph processing, databases) is limited by the data movement bottleneck that emerges when transferring large amounts of data between…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-12 Pedro Carrinho, Hamid Moghadaspour, Oscar Ferraz, João Dinis Ferreira, Yann Falevoz, Vitor Silva, Gabriel Falcao

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho, Daniel Brand

Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle

Many popular machine learning models scale poorly when deployed on CPUs. In this paper we explore the reasons why and propose a simple, yet effective approach based on the well-known Divide-and-Conquer Principle to tackle this problem of…

Machine Learning · Computer Science 2023-03-03 Alex Kogan