Related papers: VW-SDK: Efficient Convolutional Weight Mapping Usi…

TetrisG-SDK: Efficient Convolutional Layer Mapping with Adaptive Windows and Grouped Convolutions for Fast In-Memory Computing

Shifted-and-Duplicated-Kernel (SDK) mapping has emerged as an effective strategy to accelerate convolutional layers on compute-in-memory (CIM) hardware. However, existing SDK variants (e.g., VWC-SDK) merely optimize mapping for a single CIM…

Hardware Architecture · Computer Science 2026-04-29 Ke Dong, Kejie Huang, Tao Luo, Bo Wang

An Area and Energy Efficient Design of Domain-Wall Memory-Based Deep Convolutional Neural Networks using Stochastic Computing

With the recent trend of wearable devices and the Internet of Things (IoT), it becomes attractive to develop hardware-based deep convolutional neural networks (DCNNs) for embedded applications, which require low power/energy consumption and small…

Neural and Evolutionary Computing · Computer Science 2018-02-06 Xiaolong Ma, Yipeng Zhang, Geng Yuan, Ao Ren, Zhe Li, Jie Han, Jingtong Hu, Yanzhi Wang

Efficient Convolutional Neural Networks for Pixelwise Classification on Heterogeneous Hardware Systems

This work presents and analyzes three convolutional neural network (CNN) models for efficient pixelwise classification of images. When using convolutional neural networks to classify single pixels in patches of a whole image, a lot of…

Computer Vision and Pattern Recognition · Computer Science 2015-09-14 Fabian Tschopp

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM…

Hardware Architecture · Computer Science 2020-10-14 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

SWIM: Selective Write-Verify for Computing-in-Memory Neural Accelerators

Computing-in-Memory architectures based on non-volatile emerging memories have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, these emerging devices can suffer from…

Machine Learning · Computer Science 2022-10-10 Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping

The channel redundancy in feature maps of convolutional neural networks (CNNs) results in the large consumption of memories and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the…

Computer Vision and Pattern Recognition · Computer Science 2021-09-08 Jiaxiong Qiu, Cai Chen, Shuaicheng Liu, Bing Zeng

Content-Aware Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers. Specifically, the standard convolution traverses the input images/features using a sliding window scheme to…

Computer Vision and Pattern Recognition · Computer Science 2021-07-26 Yong Guo, Yaofo Chen, Mingkui Tan, Kui Jia, Jian Chen, Jingdong Wang

ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration

Processing-in-memory (PIM) architectures are emerging to reduce data movement in data-intensive applications. These architectures seek to exploit the same physical devices for both information storage and logic, thereby dwarfing the…

Hardware Architecture · Computer Science 2023-05-09 Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky

PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory Processor for Keyword Spotting

Computing-in-memory (CIM) has attracted significant attention in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from large area overhead of small CIM macros and bad programmability…

Hardware Architecture · Computer Science 2022-05-04 Shu-Hung Kuo, Tian-Sheuan Chang

Towards Design Space Exploration and Optimization of Fast Algorithms for Convolutional Neural Networks (CNNs) on FPGAs

Convolutional Neural Networks (CNNs) have gained widespread popularity in the field of computer vision and image processing. Due to huge computational requirements of CNNs, dedicated hardware-based implementations are being explored to…

Signal Processing · Electrical Eng. & Systems 2019-03-06 Afzal Ahmad, Muhammad Adeel Pasha

ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System

Although deep learning-based personalized recommendation systems provide qualified recommendations, they strain data center resources. The main bottleneck is the embedding layer, which is highly memory-intensive due to its sparse, irregular…

Hardware Architecture · Computer Science 2025-11-26 Youngsuk Kim, Junghwan Lim, Hyuk-Jae Lee, Chae Eun Rhee

NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration

The performance and efficiency of running large-scale datasets on traditional computing systems exhibit critical bottlenecks due to the existing "power wall" and "memory wall" problems. To resolve those problems, processing-in-memory (PIM)…

Hardware Architecture · Computer Science 2022-04-22 Yinglin Zhao, Jianlei Yang, Bing Li, Xingzhou Cheng, Xucheng Ye, Xueyan Wang, Xiaotao Jia, Zhaohao Wang, Youguang Zhang, Weisheng Zhao

Introducing Hann windows for reducing edge-effects in patch-based image segmentation

There is a limitation in the size of an image that can be processed using computationally demanding methods such as Convolutional Neural Networks (CNNs). Some imaging modalities - notably biological and medical - can result in images…

Image and Video Processing · Electrical Eng. & Systems 2020-07-01 Nicolas Pielawski, Carolina Wählby

U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators

Architectures that incorporate Computing-in-Memory (CiM) using emerging non-volatile memory (NVM) devices have become strong contenders for deep neural network (DNN) acceleration due to their impressive energy efficiency. Yet, a significant…

Hardware Architecture · Computer Science 2024-01-12 Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory

In-memory computing is an emerging computing paradigm that could enable deep learning inference at significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights corresponding to each layer to…

Machine Learning · Computer Science 2019-06-11 Martino Dazzi, Abu Sebastian, Pier Andrea Francese, Thomas Parnell, Luca Benini, Evangelos Eleftheriou

VWA: Hardware Efficient Vectorwise Accelerator for Convolutional Neural Network

Hardware accelerators for convolution neural networks (CNNs) enable real-time applications of artificial intelligence technology. However, most of the existing designs suffer from low hardware utilization or high area cost due to complex…

Hardware Architecture · Computer Science 2022-05-06 Kuo-Wei Chang, Tian-Sheuan Chang

DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to…

Hardware Architecture · Computer Science 2023-11-01 Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng Zhao

DWM: A Decomposable Winograd Method for Convolution Acceleration

Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with kernel size as 3x3 and…

Machine Learning · Computer Science 2020-02-06 Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, Yunji Chen

No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling

3D models surpass 2D models in CT/MRI segmentation by effectively capturing inter-slice relationships. However, the added depth dimension substantially increases memory consumption. While patch-based training alleviates memory constraints,…

Image and Video Processing · Electrical Eng. & Systems 2025-06-30 Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng

Sliding Window Sum Algorithms for Deep Neural Networks

Sliding window sums are widely used for string indexing, hashing and time series analysis. We have developed a family of generic vectorized sliding sum algorithms that provide a speedup of $O(P/w)$ for window size $w$ and number of…

Machine Learning · Computer Science 2023-05-29 Roman Snytsar