Related papers: Compiling Neural Networks for a Computational Memo…

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang

Compiling Halide Programs to Push-Memory Accelerators

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become…

Hardware Architecture · Computer Science 2021-05-28 Qiaoyi Liu, Dillon Huff, Jeff Setter, Maxwell Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad

Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators

Computing-in-Memory (CIM) accelerators are a promising solution for accelerating Machine Learning (ML) workloads, as they perform Matrix-Vector Multiplications (MVMs) on crossbar arrays directly in memory. Although the bit widths of the…

Machine Learning · Computer Science 2026-03-20 Rebecca Pelke, Joel Klein, Jose Cubero-Cascante, Nils Bosbach, Jan Moritz Joseph, Rainer Leupers

MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM)…

Hardware Architecture · Computer Science 2021-05-26 Syuan-Hao Sie, Jye-Luen Lee, Yi-Ren Chen, Chih-Cheng Lu, Chih-Cheng Hsieh, Meng-Fan Chang, Kea-Tiong Tang

Modular Neural Computer

This paper introduces the Modular Neural Computer (MNC), a memory-augmented neural architecture for exact algorithmic computation on variable-length inputs. The model combines an external associative memory of scalar cells, explicit read…

Machine Learning · Computer Science 2026-03-17 Florin Leon

WWW: What, When, Where to Compute-in-Memory

Matrix multiplication is the dominant computation during Machine Learning (ML) inference. To efficiently perform such multiplication operations, Compute-in-memory (CiM) paradigms have emerged as a highly energy efficient solution. However,…

Hardware Architecture · Computer Science 2025-03-03 Tanvi Sharma, Mustafa Ali, Indranil Chakraborty, Kaushik Roy

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI…

Emerging Technologies · Computer Science 2025-07-03 Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou

Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators

Computing-in-memory (CIM) architectures demonstrate superior performance over traditional architectures. To unleash the potential of CIM accelerators, many compilation methods have been proposed, focusing on application scheduling…

Hardware Architecture · Computer Science 2025-02-25 Shixin Zhao, Yuming Li, Bing Li, Yintao He, Mengdi Wang, Yinhe Han, Ying Wang

PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang, Xiaotian Sun, Yinhe Han, Xiaoming Chen

CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms

The rise of data-intensive applications exposed the limitations of conventional processor-centric von-Neumann architectures that struggle to meet the off-chip memory bandwidth demand. Therefore, recent innovations in computer architecture…

Hardware Architecture · Computer Science 2024-05-28 Asif Ali Khan, Hamid Farzaneh, Karl F. A. Friebel, Clément Fournier, Lorenzo Chelini, Jeronimo Castrillon

A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann

Near-Memory Computing: Past, Present, and Future

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Hardware Architecture · Computer Science 2019-08-08 Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler

Different from developing neural networks (NNs) for general-purpose processors, the development for NN chips usually faces some hardware-specific restrictions, such as limited precision of network signals and parameters, constrained…

Neural and Evolutionary Computing · Computer Science 2018-01-19 Yu Ji, YouHui Zhang, WenGuang Chen, Yuan Xie

Compiling a Calculus for Relaxed Memory: Practical constraint-based low-level concurrency

Crary and Sullivan's Relaxed Memory Calculus (RMC) proposed a new declarative approach for writing low-level shared memory concurrent programs in the presence of modern relaxed-memory multi-processor architectures and optimizing compilers.…

Programming Languages · Computer Science 2019-04-12 Michael J. Sullivan, Karl Crary, Salil Joshi

C4CAM: A Compiler for CAM-based In-memory Accelerators

Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to remove…

Hardware Architecture · Computer Science 2023-09-13 Hamid Farzaneh, João Paulo Cardoso de Lima, Mengyuan Li, Asif Ali Khan, Xiaobo Sharon Hu, Jeronimo Castrillon

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

CoMoNM: A Cost Modeling Framework for Compute-Near-Memory Systems

Compute-Near-Memory (CNM) systems offer a promising approach to mitigate the von Neumann bottleneck by bringing computational units closer to data. However, optimizing for these architectures remains challenging due to their unique hardware…

Emerging Technologies · Computer Science 2025-08-18 Hamid Farzaneh, Asif Ali Khan, Jeronimo Castrillon

CMLCompiler: A Unified Compiler for Classical Machine Learning

Classical machine learning (CML) occupies nearly half of machine learning pipelines in production applications. Unfortunately, it fails to utilize the state-of-the-practice devices fully and performs poorly. Without a unified framework, the…

Machine Learning · Computer Science 2023-05-01 Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

CAMASim: A Comprehensive Simulation Framework for Content-Addressable Memory based Accelerators

Content addressable memory (CAM) stands out as an efficient hardware solution for memory-intensive search operations by supporting parallel computation in memory. However, developing a CAM-based accelerator architecture that achieves…

Hardware Architecture · Computer Science 2024-03-11 Mengyuan Li, Shiyi Liu, Mohammad Mehdi Sharifi, X. Sharon Hu