Related papers: MASIM: An Efficient Multi-Array Scheduler for In-M…

High-Quality Iterative Logic Compiler for In-Memory SIMD Computation with Tight Coupling of Synthesis and Scheduling

In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian , Chenyang Lv , Zhezhi He , Weikang Qian

FAST: A Fully-Concurrent Access Technique to All SRAM Rows for Enhanced Speed and Energy Efficiency in Data-Intensive Applications

Compute-in-memory (CiM) is a promising approach to improving the computing speed and energy efficiency in dataintensive applications. Beyond existing CiM techniques of bitwise logic-in-memory operations and dot product operations, this…

Hardware Architecture · Computer Science 2023-01-03 Yiming Chen , Yushen Fu , Mingyen Lee , Sumitha George , Yongpan Liu , Vijaykrishnan Narayanan , Huazhong Yang , Xueqing Li

CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures

The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM…

Hardware Architecture · Computer Science 2024-01-18 Rebecca Pelke , Jose Cubero-Cascante , Nils Bosbach , Felix Staudigl , Rainer Leupers , Jan Moritz Joseph

Concurrent Processing Memory

A theoretical memory with limited processing power and internal connectivity at each element is proposed. This memory carries out parallel processing within itself to solve generic array problems. The applicability of this in-memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-28 Chengpu Wang

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing

Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM…

Hardware Architecture · Computer Science 2024-03-05 Geraldo F. Oliveira , Ataberk Olgun , Abdullah Giray Yağlıkçı , F. Nisa Bostancı , Juan Gómez-Luna , Saugata Ghose , Onur Mutlu

CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since increased number of requests from multiple cores of processors becomes a source of bottleneck, scheduling the requests efficiently is…

Hardware Architecture · Computer Science 2019-07-19 Eduardo Olmedo Sanchez , Xian-He Sun

SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis

Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. However, the need for customized memory cells…

Hardware Architecture · Computer Science 2025-01-07 Kunming Shao , Fengshi Tian , Xiaomeng Wang , Jiakun Zheng , Jia Chen , Jingyu He , Hui Wu , Jinbo Chen , Xihao Guan , Yi Deng , Fengbin Tu , Jie Yang , Mohamad Sawan , Tim Kwang-Ting Cheng , Chi-Ying Tsui

Computing-In-Memory Aware Model Adaption For Edge Devices

Computing-in-Memory (CIM) macros have gained popularity for deep learning acceleration due to their highly parallel computation and low power consumption. However, limited macro size and ADC precision introduce throughput and accuracy…

Hardware Architecture · Computer Science 2026-05-01 Ming-Han Lin , Tian-Sheuan Chang

Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories

The speed of modern digital systems is severely limited by memory latency (the ``Memory Wall'' problem). Data exchange between Logic and Memory is also responsible for a large part of the system energy consumption. Logic--In--Memory (LiM)…

Hardware Architecture · Computer Science 2023-04-14 Fabrizio Ottati , Giovanna Turvani , Marco Vacca , Guido Masera

Efficient Multi-Cycle Folded Integer Multipliers

Fast combinational multipliers with large bit widths can occupy significant silicon area, which also drives up power consumption. Area can be reduced through resource sharing (i.e., folding) at the expense of lower throughput, which is…

Hardware Architecture · Computer Science 2025-09-03 Ahmad Houraniah , H. Fatih Ugurdag , C. Emre Dedeagac

MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses…

Machine Learning · Computer Science 2022-06-22 Abhiroop Bhattacharjee , Yeshwanth Venkatesha , Abhishek Moitra , Priyadarshini Panda

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu , Shixin Zhao , Bing Li , Yintao He , Xuyi Cai , Lei Zhang , Ying Wang

Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics

Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for…

Hardware Architecture · Computer Science 2020-08-18 Brian Crafton , Samuel Spetalnick , Gauthaman Murali , Tushar Krishna , Sung-Kyu Lim , Arijit Raychowdhury

WWW: What, When, Where to Compute-in-Memory

Matrix multiplication is the dominant computation during Machine Learning (ML) inference. To efficiently perform such multiplication operations, Compute-in-memory (CiM) paradigms have emerged as a highly energy efficient solution. However,…

Hardware Architecture · Computer Science 2025-03-03 Tanvi Sharma , Mustafa Ali , Indranil Chakraborty , Kaushik Roy

SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

Hardware Architecture · Computer Science 2021-07-01 Nastaran Hajinazar , Geraldo F. Oliveira , Sven Gregorio , João Ferreira , Nika Mansouri Ghiasi , Minesh Patel , Mohammed Alser , Saugata Ghose , Juan Gómez Luna , Onur Mutlu

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference

The Mixture-of-Experts (MoE) models have emerged as the state-of-the-art paradigm for scaling up large language models (LLMs) without proportionally increased computational cost. However, its on-device deployment faces a critical challenge…

Hardware Architecture · Computer Science 2026-05-25 Weikai Xu , Meng Li , Shuzhang Zhong , Tianyang Luo , Dongxue Zhao , Ling Liang , Zongwei Wang , Qianqian Huang , Yimao Cai , Ru Huang

Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration

To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index has resulted in significant I/O traffic due to limited access locality and inefficient cache…

Hardware Architecture · Computer Science 2024-08-05 Yun-Chih Chen , Yuan-Hao Chang , Tei-Wei Kuo

Eva-CiM: A System-Level Performance and Energy Evaluation Framework for Computing-in-Memory Architectures

Computing-in-Memory (CiM) architectures aim to reduce costly data transfers by performing arithmetic and logic operations in memory and hence relieve the pressure due to the memory wall. However, determining whether a given workload can…

Hardware Architecture · Computer Science 2020-01-16 Di Gao , Dayane Reis , Xiaobo Sharon Hu , Cheng Zhuo

Real-Time Scheduling of Machine Learning Operations on Heterogeneous Neuromorphic SoC

Neuromorphic Systems-on-Chip (NSoCs) are becoming heterogeneous by integrating general-purpose processors (GPPs) and neural processing units (NPUs) on the same SoC. For embedded systems, an NSoC may need to execute user applications built…

Hardware Architecture · Computer Science 2022-09-30 Anup Das