Related papers: In-memory Multi-valued Associative Processor

In-memory Associative Processors: Tutorial, Potential, and Challenges

In-memory computing is an emerging computing paradigm that overcomes the limitations of exiting Von-Neumann computing architectures such as the memory-wall bottleneck. In such paradigm, the computations are performed directly on the data…

Emerging Technologies · Computer Science 2022-04-14 Mohammed E. Fouda , Hasan Erdem Yantir , Ahmed M. Eltawil , Fadi Kurdahi

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…

Mathematical Software · Computer Science 2017-05-23 L. Yavits , A. Morad , R. Ginosar

Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference

Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms lack native support for ternary-weight…

Hardware Architecture · Computer Science 2026-04-29 Robin Geens , Joran Heldens , Joren Dumoulin , Marian Verhelst

MatPIM: Accelerating Matrix Operations with Memristive Stateful Logic

The emerging memristive Memory Processing Unit (mMPU) overcomes the memory wall through memristive devices that unite storage and logic for real processing-in-memory (PIM) systems. At the core of the mMPU is stateful logic, which is…

Hardware Architecture · Computer Science 2022-07-01 Orian Leitersdorf , Ronny Ronen , Shahar Kvatinsky

In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications

Traditional von Neumann architecture based processors become inefficient in terms of energy and throughput as they involve separate processing and memory units, also known as~\textit{memory wall}. The memory wall problem is further…

Signal Processing · Electrical Eng. & Systems 2020-05-20 Abhash Kumar , Jawar Singh , Sai Manohar Beeraka , Bharat Gupta

Low Power In-Memory Implementation of Ternary Neural Networks with Resistive RAM-Based Synapse

The design of systems implementing low precision neural networks with emerging memories such as resistive random access memory (RRAM) is a major lead for reducing the energy consumption of artificial intelligence (AI). Multiple works have…

Emerging Technologies · Computer Science 2020-05-06 Axel Laborieux , Marc Bocquet , Tifenn Hirtzlin , Jacques-Olivier Klein , Liza Herrera Diez , Etienne Nowak , Elisa Vianello , Jean-Michel Portal , Damien Querlioz

Shared-PIM: Enabling Concurrent Computation and Data Flow for Faster Processing-in-DRAM

Processing-in-Memory (PIM) enhances memory with computational capabilities, potentially solving energy and latency issues associated with data transfer between memory and processors. However, managing concurrent computation and data flow…

Hardware Architecture · Computer Science 2025-05-09 Ahmed Mamdouh , Haoran Geng , Michael Niemier , Xiaobo Sharon Hu , Dayane Reis

Parallel Mapper

The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, split, and joint…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Mustafa Hajij , Basem Assiri , Paul Rosen

Pack my weights and run! Minimizing overheads for in-memory computing accelerators

In-memory computing hardware accelerators allow more than 10x improvements in peak efficiency and performance for matrix-vector multiplications (MVM) compared to conventional digital designs. For this, they have gained great interest for…

Hardware Architecture · Computer Science 2024-09-19 Pouya Houshmand , Marian Verhelst

LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism

Large language model (LLM) inference has been a prevalent demand in daily life and industries. The large tensor sizes and computing complexities in LLMs have brought challenges to memory, computing, and databus. This paper proposes a…

Hardware Architecture · Computer Science 2025-09-19 Yimin Wang , Yue Jiet Chong , Xuanyao Fong

Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture

Triangles are the basic substructure of networks and triangle counting (TC) has been a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, due to the…

Hardware Architecture · Computer Science 2021-12-02 Xueyan Wang , Jianlei Yang , Yinglin Zhao , Xiaotao Jia , Rong Yin , Xuhang Chen , Gang Qu , Weisheng Zhao

Applying Incremental Learning in Binary-Addition-Tree Algorithm for Dynamic Binary-State Network Reliability

This paper presents a novel approach to enhance the Binary-Addition-Tree algorithm (BAT) by integrating incremental learning techniques. BAT, known for its simplicity in development, implementation, and application, is a powerful implicit…

Machine Learning · Computer Science 2024-09-25 Wei-Chang Yeh

PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

Processing in memory (PIM) moves computation into memories with the goal of improving throughput and energy-efficiency compared to traditional von Neumann-based architectures. Most existing PIM architectures are either general-purpose but…

Hardware Architecture · Computer Science 2019-07-23 Oscar Castañeda , Maria Bobbett , Alexandra Gallyas-Sanhueza , Christoph Studer

AritPIM: High-Throughput In-Memory Arithmetic

Digital processing-in-memory (PIM) architectures are rapidly emerging to overcome the memory-wall bottleneck by integrating logic within memory elements. Such architectures provide vast computational power within the memory itself in the…

Hardware Architecture · Computer Science 2023-04-18 Orian Leitersdorf , Dean Leitersdorf , Jonathan Gal , Mor Dahan , Ronny Ronen , Shahar Kvatinsky

Full-Stack Optimization for CAM-Only DNN Inference

The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von Neumann systems. Several…

Hardware Architecture · Computer Science 2024-01-24 João Paulo C. de Lima , Asif Ali Khan , Luigi Carro , Jeronimo Castrillon

TCIM: Triangle Counting Acceleration With Processing-In-MRAM Architecture

Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer…

Hardware Architecture · Computer Science 2020-07-22 Xueyan Wang , Jianlei Yang , Yinglin Zhao , Yingjie Qi , Meichen Liu , Xingzhou Cheng , Xiaotao Jia , Xiaoming Chen , Gang Qu , Weisheng Zhao

Efficient and Fault-Tolerant Memristive Neural Networks with In-Situ Training

Neuromorphic architectures, which incorporate parallel and in-memory processing, are crucial for accelerating artificial neural network (ANN) computations. This work presents a novel memristor-based multi-layer neural network (memristive…

Emerging Technologies · Computer Science 2025-07-29 Santlal Prajapat , Manobendra Nath Mondal , Susmita Sur-Kolay

Res-DPU: Resource-shared Digital Processing-in-memory Unit for Edge-AI Workloads

Processing-in-memory (PIM) has emerged as the go to solution for addressing the von Neumann bottleneck in edge AI accelerators. However, state-of-the-art (SoTA) digital PIM approaches suffer from low compute density, primarily due to the…

Hardware Architecture · Computer Science 2025-10-23 Mukul Lokhande , Narendra Singh Dhakad , Seema Chouhan , Akash Sankhe , Santosh Kumar Vishvakarma

Topology-induced Enhancement of Mappings

In this paper we propose a new method to enhance a mapping $\mu(\cdot)$ of a parallel application's computational tasks to the processing elements (PEs) of a parallel computer. The idea behind our method \mswap is to enhance such a mapping…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-20 Roland Glantz , Maria Predari , Henning Meyerhenke

Parallel In-Place Algorithms: Theory and Practice

Many parallel algorithms use at least linear auxiliary space in the size of the input to enable computations to be done independently without conflicts. Unfortunately, this extra space can be prohibitive for memory-limited machines,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-02 Yan Gu , Omar Obeya , Julian Shun