Related papers: Efficient Multi-Cycle Folded Integer Multipliers

Low Power Shift and Add Multiplier Design

Today every circuit has to face the power consumption issue for both portable device aiming at large battery life and high end circuits avoiding cooling packages and reliability issues that are too complex. It is generally accepted that…

Hardware Architecture · Computer Science 2010-07-15 C. N. Marimuthu , P. Thangaraj , Aswathy Ramesan

Power-Area Efficient Serial IMPLY-based 4:2 Compressor Applied in Data-Intensive Applications

The data transfer between a processor and memory has become a design bottleneck in data-intensive applications. Processing-In-Memory (PIM) is a practical approach to overcome the memory wall bottleneck. The 4:2 compressor is suitable for…

Emerging Technologies · Computer Science 2024-07-16 Bahareh Bagheralmoosavi , Seyed Erfan Fatemieh , Mohammad Reza Reshadinezhad , Antonio Rubio

Towards the Multiple Constant Multiplication at Minimal Hardware Cost

Multiple Constant Multiplication (MCM) over integers is a frequent operation arising in embedded systems that require highly optimized hardware. An efficient way is to replace costly generic multiplication by bit-shifts and additions, i.e.…

Hardware Architecture · Computer Science 2022-10-11 Rémi Garcia , Anastasia Volkova

WWW: What, When, Where to Compute-in-Memory

Matrix multiplication is the dominant computation during Machine Learning (ML) inference. To efficiently perform such multiplication operations, Compute-in-memory (CiM) paradigms have emerged as a highly energy efficient solution. However,…

Hardware Architecture · Computer Science 2025-03-03 Tanvi Sharma , Mustafa Ali , Indranil Chakraborty , Kaushik Roy

Energy-Efficient and Fast Memristor-based Serial Multipliers Applicable in Image Processing

Memristive Processing In-Memory (PIM) is one of the promising techniques for overcoming the Von-Neumann bottleneck. Reduction of data transfer between processor and memory and data processing by memristors in data-intensive applications…

Emerging Technologies · Computer Science 2024-10-15 Seyed Erfan Fatemieh , Bahareh Bagheralmoosavi , Mohammad Reza Reshadinezhad

Efficient Hardware Implementation of Modular Multiplier over GF (2m) on FPGA

Elliptic curve cryptography (ECC) has emerged as the dominant public-key protocol, with NIST standardizing parameters for binary field GF(2^m) ECC systems. This work presents a hardware implementation of a Hybrid Multiplication technique…

Cryptography and Security · Computer Science 2025-06-25 Ruby Kumari , Gaurav Purohit , Abhijit Karmakar

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu , Shixin Zhao , Bing Li , Yintao He , Xuyi Cai , Lei Zhang , Ying Wang

A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially…

Hardware Architecture · Computer Science 2023-06-30 Ying Wu , Chuangtao Chen , Weihua Xiao , Xuan Wang , Chenyi Wen , Jie Han , Xunzhao Yin , Weikang Qian , Cheng Zhuo

Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators

Computing-in-Memory (CIM) accelerators are a promising solution for accelerating Machine Learning (ML) workloads, as they perform Matrix-Vector Multiplications (MVMs) on crossbar arrays directly in memory. Although the bit widths of the…

Machine Learning · Computer Science 2026-03-20 Rebecca Pelke , Joel Klein , Jose Cubero-Cascante , Nils Bosbach , Jan Moritz Joseph , Rainer Leupers

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands

Matrix multiplications between asymmetric bit-width operands, especially between 8- and 4-bit operands are likely to become a fundamental kernel of many important workloads including neural networks and machine learning. While existing SIMD…

Machine Learning · Computer Science 2020-08-04 Dibakar Gope , Jesse Beu , Matthew Mattina

SIMDive: Approximate SIMD Soft Multiplier-Divider for FPGAs with Tunable Accuracy

The ever-increasing quest for data-level parallelism and variable precision in ubiquitous multimedia and Deep Neural Network (DNN) applications has motivated the use of Single Instruction, Multiple Data (SIMD) architectures. To alleviate…

Hardware Architecture · Computer Science 2020-11-03 Zahra Ebrahimi , Salim Ullah , Akash Kumar

Reconfigurable multiplier architecture based on memristor-cmos with higher flexibility

Multiplication is an indispensable operation in most of digital signal processing systems. Recently, many systems need to execute different types of algorithms on a multiplier. Therefore, it needs complicated computation and large area…

Hardware Architecture · Computer Science 2019-07-23 Seungbum Baek

Efficient Modular Arithmetic for SIMD Devices

This paper describes several new improvements of modular arithmetic and how to exploit them in order to gain more efficient implementations of commonly used algorithms, especially in cryptographic applications. We further present a new…

Cryptography and Security · Computer Science 2013-10-15 Wilke Trei

A Soft SIMD Based Energy Efficient Computing Microarchitecture

The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy…

Hardware Architecture · Computer Science 2022-12-20 Pengbo Yu , Alexandre Levisse , Mohit Gupta , Evenblij Timon , Giovanni Ansaloni , Francky Catthoor , David Atienza

MASIM: An Efficient Multi-Array Scheduler for In-Memory SIMD Computation

Single instruction, multiple data (SIMD) is a popular design style of in-memory computing (IMC) architectures, which enables memory arrays to perform logic operations to achieve low energy consumption and high parallelism. To implement a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian , Chen Nie , Zhezhi He , Weikang Qian

Faster and Low Power Twin Precision Multiplier

In this work faster unsigned multiplication has been achieved by using a combination of High Performance Multiplication [HPM] column reduction technique and implementing a N-bit multiplier using 4 N/2-bit multipliers (recursive…

Hardware Architecture · Computer Science 2011-10-20 V. Sreedeep , B. Ramkumar , Harish M Kittur

Comparative Study of Approximate Multipliers

Approximate multipliers are widely being advocated for energy-efficient computing in applications that exhibit an inherent tolerance to inaccuracy. However, the inclusion of accuracy as a key design parameter, besides the performance, area…

Emerging Technologies · Computer Science 2018-03-20 Mahmoud Masadeh , Osman Hasan , Sofiene Tahar

Explicit Sign-Magnitude Encoders Enable Power-Efficient Multipliers

This work presents a method to maximize power-efficiency of fixed point multiplier units by decomposing them into sub-components. First, an encoder block converts the operands from a two's complement to a sign magnitude representation,…

Neural and Evolutionary Computing · Computer Science 2025-07-25 Felix Arnold , Maxence Bouvier , Ryan Amaudruz , Renzo Andri , Lukas Cavigelli

A Logic-Reuse Approach to Nibble-based Multiplier Design for Low Power Vector Computing

Vector multiplication is a fundamental operation for AI acceleration, responsible for over 85% of computational load in convolution tasks. While essential, these operations are primary drivers of area, power, and delay in modern datapath…

Hardware Architecture · Computer Science 2026-02-24 Md Rownak Hossain Chowdhury , Mostafizur Rahman

Memory Slices: A Modular Building Block for Scalable, Intelligent Memory Systems

While reduction in feature size makes computation cheaper in terms of latency, area, and power consumption, performance of emerging data-intensive applications is determined by data movement. These trends have introduced the concept of…

Hardware Architecture · Computer Science 2018-03-19 Bahar Asgari , Saibal Mukhopadhyay , Sudhakar Yalamanchili