Related papers: RPU: The Ring Processing Unit

PPU: Design and Implementation of a Pipelined Full Posit Processing Unit

By exploiting the modular RISC-V ISA this paper presents the customization of instruction set with posit\textsuperscript{\texttrademark} arithmetic instructions to provide improved numerical accuracy, well-defined behavior and increased…

Hardware Architecture · Computer Science 2024-04-09 Federico Rossi , Francesco Urbani , Marco Cococcioni , Emanuele Ruffaldi , Sergio Saponara

Near Threshold Computation of Partitioned Ring Learning With Error (RLWE) Post Quantum Cryptography on Reconfigurable Architecture

Ring Learning With Error (RLWE) algorithm is used in Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE) algorithm. The existing classical crypto algorithms may be broken in quantum computers. The adversaries can store all…

Cryptography and Security · Computer Science 2024-05-15 Paresh Baidya , Swagata Mondal , Rourab Paul

RPU -- A Reasoning Processing Unit

Large language model (LLM) inference performance is increasingly bottlenecked by the memory wall. While GPUs continue to scale raw compute throughput, they struggle to deliver scalable performance for memory bandwidth bound workloads. This…

Hardware Architecture · Computer Science 2026-02-25 Matthew Adiletta , Gu-Yeon Wei , David Brooks

Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption

In this work, we propose an open-source, first-of-its-kind, arithmetic hardware library with a focus on accelerating the arithmetic operations involved in Ring Learning with Error (RLWE)-based somewhat homomorphic encryption (SHE). We…

Cryptography and Security · Computer Science 2020-07-06 Rashmi Agrawal , Lake Bu , Alan Ehret , Michel A. Kinsy

The Recurrent Processing Unit: Hardware for High Speed Machine Learning

Machine learning applications are computationally demanding and power intensive. Hardware acceleration of these software tools is a natural step being explored using various technologies. A recurrent processing unit (RPU) is fast and…

Emerging Technologies · Computer Science 2019-12-17 Heidi Komkov , Alessandro Restelli , Brian Hunt , Liam Shaughnessy , Itamar Shani , Daniel P. Lathrop

Hardware-Friendly Randomization: Enabling Random-Access and Minimal Wiring in FHE Accelerators with Low Total Cost

The Ring-Learning With Errors (RLWE) problem forms the backbone of highly efficient Fully Homomorphic Encryption (FHE) schemes. A significant component of the RLWE public key and ciphertext of the form $(b,a)$ is the uniformly random…

Cryptography and Security · Computer Science 2026-02-24 Ilan Rosenfeld , Noam Kleinburd , Hillel Chapman , Dror Reuven

Training LSTM Networks with Resistive Cross-Point Devices

In our previous work we have shown that resistive cross point devices, so called Resistive Processing Unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional…

Machine Learning · Computer Science 2023-02-17 Tayfun Gokmen , Malte Rasch , Wilfried Haensch

FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices

Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…

Hardware Architecture · Computer Science 2025-11-11 Arya Parameshwara , Santosh Hanamappa Mokashi

FERIVer: An FPGA-assisted Emulated Framework for RTL Verification of RISC-V Processors

Processor design and verification require a synergistic approach that combines instruction-level functional simulations with precise hardware emulations. The trade-off between speed and accuracy in the instruction set simulation poses a…

Hardware Architecture · Computer Science 2025-04-08 Kun Qin , Xiaorang Guo , Martin Schulz , Carsten Trinitis

AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance

Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. This is evidenced by the increasing adoption of vector…

Hardware Architecture · Computer Science 2025-04-15 Matteo Perotti , Vincenzo Maisto , Moritz Imfeld , Nils Wistoff , Alessandro Cilardo , Luca Benini

Accelerating More Secure RC4 : Implementation of Seven FPGA Designs in Stages upto 8 byte per clock

RC4 can be made more secured if an additional RC4-like Post-KSA Random Shuffing (PKRS) process is introduced between KSA and PRGA. It can also be made significantly faster if RC4 bytes are processed in a FPGA embedded system using multiple…

Applications · Statistics 2016-09-21 Rourab Paul , Hemanta Dey , Amlan Chakrabarti , Ranjan Ghosh

Accelerating RNN-based Speech Enhancement on a Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization

This paper presents an optimized methodology to design and deploy Speech Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a state-of-the-art MicroController Unit (MCU), with 1+8 general-purpose RISC-V cores. To…

Sound · Computer Science 2022-10-17 Manuele Rusci , Marco Fariselli , Martin Croome , Francesco Paci , Eric Flamand

PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang , Xiaotian Sun , Yinhe Han , Xiaoming Chen

Graphics Processing Unit acceleration of the Random Phase Approximation in the projector augmented wave method

The Random Phase Approximation (RPA) for correlation energy in the grid-based projector augmented wave (gpaw) code is accelerated by porting to the Graphics Processing Unit (GPU) architecture. The acceleration is achieved by grouping…

Computational Physics · Physics 2013-07-31 Jun Yan , Lin Li , Christopher O'Grady

Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine

The exponential growth of Internet of Things (IoT) applications has intensified the demand for efficient, high-throughput, and energy-efficient data processing at the edge. Conventional CPU-centric encryption methods suffer from performance…

Cryptography and Security · Computer Science 2025-06-19 Rasha Karakchi , Rye Stahle-Smith , Nishant Chinnasami , Tiffany Yu

RVCoreP : An optimized RISC-V soft processor of five-stage pipelining

RISC-V is a RISC based open and loyalty free instruction set architecture which has been developed since 2010, and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as…

Hardware Architecture · Computer Science 2020-12-30 Hiromu Miyazaki , Takuto Kanamori , Md Ashraful Islam , Kenji Kise

Evaluating Ising Processing Units with Integer Programming

The recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, present new opportunities for hybrid-optimization algorithms that are hardware accelerated by…

Optimization and Control · Mathematics 2019-06-20 Carleton Coffrin , Harsha Nagarajan , Russell Bent

FPGA-extended General Purpose Computer Architecture

This paper introduces a computer architecture, where part of the instruction set architecture (ISA) is implemented on small highly-integrated field-programmable gate arrays (FPGAs). Small FPGAs inside a general-purpose processor (CPU) can…

Hardware Architecture · Computer Science 2022-08-23 Philippos Papaphilippou , Myrtle Shah

Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV 1.0 Compliant Open-Source Processor

Vector processing is highly effective in boosting processor performance and efficiency for data-parallel workloads. In this paper, we present Ara2, the first fully open-source vector processor to support the RISC-V V 1.0 frozen ISA. We…

Hardware Architecture · Computer Science 2024-06-18 Matteo Perotti , Matheus Cavalcante , Renzo Andri , Lukas Cavigelli , Luca Benini

Extending the RISC-V ISA for exploring advanced reconfigurable SIMD instructions

This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…

Hardware Architecture · Computer Science 2021-06-15 Philippos Papaphilippou , Paul H. J. Kelly , Wayne Luk