English
Related papers

Related papers: RPU: The Ring Processing Unit

200 papers

By exploiting the modular RISC-V ISA this paper presents the customization of instruction set with posit\textsuperscript{\texttrademark} arithmetic instructions to provide improved numerical accuracy, well-defined behavior and increased…

Hardware Architecture · Computer Science 2024-04-09 Federico Rossi , Francesco Urbani , Marco Cococcioni , Emanuele Ruffaldi , Sergio Saponara

Ring Learning With Error (RLWE) algorithm is used in Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE) algorithm. The existing classical crypto algorithms may be broken in quantum computers. The adversaries can store all…

Cryptography and Security · Computer Science 2024-05-15 Paresh Baidya , Swagata Mondal , Rourab Paul

Large language model (LLM) inference performance is increasingly bottlenecked by the memory wall. While GPUs continue to scale raw compute throughput, they struggle to deliver scalable performance for memory bandwidth bound workloads. This…

Hardware Architecture · Computer Science 2026-02-25 Matthew Adiletta , Gu-Yeon Wei , David Brooks

In this work, we propose an open-source, first-of-its-kind, arithmetic hardware library with a focus on accelerating the arithmetic operations involved in Ring Learning with Error (RLWE)-based somewhat homomorphic encryption (SHE). We…

Cryptography and Security · Computer Science 2020-07-06 Rashmi Agrawal , Lake Bu , Alan Ehret , Michel A. Kinsy

Machine learning applications are computationally demanding and power intensive. Hardware acceleration of these software tools is a natural step being explored using various technologies. A recurrent processing unit (RPU) is fast and…

Emerging Technologies · Computer Science 2019-12-17 Heidi Komkov , Alessandro Restelli , Brian Hunt , Liam Shaughnessy , Itamar Shani , Daniel P. Lathrop

The Ring-Learning With Errors (RLWE) problem forms the backbone of highly efficient Fully Homomorphic Encryption (FHE) schemes. A significant component of the RLWE public key and ciphertext of the form $(b,a)$ is the uniformly random…

Cryptography and Security · Computer Science 2026-02-24 Ilan Rosenfeld , Noam Kleinburd , Hillel Chapman , Dror Reuven

In our previous work we have shown that resistive cross point devices, so called Resistive Processing Unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional…

Machine Learning · Computer Science 2023-02-17 Tayfun Gokmen , Malte Rasch , Wilfried Haensch

Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…

Hardware Architecture · Computer Science 2025-11-11 Arya Parameshwara , Santosh Hanamappa Mokashi

Processor design and verification require a synergistic approach that combines instruction-level functional simulations with precise hardware emulations. The trade-off between speed and accuracy in the instruction set simulation poses a…

Hardware Architecture · Computer Science 2025-04-08 Kun Qin , Xiaorang Guo , Martin Schulz , Carsten Trinitis

Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. This is evidenced by the increasing adoption of vector…

Hardware Architecture · Computer Science 2025-04-15 Matteo Perotti , Vincenzo Maisto , Moritz Imfeld , Nils Wistoff , Alessandro Cilardo , Luca Benini

RC4 can be made more secured if an additional RC4-like Post-KSA Random Shuffing (PKRS) process is introduced between KSA and PRGA. It can also be made significantly faster if RC4 bytes are processed in a FPGA embedded system using multiple…

Applications · Statistics 2016-09-21 Rourab Paul , Hemanta Dey , Amlan Chakrabarti , Ranjan Ghosh

This paper presents an optimized methodology to design and deploy Speech Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a state-of-the-art MicroController Unit (MCU), with 1+8 general-purpose RISC-V cores. To…

Sound · Computer Science 2022-10-17 Manuele Rusci , Marco Fariselli , Martin Croome , Francesco Paci , Eric Flamand

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang , Xiaotian Sun , Yinhe Han , Xiaoming Chen

The Random Phase Approximation (RPA) for correlation energy in the grid-based projector augmented wave (gpaw) code is accelerated by porting to the Graphics Processing Unit (GPU) architecture. The acceleration is achieved by grouping…

Computational Physics · Physics 2013-07-31 Jun Yan , Lin Li , Christopher O'Grady

The exponential growth of Internet of Things (IoT) applications has intensified the demand for efficient, high-throughput, and energy-efficient data processing at the edge. Conventional CPU-centric encryption methods suffer from performance…

Cryptography and Security · Computer Science 2025-06-19 Rasha Karakchi , Rye Stahle-Smith , Nishant Chinnasami , Tiffany Yu

RISC-V is a RISC based open and loyalty free instruction set architecture which has been developed since 2010, and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as…

Hardware Architecture · Computer Science 2020-12-30 Hiromu Miyazaki , Takuto Kanamori , Md Ashraful Islam , Kenji Kise

The recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, present new opportunities for hybrid-optimization algorithms that are hardware accelerated by…

Optimization and Control · Mathematics 2019-06-20 Carleton Coffrin , Harsha Nagarajan , Russell Bent

This paper introduces a computer architecture, where part of the instruction set architecture (ISA) is implemented on small highly-integrated field-programmable gate arrays (FPGAs). Small FPGAs inside a general-purpose processor (CPU) can…

Hardware Architecture · Computer Science 2022-08-23 Philippos Papaphilippou , Myrtle Shah

Vector processing is highly effective in boosting processor performance and efficiency for data-parallel workloads. In this paper, we present Ara2, the first fully open-source vector processor to support the RISC-V V 1.0 frozen ISA. We…

Hardware Architecture · Computer Science 2024-06-18 Matteo Perotti , Matheus Cavalcante , Renzo Andri , Lukas Cavigelli , Luca Benini

This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…

Hardware Architecture · Computer Science 2021-06-15 Philippos Papaphilippou , Paul H. J. Kelly , Wayne Luk
‹ Prev 1 2 3 10 Next ›