Related papers: Accelerating Viterbi Algorithm using Custom Instru…
This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms,…
The Viterbi algorithm is a key operator for structured sequence inference in modern data systems, with applications in trajectory analysis, online recommendation, and speech recognition. As these workloads increasingly migrate to…
Recent advancements in quantization and mixed-precision approaches offer substantial opportunities to improve the speed and energy efficiency of Neural Networks (NN). Research has shown that individual parameters with varying low…
The enhanced efficiency of hardware accelerators, including Single Instruction Multiple Data (SIMD) architectures and Coarse-Grained Reconfigurable Architectures (CGRAs), is driving significant advancements in Artificial Intelligence and…
For years, the open-source RISC-V instruction set has been driving innovation in processor design, spanning from high-end cores to low-cost or low-power cores. After a decade of evolution, RISC architectures are now as mature as the CISC…
The most famous error-decoding algorithm for convolutional codes is the Viterbi algorithm. In this paper, we present a new reduced complexity version of this algorithm which can be applied to a class of binary convolutional codes with…
A novel adaptive binary decoding algorithm for LDPC codes is proposed, which reduces the decoding complexity while having a comparable or even better performance than corresponding non-adaptive alternatives. In each iteration the variable…
This paper describes a parallel implementation of the Viterbi decoding algorithm. The Viterbi decoder is widely used in many state-of-the-art wireless systems. The proposed solution optimizes both throughput and memory usage by applying…
The rise of hardware accelerators with custom instructions necessitates custom compiler backends supporting these accelerators. This study provides detailed analyses of LLVM and its RISC-V backend, supplemented with case studies providing…
Processors with extensible instruction sets are often used today as programmable hardware accelerators for various domains. When extending RISC-V and other similar extensible processor architectures, the task of designing specialized…
Using deep neural networks to decode error control codes encounters two problems: the high-precision requirements of the error control code and the complexity of the neural network caused by long codes. In this paper, a…
While Transformers are dominated by Floating-Point (FP) Matrix-Multiplications, their aggressive acceleration through dedicated hardware or many-core programmable systems has shifted the performance bottleneck to non-linear functions like…
This paper presents an automated approach for designing processors that support a subset of the RISC-V instruction set architecture (ISA) for a new class of applications at Extreme Edge. The electronics used in extreme edge applications…
In order to meet the requirement of high data rates for the next generation wireless systems, the efficient implementation of receiver algorithms is essential. On the other hand, the rapid development of technology motivates the…
Procedural planning aims to predict a sequence of actions that transforms an initial visual state into a desired goal, a fundamental ability for intelligent agents operating in complex environments. Existing approaches typically rely on…
Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations. Hardware-assisted cryptographic…
We present a quantum Viterbi algorithm (QVA) with better than classical performance under certain conditions. In this paper, the proposed algorithm is applied to decoding classical convolutional codes, for instance, large constraint length…
This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the…
The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Recently, two trends have emerged: 1) recommender systems must be trained in a timely manner to cope with…