Related papers: High-Throughput and Memory-Efficient Parallel Vite…
Many research works have been performed on implementation of Vitrerbi decoding algorithm on GPU instead of FPGA because this platform provides considerable flexibility in addition to great performance. Recently, the recently-introduced…
In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphic processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristic of…
We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in memory utilization, input/output…
In this paper we present an optimized parallel implementation of a flexible MAP decoder for synchronization error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs we demonstrate decoding…
A novel decoding algorithm is developed for general quantum convolutional codes. Exploiting useful ideas from classical coding theory, the new decoder introduces two innovations that drastically reduce the decoding complexity compared to…
The use of deep neural network for decoding error control code will encounter two problems, namely, the high-precision requirements of the error control code and the complexity of the neural network due to the long code. In this paper, a…
To meet the evolving data rate requirements of emerging wireless communication technologies, many parallel architectures have been proposed to implement high throughput turbo decoders. However, concurrent memory reading/writing in parallel…
The anti-interference capability of wireless links is a physical layer problem for edge computing. Although convolutional codes have inherent error correction potential due to the redundancy introduced in the data, the performance of the…
This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio (SDR), which is a software alternative for realizing wireless…
The performance of convolutional codes decoding by the Viterbi algorithm should not depend on the particular distribution of zeros and ones in the input messages, as they are linear. However, it was identified that specific implementations…
We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs). We implement token recombination as an atomic GPU operation in order to fully…
Recently, a parallel decoding algorithm of $G_N$-coset codes was proposed.The algorithm exploits two equivalent decoding graphs.For each graph, the inner code part, which consists of independent component codes, is decoded in parallel. The…
The JPEG compression format has been the standard for lossy image compression for over multiple decades, offering high compression rates at minor perceptual loss in image quality. For GPU-accelerated computer vision and deep learning tasks,…
The Viterbi algorithm is a key operator for structured sequence inference in modern data systems, with applications in trajectory analysis, online recommendation, and speech recognition. As these workloads increasingly migrate to…
Sequential decoding, commonly applied to substitution channels, is a sub-optimal alternative to Viterbi decoding with significantly reduced memory costs. In this work, a sequential decoder for convolutional codes over channels that are…
Reduction operations are extensively employed in many computational problems. A reduction consists of, given a finite set of numeric elements, combining into a single value all elements in that set, using for this a combiner function. A…
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…
Binary convolutional networks have lower computational load and lower memory foot-print compared to their full-precision counterparts. So, they are a feasible alternative for the deployment of computer vision applications on limited…
Iterative processing is widely adopted nowadays in modern wireless receivers for advanced channel codes like turbo and LDPC codes. Extension of this principle with an additional iterative feedback loop to the demapping function has proven…
Due to importance of reducing of time solution in numerical codes, we propose an algorithm for parallel LU decomposition solver for dense and sparse matrices on GPU. This algorithm is based on first bi-vectorizing a triangular matrices of…