Related papers: High-Throughput and Memory-Efficient Parallel Vite…

High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores

Many research works have been performed on implementation of Viterbi decoding algorithm on GPU instead of FPGA because this platform provides considerable flexibility in addition to great performance. Recently, the recently-introduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Alireza Mohammadidoost, Matin Hashemi

A Gb/s Parallel Block-based Viterbi Decoder for Convolutional Codes on GPU

In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphic processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristic of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-02 Hao Peng, Rongke Liu, Yi Hou, Ling Zhao

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in memory utilization, input/output…

Computation and Language · Computer Science 2020-02-17 Hugo Braun, Justin Luitjens, Ryan Leary, Tim Kaldewey, Daniel Povey

GPU Implementation and Optimization of a Flexible MAP Decoder for Synchronization Correction

In this paper we present an optimized parallel implementation of a flexible MAP decoder for synchronization error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs we demonstrate decoding…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-26 Johann A. Briffa

Efficient ML Decoding for Quantum Convolutional Codes

A novel decoding algorithm is developed for general quantum convolutional codes. Exploiting useful ideas from classical coding theory, the new decoder introduces two innovations that drastically reduce the decoding complexity compared to…

Quantum Physics · Physics 2015-03-13 Peiyu Tan, Jing Li

High-performance Decoder for Convolutional Code with Deep Neural Network

The use of deep neural network for decoding error control code will encounter two problems, namely, the high-precision requirements of the error control code and the complexity of the neural network due to the long code. In this paper, a…

Signal Processing · Electrical Eng. & Systems 2019-01-01 Jiang Xiaobo, Zhang Fang, Zeng Zhen

Parallel Interleaver Design for a High Throughput HSPA+/LTE Multi-Standard Turbo Decoder

To meet the evolving data rate requirements of emerging wireless communication technologies, many parallel architectures have been proposed to implement high throughput turbo decoders. However, concurrent memory reading/writing in parallel…

Information Theory · Computer Science 2014-03-27 Guohui Wang, Hao Shen, Yang Sun, Joseph R. Cavallaro, Aida Vosoughi, Yuanbin Guo

Hybrid HMM Decoder For Convolutional Codes By Joint Trellis-Like Structure and Channel Prior

The anti-interference capability of wireless links is a physical layer problem for edge computing. Although convolutional codes have inherent error correction potential due to the redundancy introduced in the data, the performance of the…

Information Theory · Computer Science 2022-11-15 Haoyu Li, Xuan Wang, Tong Liu, Dingyi Fang, Baoying Liu

Computationally Efficient Implementation of a Hamming Code Decoder using a Graphics Processing Unit

This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio (SDR), which is a software alternative for realizing wireless…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-23 Shohidul Islam, Cheol-Hong Kim, Jong-Myon Kim

Performance Peculiarities of Viterbi Decoder in Mathworks Simulink, GNU Radio and Other Systems with Likewise Implementation

The performance of convolutional codes decoding by the Viterbi algorithm should not depend on the particular distribution of zeros and ones in the input messages, as they are linear. However, it was identified that specific implementations…

Information Theory · Computer Science 2015-10-06 Alexey Shapin, Denis Kleyko, Nikita Lyamin, Evgeny Osipov, Oleg Melentyev

A GPU-based WFST Decoder with Exact Lattice Generation

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs). We implement token recombination as an atomic GPU operation in order to fully…

Computation and Language · Computer Science 2018-07-30 Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

Toward Terabits-per-second Communications: A High-Throughput Hardware Implementation of $G_N$-Coset Codes

Recently, a parallel decoding algorithm of $G_N$-coset codes was proposed. The algorithm exploits two equivalent decoding graphs. For each graph, the inner code part, which consists of independent component codes, is decoded in parallel. The…

Information Theory · Computer Science 2020-04-22 Jiajie Tong, Xianbin Wang, Qifan Zhang, Huazi Zhang, Shengchen Dai, Rong Li, Jun Wang

Accelerating JPEG Decompression on GPUs

The JPEG compression format has been the standard for lossy image compression for multiple decades, offering high compression rates at minor perceptual loss in image quality. For GPU-accelerated computer vision and deep learning tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-18 André Weißenberger, Bertil Schmidt

FLASH Viterbi: Fast and Adaptive Viterbi Decoding for Modern Data Systems

The Viterbi algorithm is a key operator for structured sequence inference in modern data systems, with applications in trajectory analysis, online recommendation, and speech recognition. As these workloads increasingly migrate to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-24 Ziheng Deng, Xue Liu, Jiantong Jiang, Yankai Li, Qingxu Deng, Xiaochun Yang

Sequential Decoding of Convolutional Codes for Synchronization Errors

Sequential decoding, commonly applied to substitution channels, is a sub-optimal alternative to Viterbi decoding with significantly reduced memory costs. In this work, a sequential decoder for convolutional codes over channels that are…

Information Theory · Computer Science 2026-04-02 Anisha Banerjee, Andreas Lenz, Antonia Wachter-Zeh

A Fast and Generic GPU-Based Parallel Reduction Implementation

Reduction operations are extensively employed in many computational problems. A reduction consists of, given a finite set of numeric elements, combining into a single value all elements in that set, using for this a combiner function. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-23 Walid Jradi, Hugo do Nascimento, Wellington Martins

PAGANI: A Parallel Adaptive GPU Algorithm for Numerical

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-24 Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balša Terzić, Mohammad Zubair

Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU

Binary convolutional networks have lower computational load and lower memory foot-print compared to their full-precision counterparts. So, they are a feasible alternative for the deployment of computer vision applications on limited…

Computer Vision and Pattern Recognition · Computer Science 2020-07-29 Mete Can Kaya, Alperen İnci, Alptekin Temizel

On the Convergence Speed of Turbo Demodulation with Turbo Decoding

Iterative processing is widely adopted nowadays in modern wireless receivers for advanced channel codes like turbo and LDPC codes. Extension of this principle with an additional iterative feedback loop to the demapping function has proven…

Information Theory · Computer Science 2015-06-04 Salim Haddad, Amer Baghdadi, Michel Jezequel

Equal bi-Vectorized (EBV) method to high performance on GPU

Due to importance of reducing of time solution in numerical codes, we propose an algorithm for parallel LU decomposition solver for dense and sparse matrices on GPU. This algorithm is based on first bi-vectorizing a triangular matrices of…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-15 Amirreza Hashemi, Mohsen Lahooti, Ebrahim Shirani