Related papers: Fast LDPC GPU Decoder for Cloud RAN

A High-Throughput Multi-Mode LDPC Decoder for 5G NR

This paper presents a partially parallel low-density parity-check (LDPC) decoder designed for the 5G New Radio (NR) standard. The design uses a multi-block parallel architecture with a flooding schedule. The decoder can support any code…

Signal Processing · Electrical Eng. & Systems 2021-03-12 Sina Pourjabar, Gwan S. Choi

DecodeX: Exploring and Benchmarking of LDPC Decoding across CPU, GPU, and ASIC Platforms

Emerging virtualized radio access networks (vRANs) demand flexible and efficient baseband processing across heterogeneous compute substrates. In this paper, we present DecodeX, a unified benchmarking framework for evaluating low-density…

Networking and Internet Architecture · Computer Science 2025-11-06 Zhenzhou Qi, Yuncheng Yao, Yiming Li, Chung-Hsuan Tung, Junyao Zheng, Danyang Zhuo, Tingjun Chen

Six Times to Spare: Characterizing GPU-Accelerated 5G LDPC Decoding for Edge-RSU Communications

Ultra-reliable low-latency vehicular communications (URLLC) require sufficient physical-layer (PHY) compute headroom at the network edge, where roadside units (RSUs) and compact next-generation base stations (gNBs) must meet strict timing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-10 Ryan Barker, Julia Boone, Tolunay Seyfi, Alireza Ebrahimi Dorcheh, Fatemeh Afghah, Joseph Boccuzzi

GPU-Accelerated Syndrome Decoding for Quantum LDPC Codes below the 63 $\mu$s Latency Threshold

This paper presents a GPU-accelerated decoder for quantum low-density parity-check (QLDPC) codes that achieves sub-$63$ $\mu$s latency, below the surface code decoder's real-time threshold demonstrated on Google's Willow quantum processor.…

Quantum Physics · Physics 2025-08-12 Oscar Ferraz, Bruno Coutinho, Gabriel Falcao, Marco Gomes, Francisco A. Monteiro, Vitor Silva

Multi-Stream LDPC Decoder on GPU of Mobile Devices

Low-density parity check (LDPC) codes have been extensively applied in mobile communication systems due to their excellent error correcting capabilities. However, their broad adoption has been hindered by the high complexity of the LDPC…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-18 Roohollah Amiri, Hani Mehrpouyan

High-throughput GPU layered decoder of multi-edge type low density parity check codes in continuous-variable quantum key distribution systems

The decoding throughput in the postprocessing is one of the bottlenecks for a continuous-variable quantum key distribution (CV-QKD) system. In this paper, we propose a layered decoder to decode quasi-cyclic multi-edge type LDPC (QC-METLDPC)…

Quantum Physics · Physics 2020-04-21 Yang Li, Xiaofang Zhang, Yong Li, Bingjie Xu, Li Ma, Jie Yang, Wei Huang

GPU coprocessors as a service for deep learning inference in high energy physics

In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two…

Computational Physics · Physics 2021-04-26 Jeffrey Krupa, Kelvin Lin, Maria Acosta Flechas, Jack Dinsmore, Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Thomas Klijnsma, Mia Liu, Kevin Pedro, Dylan Rankin, Natchanon Suaysom, Matt Trahms, Nhan Tran

Implementation Of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs

With the use of belief propagation (BP) decoding algorithm, low-density parity-check (LDPC) codes can achieve near-Shannon limit performance. In order to evaluate the error performance of LDPC codes, simulators running on CPUs are commonly…

Information Theory · Computer Science 2012-07-30 Yue Zhao, Francis C. M. Lau

Computationally Efficient Implementation of a Hamming Code Decoder using a Graphics Processing Unit

This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio (SDR), which is a software alternative for realizing wireless…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-23 Shohidul Islam, Cheol-Hong Kim, Jong-Myon Kim

High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores

Many research works have been performed on implementation of the Viterbi decoding algorithm on GPU instead of FPGA because this platform provides considerable flexibility in addition to great performance. Recently, the recently-introduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Alireza Mohammadidoost, Matin Hashemi

A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-09 Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren

An Efficient FPGA Accelerator for Point Cloud

Deep learning-based point cloud processing plays an important role in various vision tasks, such as autonomous driving, virtual reality (VR), and augmented reality (AR). The submanifold sparse convolutional network (SSCN) has been widely…

Signal Processing · Electrical Eng. & Systems 2022-10-17 Zilun Wang, Wendong Mao, Peixiang Yang, Zhongfeng Wang, Jun Lin

GPU acceleration and performance of the particle-beam-dynamics code Elegant

Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and design of a variety of high-energy particle accelerators and accelerator-based systems. In this paper we discuss a recently developed version of…

Computational Physics · Physics 2018-11-22 J. R. King, I. V. Pogorelov, K. M. Amyx, M. Borland, R. Soliday

High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison

We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the…

Machine Learning · Computer Science 2023-08-22 Nechba Mohammed, Mouhajir Mohamed, Sedjari Yassine

Generalized LDPC codes with low-complexity decoding and fast convergence

We consider generalized low-density parity-check (GLDPC) codes with component codes that are duals of Cordaro-Wagner codes. Two efficient decoding algorithms are proposed: one based on Hartmann-Rudolph processing, analogous to Sum-Product…

Information Theory · Computer Science 2025-05-14 Dawit Simegn, Dmitry Artemasov, Kirill Andreev, Pavel Rybin, Alexey Frolov

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

A Generalized Adjusted Min-Sum Decoder for 5G LDPC Codes: Algorithm and Implementation

5G New Radio (NR) has stringent demands on both performance and complexity for the design of low-density parity-check (LDPC) decoding algorithms and corresponding VLSI implementations. Furthermore, decoders must fully support the wide range…

Information Theory · Computer Science 2024-02-20 Yuqing Ren, Hassan Harb, Yifei Shen, Alexios Balatsoukas-Stimming, Andreas Burg

5G LDPC Linear Transformer for Channel Decoding

This work introduces a novel, fully differentiable linear-time complexity transformer decoder and a transformer decoder to correct 5G New Radio (NR) LDPC. We propose a scalable approach to decode linear block codes with $O(n)$ complexity…

Machine Learning · Computer Science 2025-01-27 Mario Hernandez, Fernando Pinero

GNNerator: A Hardware/Software Framework for Accelerating Graph Neural Networks

Graph Neural Networks (GNNs) use a fully-connected layer to extract features from the nodes of a graph and aggregate these features using message passing between nodes, combining two distinct computational patterns: dense, regular…

Hardware Architecture · Computer Science 2021-03-22 Jacob R. Stevens, Dipankar Das, Sasikanth Avancha, Bharat Kaul, Anand Raghunathan

Real-time, fast radio transient searches with GPU de-dispersion

The identification, and subsequent discovery, of fast radio transients through blind-search surveys requires a large amount of processing power, in worst cases scaling as $\mathcal{O}(N^3)$. For this reason, survey data are generally…

Instrumentation and Methods for Astrophysics · Physics 2014-02-04 Alessio Magro, Aris Karastergiou, Stefano Salvini, Benjamin Mort, Fred Dulwich, Kristian Zarb Adami