Related papers: Parallel FPGA Router using Sub-Gradient method and…

ParaLarH: Parallel FPGA Router based upon Lagrange Heuristics

Routing of the nets in Field Programmable Gate Array (FPGA) design flow is one of the most time consuming steps. Although Versatile Place and Route (VPR), which is a commonly used algorithm for this purpose, routes effectively, it is slow…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-23 Rohit Agrawal , Kapil Ahuja , Dhaarna Maheshwari , Akash Kumar

An Open-Source Fast Parallel Routing Approach for Commercial FPGAs

In the face of escalating complexity and size of contemporary FPGAs and circuits, routing emerges as a pivotal and time-intensive phase in FPGA compilation flows. In response to this challenge, we present an open-source parallel routing…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Xinshi Zang , Wenhao Lin , Shiju Lin , Jinwei Liu , Evangeline F. Y. Young

Generating Configurable Hardware from Parallel Patterns

In recent years the computing landscape has seen an in- creasing shift towards specialized accelerators. Field pro- grammable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Raghu Prabhakar , David Koeplinger , Kevin Brown , HyoukJoong Lee , Christopher De Sa , Christos Kozyrakis , Kunle Olukotun

FPGA based hybrid architecture for parallelizing RRT

Field Programmable Gate Arrays(FPGA) exceed the computing power of software based implementations by breaking the paradigm of sequential execution and accomplishing more per clock cycle by enabling hardware level parallelization at an…

Robotics · Computer Science 2016-07-20 Gurshaant Malik , Krishna Gupta , Raunak Dharani , K Madhava Krishna

Parareal Neural Networks Emulating a Parallel-in-time Algorithm

As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics 2024-07-08 Chang-Ock Lee , Youngkyu Lee , Jongho Park

Reducing the Computational Cost Scaling of Tensor Network Algorithms via Field-Programmable Gate Array Parallelism

Improving the computational efficiency of quantum many-body calculations from a hardware perspective remains a critical challenge. Although field-programmable gate arrays (FPGAs) have recently been exploited to improve the computational…

Strongly Correlated Electrons · Physics 2026-02-06 Songtai Lv , Yang Liang , Rui Zhu , Qibin Zheng , Haiyuan Zou

Efficient AllReduce with Stragglers

Distributed machine learning workloads use data and tensor parallelism for training and inference, both of which rely on the AllReduce collective to synchronize gradients or activations. However, AllReduce algorithms are delayed by the…

Machine Learning · Computer Science 2025-09-30 Arjun Devraj , Eric Ding , Abhishek Vijaya Kumar , Robert Kleinberg , Rachee Singh

Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang , Aydın Buluç , James Demmel

NPGA: A Unified Algorithmic Framework for Decentralized Constraint-Coupled Optimization

This work focuses on a class of general decentralized constraint-coupled optimization problems. We propose a novel nested primal-dual gradient algorithm (NPGA), which can achieve linear convergence under the weakest known condition, and its…

Optimization and Control · Mathematics 2025-05-06 Jingwang Li , Housheng Su

High-Performance Parallel Implementation of Genetic Algorithm on FPGA

Genetic Algorithms (GAs) are used to solve search and optimization problems in which an optimal solution can be found using an iterative process with probabilistic and non-deterministic transitions. However, depending on the problem's…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-23 Matheus F. Torquato , Marcelo A. C. Fernandes

FPGA based Parallelized Architecture of Efficient Graph based Image Segmentation Algorithm

Efficient and real time segmentation of color images has a variety of importance in many fields of computer vision such as image compression, medical imaging, mapping and autonomous navigation. Being one of the most computationally…

Computer Vision and Pattern Recognition · Computer Science 2017-10-09 Roopal Nahar , Akanksha Baranwal , K. Madhava Krishna

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-29 Sourya Dey , Diandian Chen , Zongyang Li , Souvik Kundu , Kuan-Wen Huang , Keith M. Chugg , Peter A. Beerel

Low-rank Parareal: a low-rank parallel-in-time integrator

In this work, the Parareal algorithm is applied to evolution problems that admit good low-rank approximations and for which the dynamical low-rank approximation (DLRA) can be used as time stepper. Many discrete integrators for DLRA have…

Numerical Analysis · Mathematics 2022-09-14 Benjamin Carrel , Martin J. Gander , Bart Vandereycken

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-18 Mohamed S. Abdelfattah , David Han , Andrew Bitar , Roberto DiCecco , Shane OConnell , Nitika Shanker , Joseph Chu , Ian Prins , Joshua Fender , Andrew C. Ling , Gordon R. Chiu

cuHALLaR: A GPU Accelerated Low-Rank Augmented Lagrangian Method for Large-Scale Semidefinite Programming

This paper introduces cuHALLaR, a GPU-accelerated implementation of the HALLaR method proposed in Monteiro et al. 2024 for solving large-scale semidefinite programming (SDP) problems. We demonstrate how our Julia-based implementation…

Optimization and Control · Mathematics 2025-10-27 Jacob M. Aguirre , Diego Cifuentes , Vincent Guigues , Renato D. C. Monteiro , Victor Hugo Nascimento , Arnesh Sujanani

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way

FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. Nonetheless, such advantages come at the cost of sacrificing programmability.…

Hardware Architecture · Computer Science 2018-07-05 Jason Cong , Zhenman Fang , Yuchen Hao , Peng Wei , Cody Hao Yu , Chen Zhang , Peipei Zhou

Achieving High Throughput with a Trainable Neural-Network-Based Equalizer for Communications on FPGA

The ever-increasing data rates of modern communication systems lead to severe distortions of the communication signal, imposing great challenges to state-of-the-art signal processing algorithms. In this context, neural network (NN)-based…

Signal Processing · Electrical Eng. & Systems 2024-07-04 Jonas Ney , Norbert Wehn

A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization

Path planning is critical for autonomous driving, generating smooth, collision-free, feasible paths based on perception and localization inputs. However, its computationally intensive nature poses significant challenges for…

Hardware Architecture · Computer Science 2025-07-23 Yifan Zhang , Xiaoyu Niu , Hongzheng Tian , Yanjun Zhang , Bo Yu , Shaoshan Liu , Sitao Huang

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia , Sina Lin , Charles R. Qi , Alex Aiken

PDLP: A Practical First-Order Method for Large-Scale Linear Programming

We present PDLP, a practical first-order method for linear programming (LP) designed to solve large-scale LP problems. PDLP is based on the primal-dual hybrid gradient (PDHG) method applied to the minimax formulation of LP. PDLP…

Optimization and Control · Mathematics 2026-03-19 David Applegate , Mateo Díaz , Oliver Hinder , Haihao Lu , Miles Lubin , Brendan O'Donoghue , Warren Schudy