Related papers: Parallel FPGA Router using Sub-Gradient method and…

200 papers

Routing of the nets in the Field Programmable Gate Array (FPGA) design flow is one of the most time-consuming steps. Although Versatile Place and Route (VPR), which is a commonly used algorithm for this purpose, routes effectively, it is slow…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-23 · Rohit Agrawal, Kapil Ahuja, Dhaarna Maheshwari, Akash Kumar

In the face of escalating complexity and size of contemporary FPGAs and circuits, routing emerges as a pivotal and time-intensive phase in FPGA compilation flows. In response to this challenge, we present an open-source parallel routing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-02 · Xinshi Zang, Wenhao Lin, Shiju Lin, Jinwei Liu, Evangeline F. Y. Young

In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-11-24 · Raghu Prabhakar, David Koeplinger, Kevin Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, Kunle Olukotun

Field Programmable Gate Arrays (FPGA) exceed the computing power of software based implementations by breaking the paradigm of sequential execution and accomplishing more per clock cycle by enabling hardware level parallelization at an…

Robotics · Computer Science · 2016-07-20 · Gurshaant Malik, Krishna Gupta, Raunak Dharani, K Madhava Krishna

As deep neural networks (DNNs) become deeper, the training time increases. From this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics · 2024-07-08 · Chang-Ock Lee, Youngkyu Lee, Jongho Park

Improving the computational efficiency of quantum many-body calculations from a hardware perspective remains a critical challenge. Although field-programmable gate arrays (FPGAs) have recently been exploited to improve the computational…

Strongly Correlated Electrons · Physics · 2026-02-06 · Songtai Lv, Yang Liang, Rui Zhu, Qibin Zheng, Haiyuan Zou

Distributed machine learning workloads use data and tensor parallelism for training and inference, both of which rely on the AllReduce collective to synchronize gradients or activations. However, AllReduce algorithms are delayed by the…

Machine Learning · Computer Science · 2025-09-30 · Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-02-26 · Yen-Hsiang Chang, Aydın Buluç, James Demmel

This work focuses on a class of general decentralized constraint-coupled optimization problems. We propose a novel nested primal-dual gradient algorithm (NPGA), which can achieve linear convergence under the weakest known condition, and its…

Optimization and Control · Mathematics · 2025-05-06 · Jingwang Li, Housheng Su

Genetic Algorithms (GAs) are used to solve search and optimization problems in which an optimal solution can be found using an iterative process with probabilistic and non-deterministic transitions. However, depending on the problem's…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-01-23 · Matheus F. Torquato, Marcelo A. C. Fernandes

Efficient, real-time segmentation of color images is important in many fields of computer vision, such as image compression, medical imaging, mapping and autonomous navigation. Being one of the most computationally…

Computer Vision and Pattern Recognition · Computer Science · 2017-10-09 · Roopal Nahar, Akanksha Baranwal, K. Madhava Krishna

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-29 · Sourya Dey, Diandian Chen, Zongyang Li, Souvik Kundu, Kuan-Wen Huang, Keith M. Chugg, Peter A. Beerel

In this work, the Parareal algorithm is applied to evolution problems that admit good low-rank approximations and for which the dynamical low-rank approximation (DLRA) can be used as time stepper. Many discrete integrators for DLRA have…

Numerical Analysis · Mathematics · 2022-09-14 · Benjamin Carrel, Martin J. Gander, Bart Vandereycken

Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-18 · Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

This paper introduces cuHALLaR, a GPU-accelerated implementation of the HALLaR method proposed in Monteiro et al. 2024 for solving large-scale semidefinite programming (SDP) problems. We demonstrate how our Julia-based implementation…

FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. Nonetheless, such advantages come at the cost of sacrificing programmability.…

Hardware Architecture · Computer Science · 2018-07-05 · Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou

The ever-increasing data rates of modern communication systems lead to severe distortions of the communication signal, imposing great challenges to state-of-the-art signal processing algorithms. In this context, neural network (NN)-based…

Signal Processing · Electrical Eng. & Systems · 2024-07-04 · Jonas Ney, Norbert Wehn

Path planning is critical for autonomous driving, generating smooth, collision-free, feasible paths based on perception and localization inputs. However, its computationally intensive nature poses significant challenges for…

Hardware Architecture · Computer Science · 2025-07-23 · Yifan Zhang, Xiaoyu Niu, Hongzheng Tian, Yanjun Zhang, Bo Yu, Shaoshan Liu, Sitao Huang

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science · 2018-06-12 · Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

We present PDLP, a practical first-order method for linear programming (LP) designed to solve large-scale LP problems. PDLP is based on the primal-dual hybrid gradient (PDHG) method applied to the minimax formulation of LP. PDLP…

Optimization and Control · Mathematics · 2026-03-19 · David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O'Donoghue, Warren Schudy