Related papers: Parallel algorithms development for programmable l…

Higher-Level Hardware Synthesis of The KASUMI Algorithm

Programmable Logic Devices (PLDs) continue to grow in size and currently contain several millions of gates. At the same time, research effort is going into higher-level hardware synthesis methodologies for reconfigurable computing that can…

Hardware Architecture · Computer Science 2019-04-09 Issam Damaj

Parallel Algorithms Development for Programmable Devices with Application from Cryptography

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), have been witnessing a considerable increase in density. State-of-the-art FPGAs are complex hybrid devices that contain up to several millions of gates. Recently,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-12 Issam Damaj

Generating Configurable Hardware from Parallel Patterns

In recent years the computing landscape has seen an in- creasing shift towards specialized accelerators. Field pro- grammable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Raghu Prabhakar , David Koeplinger , Kevin Brown , HyoukJoong Lee , Christopher De Sa , Christos Kozyrakis , Kunle Olukotun

Reconfigurable Hardware Implementation of the Successive Overrelaxation Method

In this chapter, we study the feasibility of implementing SOR in reconfigurable hardware. We use Handel-C, a higher level design tool, to code our design, which is analyzed, synthesized, and placed and routed using the FPGAs proprietary…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-17 Safaa Kasbah , Ramzi Haraty , Issam Damaj

A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms

In this paper, we propose a methodology for partitioning and mapping computational intensive applications in reconfigurable hardware blocks of different granularity. A generic hybrid reconfigurable architecture is considered so as the…

Hardware Architecture · Computer Science 2011-11-09 M. D. Galanis , A. Milidonis , G. Theodoridis , D. Soudris , C. E. Goutis

A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming

The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-29 Demetrios Coutinho , Felipe O. Lins e Silva , Daniel Aloise , Samuel , Xavier-de-Souza

HPC-Coder: Modeling Parallel Programs using Large Language Models

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols , Aniruddha Marathe , Harshitha Menon , Todd Gamblin , Abhinav Bhatele

Massively Parallel Algorithms for the Lattice Boltzmann Method on Non-uniform Grids

The lattice Boltzmann method exhibits excellent scalability on current supercomputing systems and has thus increasingly become an alternative method for large-scale non-stationary flow simulations, reaching up to a trillion grid nodes.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-11 Florian Schornbaum , Ulrich Rüde

Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms

Traditional heterogeneous parallel algorithms, designed for heterogeneous clusters of workstations, are based on the assumption that the absolute speed of the processors does not depend on the size of the computational task. This assumption…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-15 Alexey Lastovetsky , Ravi Reddy , Vladimir Rychkov , David Clarke

Generation and Validation of Custom Multiplication IP Blocks from the Web

Every CPU carries one or more arithmetical and logical units. One popular operation that is performed by these units is multiplication. Automatic generation of custom VHDL models for performing this operation, allows the designer to achieve…

Hardware Architecture · Computer Science 2015-02-27 Minas Dasygenis

Fast and Parallelizable Logical Computation with Homological Product Codes

Quantum error correction is necessary to perform large-scale quantum computation, but requires extremely large overheads in both space and time. High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit…

Quantum Physics · Physics 2024-07-29 Qian Xu , Hengyun Zhou , Guo Zheng , Dolev Bluvstein , J. Pablo Bonilla Ataides , Mikhail D. Lukin , Liang Jiang

Parallelizing Program Execution on Distributed Quantum Systems via Compiler/Hardware Co-Design

As quantum computers continue to improve and support larger, more complex computations, smart control hardware and compilers are needed to efficiently leverage the capabilities of these systems. This paper introduces a novel approach to…

Quantum Physics · Physics 2025-11-19 Folkert de Ronde , Alexander Knapen , Stephan Wong , Sebastian Feld

Parallel Programming for FPGAs

This book focuses on the use of algorithmic high-level synthesis (HLS) to build application-specific FPGA systems. Our goal is to give the reader an appreciation of the process of creating an optimized hardware design using HLS. Although…

Hardware Architecture · Computer Science 2018-05-11 Ryan Kastner , Janarbek Matai , Stephen Neuendorffer

An Efficient Algorithm for Modulus Operation and Its Hardware Implementation in Prime Number Calculation

This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm use only addition, subtraction, logical, and bit shift operations, avoiding the complexities and hardware costs associated with…

Cryptography and Security · Computer Science 2025-01-10 W. A. Susantha Wijesinghe

Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines

The electrical and electronic engineering has used parallel programming to solve its large scale complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-07-05 Antonio Wendell De Oliveira Rodrigues , Frédéric Guyomarc'H , Jean-Luc Dekeyser , Yvonnick Le Menach

An Evaluation of Massively Parallel Algorithms for DFA Minimization

We study parallel algorithms for the minimization of Deterministic Finite Automata (DFAs). In particular, we implement four different massively parallel algorithms on Graphics Processing Units (GPUs). Our results confirm the expectations…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-31 Jan Martens , Anton Wijs

Optimizing High-Level Synthesis Designs with Retrieval-Augmented Large Language Models

High-level synthesis (HLS) allows hardware designers to create hardware designs with high-level programming languages like C/C++/OpenCL, which greatly improves hardware design productivity. However, existing HLS flows require programmers'…

Hardware Architecture · Computer Science 2024-10-11 Haocheng Xu , Haotian Hu , Sitao Huang

Parallel Computing Architectures for Robotic Applications: A Comprehensive Review

With the growing complexity and capability of contemporary robotic systems, the necessity of sophisticated computing solutions to efficiently handle tasks such as real-time processing, sensor integration, decision-making, and control…

Robotics · Computer Science 2025-09-09 Md Rafid Islam

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

Fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. HPC applications typically tolerate fail-stop failures by checkpointing. Another…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-22 Erlin Yao , Mingyu Chen , Rui Wang , Wenli Zhang , Guangming Tan

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-29 Sourya Dey , Diandian Chen , Zongyang Li , Souvik Kundu , Kuan-Wen Huang , Keith M. Chugg , Peter A. Beerel