Related papers: ForOpenCL: Transformations Exploiting Array Syntax…

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel…

Performance · Computer Science 2019-01-25 Wim Vanderbauwhede, Syed Waqar Nabi

Loo.py: From Fortran to performance via transformation and substitution rules

A large amount of numerically-oriented code is written and is being written in legacy languages. Much of this code could, in principle, make good use of data-parallel throughput-oriented computer architectures. Loo.py, a…

Programming Languages · Computer Science 2015-05-19 Andreas Klöckner

Accelerating Fortran Codes: A Method for Integrating Coarray Fortran with CUDA Fortran and OpenMP

Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance…

Instrumentation and Methods for Astrophysics · Physics 2024-09-13 James McKevitt, Eduard I. Vorobyov, Igor Kulikov

Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs

In recent years the use of FPGAs to accelerate scientific applications has grown, with numerous applications demonstrating the benefit of FPGAs for high performance workloads. However, whilst High Level Synthesis (HLS) has significantly…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-28 Gabriel Rodriguez-Canal, Nick Brown, Tim Dykes, Jessica R. Jones, Utz-Uwe Haus

Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang

MLIR has become popular since it was open sourced in 2019. A sub-project of LLVM, the flexibility provided by MLIR to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-04 Nick Brown, Maurice Jamieson, Anton Lydike, Emilien Bauer, Tobias Grosser

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed…

Mathematical Software · Computer Science 2017-11-15 Wim Vanderbauwhede, Gavin Davidson

Array Program Transformation with Loo.py by Example: High-Order Finite Elements

To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance…

Programming Languages · Computer Science 2018-10-05 Andreas Klöckner, Lucas C. Wilcox, T. Warburton

FFCNN: Fast FPGA based Acceleration for Convolution neural network inference

We present a new efficient OpenCL-based Accelerator for large scale Convolutional Neural Networks called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernels architecture. As…

Machine Learning · Computer Science 2022-08-30 F. Keddous, H-N. Nguyen, A. Nakib

Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code)

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort.…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-10 Michel Steuwer, Christian Fensch, Christophe Dubach

Generating Configurable Hardware from Parallel Patterns

In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Raghu Prabhakar, David Koeplinger, Kevin Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, Kunle Olukotun

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

A comparison of PGI OpenACC, FORTRAN CUDA, and Nvidia CUDA pseudospectral methods on a single GPU and GCC FORTRAN on single and multiple CPU cores is reported. The GPU implementations use CuFFT and the CPU implementations use FFTW. Porting…

Computational Physics · Physics 2012-08-14 B. Cloutier, B. K. Muite, P. Rigge

Exploring Thread Coarsening on FPGA

Over the past few years, there has been an increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-14 Mostafa Eghbali Zarch, Reece Neff, Michela Becchi

Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Emerging hybrid accelerator architectures for high performance computing are often suited for the use of a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently…

Programming Languages · Computer Science 2015-02-13 Craig Rasmussen, Matthew Sottile, Daniel Nagle, Soren Rasmussen

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high…

Hardware Architecture · Computer Science 2016-11-09 Dong Wang, Jianjing An, Ke Xu

New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code

We introduce "Hybrid Fortran", a new approach that allows a high performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-19 Michel Müller, Takayuki Aoki

Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures

For reasons of both performance and energy efficiency, high-performance computing (HPC) hardware is becoming increasingly heterogeneous. The OpenCL framework supports portable programming across a wide range of computing devices and is…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-01 Beau Johnston, Josh Milthorpe

Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs

NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade. Most applications that benefit from GPUs have thus been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems are, however,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-12 Igor Sfiligoi, Emily A. Belli, Jeff Candy, Reuben D. Budiardja

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-06 Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian, Martin Margala

Comparing Parallel Functional Array Languages: Programming and Performance

Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and…

Programming Languages · Computer Science 2025-05-15 David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas

Algorithms and Hardware for Efficient Processing of Logic-based Neural Networks

Recent efforts to improve the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed-function combinational logic (FFCL). This…

Hardware Architecture · Computer Science 2023-04-14 Jingkai Hong, Arash Fayyazi, Amirhossein Esmaili, Mahdi Nazemi, Massoud Pedram