Related papers: MKPipe: A Compiler Framework for Optimizing Multi-…

200 papers

Over the past few years, there has been an increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-09-14 · Mostafa Eghbali Zarch, Reece Neff, Michela Becchi

There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel…

Performance · Computer Science · 2019-01-25 · Wim Vanderbauwhede, Syed Waqar Nabi

This paper presents a meta-compilation framework, the MCompiler. The main idea is that different segments of a program can be compiled with different compilers/optimizers and combined into a single executable. The MCompiler can be used in a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-05-31 · Aniket Shivam, Alexandru Nicolau, Alexander V. Veidenbaum

FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator…

Hardware Architecture · Computer Science · 2017-05-09 · Abhishek Kumar Jain, Douglas L. Maskell, Suhaib A. Fahmy

Current computational systems are heterogeneous by nature, featuring a combination of CPUs and GPUs. As the latter are becoming an established platform for high-performance computing, the focus is shifting towards the seamless programming…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-23 · Fábio Soldado, Fernando Alexandre, Hervé Paulino

Parallelization schemes are essential in order to exploit the full benefits of multi-core architectures. In said architectures, the most comprehensive parallelization API is OpenMP. However, the introduction of correct and optimal OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-05-28 · Idan Mosseri, Lee-or Alon, Re'em Harel, Gal Oren

Nowadays, we are living in an era of extreme device heterogeneity. Besides the wide variety of conventional CPU architectures, accelerator devices such as GPUs and FPGAs have also come to the foreground, exploding the pool of available…

Machine Learning · Computer Science · 2022-08-31 · Petros Vavaroutsos, Ioannis Oroutzoglou, Dimosthenis Masouros, Dimitrios Soudris

Electrical and electronic engineering has used parallel programming to solve its large-scale, complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…

Distributed, Parallel, and Cluster Computing · Computer Science · 2011-07-05 · Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarc'H, Jean-Luc Dekeyser, Yvonnick Le Menach

Highly parallel frameworks have proved well suited to graph processing. Various works optimize their implementation on FPGAs, which are pipeline-parallel devices. The key to exploiting the parallel performance of FPGAs is to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-02 · Chengbo Yang

Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative…

Performance · Computer Science · 2018-10-25 · Ricardo Nobre, Luís Reis, João M. P. Cardoso

In an effort to lower the barrier to the adoption of FPGAs by a broader community, major FPGA vendors today offer compiler toolchains for OpenCL code. While using these toolchains allows porting existing code to FPGAs, ensuring performance…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-09 · Mostafa Eghbali Zarch, Michela Becchi

Pipelining between data loading and computation is a critical tensor program optimization for GPUs. In order to unleash the high performance of latest GPUs, we must perform a synergetic optimization of multi-stage pipelining across the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-09 · Guyue Huang, Yang Bai, Liu Liu, Yuke Wang, Bei Yu, Yufei Ding, Yuan Xie

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science · 2021-07-21 · Endri Bezati, Mahyar Emami, Jörn Janneck, James Larus

FPGA-based hardware accelerators have received increasing attention mainly due to their ability to accelerate deep pipelined applications, thus resulting in higher computational performance and energy efficiency. Nevertheless, the amount of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-23 · R. Nepomuceno, R. Sterle, G. Valarini, M. Pereira, H. Yviquel, G. Araujo

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-02-10 · Michel Steuwer, Christian Fensch, Christophe Dubach

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in…

The analysis of source code through machine learning techniques is an increasingly explored research topic aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of…

Machine Learning · Computer Science · 2020-12-15 · Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini, Andrea Acquaviva

As the interest in FPGA-based accelerators for HPC applications increases, new challenges also arise, especially concerning different programming and portability issues. This paper aims to provide a snapshot of the current state of the FPGA…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-09-06 · Manuel de Castro, Francisco J. Andújar, Roberto R. Osorio, Rocío Carratalá-Sáez, Diego R. Llanos

As inference workloads for large language models (LLMs) scale to meet growing user demand, pipeline parallelism (PP) has become a widely adopted strategy for multi-GPU deployment, particularly in cross-node setups, to improve key-value (KV)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-30 · Yongchao He, Bohan Zhao, Zheng Cao

Modern computer systems typically combine multicore CPUs with accelerators like GPUs for improved performance and energy efficiency. However, these systems suffer from poor performance portability: code tuned for one device must be…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-05-23 · Thomas L. Falch, Anne C. Elster