Related papers: High performance computing on Android devices -- a…

GPU backed Data Mining on Android Devices

Choosing an appropriate programming paradigm for high-performance computing on low-power devices can be useful to speed up calculations. Many Android devices have an integrated GPU and - although not officially supported - the OpenCL…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-10 Robert Fritze, Claudia Plant

Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL

When considering different hardware platforms, not just the time-to-solution can be of importance but also the energy necessary to reach it. This is not only the case with battery powered and mobile devices but also with high-performance…

Performance · Computer Science 2020-06-30 Philip Heinisch, Katharina Ostaszewski, Hendrik Ranocha

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

EngineCL: Usability and Performance in Heterogeneous Computing

Heterogeneous systems have become one of the most common architectures today, thanks to their excellent performance and energy consumption. However, due to their heterogeneity they are very complex to program and even more to achieve…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-26 Raúl Nozal, Jose Luis Bosque, Ramón Beivide

A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations

Hybrid computational architectures based on the joint power of Central Processing Units and Graphic Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering,…

Instrumentation and Methods for Astrophysics · Physics 2015-06-15 Roberto Capuzzo-Dolcetta, Mario Spera

Power Consumption Analysis of Parallel Algorithms on GPUs

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès, Abal-Kassim Cheik Ahamed, Alban Desmaison, Jean-Christophe Léchenet, François Mayer, Haifa Ben Salem, Thomas Zhu

CPU and/or GPU: Revisiting the GPU Vs. CPU Myth

Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power,…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-12 Kishore Kothapalli, Dip Sankar Banerjee, P. J. Narayanan, Surinder Sood, Aman Kumar Bahl, Shashank Sharma, Shrenik Lad, Krishna Kumar Singh, Kiran Matam, Sivaramakrishna Bharadwaj, Rohit Nigam, Parikshit Sakurikar, Aditya Deshpande, Ishan Misra, Siddharth Choudhary, Shubham Gupta

High Performance Computing with FPGAs and OpenCL

In this work we evaluate the potential of FPGAs for accelerating HPC workloads as a more power-efficient alternative to GPUs. Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-17 Hamid Reza Zohouri

Comparison of OpenMP & OpenCL Parallel Processing Technologies

This paper presents a comparison of OpenMP and OpenCL based on the parallel implementation of algorithms from various fields of computer applications. The focus of our study is on the performance of benchmark comparing OpenMP and OpenCL. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-12 Krishnahari Thouti, S. R. Sathe

OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-01 Federico Simmross-Wattenberg, Manuel Rodríguez-Cayetano, Javier Royuela-del-Val, Elena Martín-González, Elisa Moya-Sáez, Marcos Martín-Fernández, Carlos Alberola-López

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general purpose CPUs and accelerators (such as, GPU, or Intel Xeon Phi) that provide high performance with suitable energy-consumption…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-19 Suejb Memeti, Lu Li, Sabri Pllana, Joanna Kolodziej, Christoph Kessler

Investigation of heterogeneous computing platforms for real-time data analysis in the CBM experiment

Future experiments in high-energy physics will pose stringent requirements to computing, in particular to real-time data processing. As an example, the CBM experiment at FAIR Germany intends to perform online data selection exclusively in…

Computational Physics · Physics 2020-02-06 V. Singhal, S. Chattopadhyay, V. Friese

Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution

Deploying deep neural networks on mobile devices is increasingly important but remains challenging due to limited computing resources. On the other hand, their unified memory architecture and narrower gap between CPU and GPU performance…

Machine Learning · Computer Science 2026-02-20 Zhuojin Li, Marco Paolieri, Leana Golubchik

Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++

The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…

Mathematical Software · Computer Science 2025-07-25 Giulio Malenza, Giovanni Stabile, Filippo Spiga, Robert Birke, Marco Aldinucci

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

In the past decade, high performance compute capabilities exhibited by heterogeneous GPGPU platforms have led to the popularity of data parallel programming languages such as CUDA and OpenCL. Such languages, however, involve a steep…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-17 Anirban Ghose, Siddharth Singh, Vivek Kulaharia, Lokesh Dokara, Srijeeta Maity, Soumyajit Dey

Parallelizing Workload Execution in Embedded and High-Performance Heterogeneous Systems

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-24 Kamran Karimi

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a…

Performance · Computer Science 2011-05-17 Kamran Karimi, Neil G. Dickson, Firas Hamze

Open SYCL on heterogeneous GPU systems: A case of study

Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-12 Rocío Carratalá-Sáez, Francisco J. Andújar, Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos

Enabling On-Device Smartphone GPU based Training: Lessons Learned

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing works have focused on reducing the computational and resource overheads of running Deep Neural Networks (DNN) inference on resource-constrained…

Machine Learning · Computer Science 2022-02-22 Anish Das, Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo