Related papers: Exploiting Data Parallelism in the yConvex Hypergr…

Parallel Statistical Multi-resolution Estimation

We discuss several strategies to implement Dykstra's projection algorithm on NVIDIA's compute unified device architecture (CUDA). Dykstra's algorithm is the central step in and the computationally most expensive part of statistical…

Computational Physics · Physics · 2015-03-13 · Jan Lebert, Lutz Künneke, Johannes Hagemann, Stephan C. Kramer
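The listing names Dykstra's projection algorithm but does not say what it computes. The serial Python below is a rough sketch only. It is not the authors' CUDA implementation and does not cover the statistical multiresolution setting. It approximates the Euclidean projection of a point onto the intersection of convex sets by cycling through the per-set projections while keeping one correction vector per set. Those corrections are what separate Dykstra's method from plain alternating projections. The two sets (a unit box and a halfspace) and all function names are hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical illustration of Dykstra's projection algorithm (serial, pure Python).
Vector = List[float]
Projection = Callable[[Vector], Vector]


def project_box(v: Vector, lo: float = 0.0, hi: float = 1.0) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(x, lo), hi) for x in v]


def make_halfspace_projection(a: Sequence[float], b: float) -> Projection:
    """Return the Euclidean projection onto the halfspace {x : a.x <= b}."""
    norm_sq = sum(ai * ai for ai in a)

    def project(v: Vector) -> Vector:
        excess = sum(ai * vi for ai, vi in zip(a, v)) - b
        if excess <= 0:
            return list(v)  # already inside the halfspace
        step = excess / norm_sq
        return [vi - step * ai for ai, vi in zip(a, v)]

    return project


def dykstra(z: Vector, projections: Sequence[Projection],
            max_iter: int = 1000, tol: float = 1e-12) -> Vector:
    """Approximate the projection of z onto the intersection of the convex sets."""
    x = list(z)
    # One correction ("increment") vector per set, initially zero.
    increments = [[0.0] * len(z) for _ in projections]
    for _ in range(max_iter):
        x_start = x
        inc_change = 0.0
        for i, project in enumerate(projections):
            # Add back the correction this set removed on the previous sweep.
            shifted = [xj + pj for xj, pj in zip(x, increments[i])]
            x = project(shifted)
            new_inc = [sj - xj for sj, xj in zip(shifted, x)]
            inc_change = max(inc_change,
                             max(abs(u - w) for u, w in zip(new_inc, increments[i])))
            increments[i] = new_inc
        x_change = max(abs(u - w) for u, w in zip(x, x_start))
        # Stop once neither the iterate nor any correction moves.
        if max(x_change, inc_change) < tol:
            break
    return x
```

For example, `dykstra([2.0, 2.0], [project_box, make_halfspace_projection([1.0, 1.0], 1.0)])` returns `[0.5, 0.5]`, the nearest point of the triangle with vertices (0, 0), (1, 0) and (0, 1).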

PAGANI: A Parallel Adaptive GPU Algorithm for Numerical

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-06-24 · Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balša Terzić, Mohammad Zubair

Performance evaluation in the reconstruction of 2D images of computed tomography using massively parallel programming CUDA

Analysis of the processing time, and of the similarity between the generated images, on CPU and GPU architectures using sequential and parallel programming. For image processing, a computer with an AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used,…

Medical Physics · Physics · 2022-02-10 · Alexssandro Ferreira Cordeiro, Pedro Luiz de Paula Filho, Hamilton Pereira da Silva, Arnaldo Candido Junior, Edresson Casanova, Jandrei Sartori Spancerski

Computation of gray-level co-occurrence matrix based on CUDA and its optimization

In fields such as scientific research and industrial applications, optimizing computation time is becoming increasingly important, and the graphics processing unit, with its highly parallel architecture, is…

Performance · Computer Science · 2017-10-18 · Huichao Hong, Lixin Zheng, Shuwan Pan

Accelerating the Convex Hull Computation with a Parallel GPU Algorithm

The convex hull is a fundamental geometrical structure for many applications where groups of points must be enclosed or represented by a convex polygon. Although efficient sequential convex hull algorithms exist, and are constantly being…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-09-27 · Alan Keith, Héctor Ferrada, Cristóbal A. Navarro
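The abstract notes that "efficient sequential convex hull algorithms exist." Andrew's monotone chain is one well-known example and runs in O(n log n) time. The Python below is a hedged sequential sketch included for context. It is not the GPU algorithm proposed in the paper, and its function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical illustration: Andrew's monotone chain (sequential convex hull).
Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices counter-clockwise, dropping collinear points."""
    pts = sorted(set(points))  # sort by x, then y; remove duplicates
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Pop while the last two points and p do not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

For a unit square with an interior point, `convex_hull([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])` returns the four corners in counter-clockwise order starting from (0, 0). The sort dominates the cost, and each point is pushed and popped at most once per chain.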

Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data

While the advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, the corresponding experiments can yield…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-05 · Xiaodong Yu, Viktor Nikitin, Daniel J. Ching, Selin Aslan, Doga Gursoy, Tekin Bicer

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-20 · Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Xinyuan Song, Zekun Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models

The advent of high-performance computing (HPC) and graphics processing units (GPUs) presents an enormous computational resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. While…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-10-17 · Kato Mivule, Benjamin Harvey, Crystal Cobb, Hoda El Sayed

Recognition of convolutional neural network based on CUDA Technology

To address whether the Graphics Processing Unit (GPU), a stream processor with high floating-point performance, is applicable to neural networks, this paper proposes a parallel recognition algorithm for Convolutional Neural…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-08-28 · Yi-bin Huang, Kang Li, Ge Wang, Min Cao, Pin Li, Yu-jia Zhang

The Maximum Common Subgraph Problem: A Portfolio Approach

The Maximum Common Subgraph is a computationally challenging problem with countless practical applications. Even though it has long been proven NP-hard, its importance still motivates the search for exact solutions. This work starts by…

Data Structures and Algorithms · Computer Science · 2020-11-09 · Andrea Marcelli, Stefano Quer, Giovanni Squillero

High-Performance Parallelization of Dijkstra's Algorithm Using MPI and CUDA

This paper investigates the parallelization of Dijkstra's algorithm for computing the shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that by leveraging parallel computing, the computation time can be…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-04-08 · Boyang Song
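The paper parallelizes Dijkstra's algorithm. The serial starting point can be sketched with a binary-heap priority queue, as in the Python below. This illustration rests on several assumptions: edge weights are non-negative, the graph is an adjacency list keyed by node name, and the function name is hypothetical. It is not the MPI/CUDA implementation the paper evaluates.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical illustration: serial Dijkstra with a binary heap.
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path distances from source; edge weights must be non-negative.

    Nodes unreachable from source are absent from the returned dict.
    """
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist
```

Stale heap entries are skipped instead of being decreased in place, because `heapq` has no decrease-key operation. The repeated pop-and-relax loop over one global queue is the serial dependency that parallel variants have to work around.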

Performance Comparison Between OpenCV Built in CPU and GPU Functions on Image Processing Operations

Image processing is a specialized area of digital signal processing that involves various mathematical and algebraic operations, such as matrix inversion, matrix transposition, differentiation, convolution, and the Fourier transform. Operations like…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-06-24 · Batuhan Hangün, Önder Eyecioğlu

Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

The vision of a supercomputer on every desk can be realized by powerful, highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized only for graphics applications, are now used for highly computation-intensive…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-04-16 · Chittampally Vasanth Raja, Srinivas Balasubramanian, Prakash S Raghavendra

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance…

Programming Languages · Computer Science · 2022-07-04 · William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-30 · Mufakir Qamar Ansari, Mudabir Qamar Ansari

Towards a GPU-Parallelization of the neXtSIM-DG Dynamical Core

The cryosphere plays a significant role in Earth's climate system. Therefore, an accurate simulation of sea ice is of great importance to improve climate projections. To enable higher resolution simulations, graphics processing units (GPUs)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-29 · Robert Jendersie, Christian Lessig, Thomas Richter

Manycore processing of repeated range queries over massive moving objects observations

The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised…

Databases · Computer Science · 2014-11-13 · Francesco Lettich, Salvatore Orlando, Claudio Silvestri, Christian S. Jensen

GPGPU Processing in CUDA Architecture

The future of computation is the Graphical Processing Unit, i.e. the GPU. The promise that the graphics cards have shown in the field of image processing and accelerated rendering of 3D scenes, and the computational capability that these…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-02-21 · Jayshree Ghorpade, Jitendra Parande, Madhura Kulkarni, Amit Bawaskar

Parallel-in-Time Nonlinear Optimal Control via GPU-native Sequential Convex Programming

Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic…

Robotics · Computer Science · 2026-03-13 · Yilin Zou, Zhong Zhang, Maxime Robic, Fanghua Jiang

Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations

This paper presents efforts to improve the hierarchical parallelism of a two-scale simulation code. Two methods to improve the GPU parallel performance were developed and compared. The first used the NVIDIA Multi-Process Service and the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-15 · Jacob Merson, Mark S. Shephard