Related papers: Exploiting Data Parallelism in the yConvex Hypergr…
We discuss several strategies to implement Dykstra's projection algorithm on NVIDIA's compute unified device architecture (CUDA). Dykstra's algorithm is the central step in and the computationally most expensive part of statistical…
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…
Analysis of processing time and similarity of images generated between CPU and GPU architectures and sequential and parallel programming. For image processing a computer with AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used,…
As in various fields like scientific research and industrial application, the computation time optimization is becoming a task that is of increasing importance because of its highly parallel architecture. The graphics processing unit is…
The convex hull is a fundamental geometrical structure for many applications where groups of points must be enclosed or represented by a convex polygon. Although efficient sequential convex hull algorithms exist, and are constantly being…
While the advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, the corresponding experiments can yield…
General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…
The advent of high performance computing (HPC) and graphics processing units (GPU), present an enormous computation resource for Large data transactions (big data) that require parallel processing for robust and prompt data analysis. While…
For the problem whether Graphic Processing Unit(GPU),the stream processor with high performance of floating-point computing is applicable to neural networks, this paper proposes the parallel recognition algorithm of Convolutional Neural…
The Maximum Common Subgraph is a computationally challenging problem with countless practical applications. Even if it has been long proven NP-hard, its importance still motivates searching for exact solutions. This work starts by…
This paper investigates the parallelization of Dijkstra's algorithm for computing the shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that by leveraging parallel computing, the computation time can be…
Image Processing is a specialized area of Digital Signal Processing which contains various mathematical and algebraic operations such as matrix inversion, transpose of matrix, derivative, convolution, Fourier Transform etc. Operations like…
The vision of super computer at every desk can be realized by powerful and highly parallel CPUs or GPUs or APUs. Graphics processors once specialized for the graphics applications only, are now used for the highly computational intensive…
While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
The cryosphere plays a significant role in Earth's climate system. Therefore, an accurate simulation of sea ice is of great importance to improve climate projections. To enable higher resolution simulations, graphics processing units (GPUs)…
The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised…
The future of computation is the Graphical Processing Unit, i.e. the GPU. The promise that the graphics cards have shown in the field of image processing and accelerated rendering of 3D scenes, and the computational capability that these…
Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic…
This paper presents efforts to improve the hierarchical parallelism of a two scale simulation code. Two methods to improve the GPU parallel performance were developed and compared. The first used the NVIDIA Multi-Process Service and the…