Related papers: Computing Optimal Cycle Mean in Parallel on CUDA

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea , Tudorel Andrei , Raluca Mariana Dragoescu

An Efficient Parallel Data Clustering Algorithm Using Isoperimetric Number of Trees

We propose a parallel graph-based data clustering algorithm using CUDA GPU, based on exact clustering of the minimum spanning tree in terms of a minimum isoperimetric criteria. We also provide a comparative performance analysis of our…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-17 Ramin Javadi , Saleh Ashkboos

High-Performance Parallelization of Dijkstra's Algorithm Using MPI and CUDA

This paper investigates the parallelization of Dijkstra's algorithm for computing the shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that by leveraging parallel computing, the computation time can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-08 Boyang Song

Design and optimization of DBSCAN Algorithm based on CUDA

DBSCAN is a classic algorithm for data clustering, which is widely used in many fields. However, with the data scale growing much bigger than before, the traditional serial algorithm cannot meet the performance requirement.…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-09 Bingchen Wang , Chenglong Zhang , Lei Song , Lianhe Zhao , Yu Dou , Zihao Yu

Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA

In this note, we present the stability as well as performance analysis of asynchronous parallel computing algorithm implemented in 1D heat equation with CUDA. The primary objective of this note lies in dissemination of asynchronous parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-03 Kooktae Lee , Raktim Bhattacharya

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Generation of optimal codes is a well known problem in coding theory. Many computational approaches exist in the literature for finding record breaking codes. However generating codes with long lengths $n$ using serial algorithms is…

Information Theory · Computer Science 2015-07-21 Srajan Paliwal , Saurabh Tiwary , Bhaskar Chaudhury , Manish K. Gupta

ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method

This paper presents a heuristic for finding the optimum number of CUDA streams by using tools common to the modern AI-oriented approaches and applied to the parallel partition algorithm. A time complexity model for the GPU realization of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-22 Milena Veneva , Toshiyuki Imamura

Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines

We examine the problem of optimizing classification tree evaluation for on-line and real-time applications by using GPUs. Looking at trees with continuous attributes often used in image segmentation, we first put the existing algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-08 Jason Spencer

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs

Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck with GPU-accelerated applications is the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-01 Jonah Ekelund , Stefano Markidis , Ivy Peng

Bounds on Longest Simple Cycles in Weighted Directed Graphs via Optimum Cycle Means

The problem of finding the longest simple cycle in a directed graph is NP-hard, with critical applications in computational biology, scheduling, and network analysis. Existing approaches include exact algorithms with exponential runtimes,…

Data Structures and Algorithms · Computer Science 2026-01-13 Ali Dasdan

Computation of gray-level co-occurrence matrix based on CUDA and its optimization

As in various fields like scientific research and industrial application, the computation time optimization is becoming a task that is of increasing importance because of its highly parallel architecture. The graphics processing unit is…

Performance · Computer Science 2017-10-18 Huichao Hong , Lixin Zheng , Shuwan Pan

Finding Optimal Longest Paths by Dynamic Programming in Parallel

We propose an exact algorithm for solving the longest simple path problem between two given vertices in undirected weighted graphs. By using graph partitioning and dynamic programming, we obtain an algorithm that is significantly faster…

Data Structures and Algorithms · Computer Science 2019-05-10 Kai Fieger , Tomas Balyo , Christian Schulz , Dominik Schreiber

Accelerating numerical solution of Stochastic Differential Equations with CUDA

Numerical integration of stochastic differential equations is commonly used in many branches of science. In this paper we present how to accelerate this kind of numerical calculations with popular NVIDIA Graphics Processing Units using the…

Computational Physics · Physics 2011-05-31 M. Januszewski , M. Kostur

Efficient Parallel Algorithms for Hypergraph Matching

We present efficient parallel algorithms for computing maximal matchings in hypergraphs. Our algorithm finds locally maximal edges in the hypergraph and adds them in parallel to the matching. In the CRCW PRAM models our algorithms achieve…

Data Structures and Algorithms · Computer Science 2026-03-13 Henrik Reinstädtler , Christian Schulz , Nodari Sitchinava , Fabian Walliser

Parallelization of Kmeans++ using CUDA

K-means++ is an algorithm which is invented to improve the process of finding initial seeds in K-means algorithm. In this algorithm, initial seeds are chosen consecutively by a probability which is proportional to the distance to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-07 Maliheh Heydarpour Shahrezaei , Reza Tavoli

On Parallel Solution of Sparse Triangular Linear Systems in CUDA

The acceleration of sparse matrix computations on modern many-core processors, such as the graphics processing units (GPUs), has been recognized and studied over a decade. Significant performance enhancements have been achieved for many…

Mathematical Software · Computer Science 2017-10-16 Ruipeng Li

Multistep schemes for solving backward stochastic differential equations on GPU

The goal of this work is to parallelize the multistep scheme for the numerical approximation of the backward stochastic differential equations (BSDEs) in order to achieve both, a high accuracy and a reduction of the computation time as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-18 Lorenc Kapllani , Long Teng

Parallel and distributed optimization methods for estimation and control in networks

System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool…

Optimization and Control · Mathematics 2013-02-14 Ion Necoara , Valentin Nedelcu , Ioan Dumitrache

cuAPO: A CUDA-based Parallelization of Artificial Protozoa Optimizer

Metaheuristic algorithms are widely used for solving complex problems due to their ability to provide near-optimal solutions. But the execution time of these algorithms increases with the problem size and/or solution space. And, to get more…

Neural and Evolutionary Computing · Computer Science 2025-12-16 Henish Soliya , Anugrah Jain

Stochastic first passage time accelerated with CUDA

The numerical integration of stochastic trajectories to estimate the time to pass a threshold is an interesting physical quantity, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not accessible.…

Computational Physics · Physics 2018-02-15 Vincenzo Pierro , Luigi Troiano , Elena Mejuto , Giovanni Filatrella