English
Related papers

Related papers: Parallel Sub-Structuring Methods for solving Spars…

200 papers

The acceleration of sparse matrix computations on modern many-core processors, such as the graphics processing units (GPUs), has been recognized and studied over a decade. Significant performance enhancements have been achieved for many…

Mathematical Software · Computer Science 2017-10-16 Ruipeng Li

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea , Tudorel Andrei , Raluca Mariana Dragoescu

Nowadays, several industrial applications are being ported to parallel architectures. In fact, these platforms allow acquire more performance for system modelling and simulation. In the electric machines area, there are many problems which…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-25 Antonio Wendell De Oliveira Rodrigues , Frédéric Guyomarch , Yvonnick Le Menach , Jean-Luc Dekeyser

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari , Mudabir Qamar Ansari

We propose a new hybrid topology optimization algorithm based on multigrid approach that combines the parallelization strategy of CPU using OpenMP and heavily multithreading capabilities of modern Graphics Processing Units (GPU). In…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-01 Arya Prakash Padhi , Souvik Chakraborty , Anupam Chakrabarti , Rajib Chowdhury

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea , Tudorel Andrei

This paper investigates the parallelization of Dijkstra's algorithm for computing the shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that by leveraging parallel computing, the computation time can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-08 Boyang Song

We describe the GPU implementation of shifted or multimass iterative solvers for sparse linear systems of the sort encountered in lattice gauge theory. We provide a generic tool that can be used by those without GPU programming experience…

High Energy Physics - Lattice · Physics 2011-02-16 Richard Galvez , Greg van Anders

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large;…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-09-29 Ang Li , Radu Serban , Dan Negrut

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-24 Ioannis Sakiotis , Kamesh Arumugam , Marc Paterno , Desh Ranjan , Balša Terzić , Mohammad Zubair

In this paper, we use graphics processing units(GPU) to accelerate sparse and arbitrary structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-12 Aavaas Gajurel , Sushil J. Louis , Frederick C Harris

Linear Programs (LPs) appear in a large number of applications and offloading them to a GPU is viable to gain performance. Existing work on offloading and solving an LP on a GPU suggests that there is performance gain generally on large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-26 Amit Gurung , Rajarshi Ray

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…

Numerical Analysis · Mathematics 2023-03-17 Tianshi Xu , Ruipeng Li , Daniel Osei-Kuffuor

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li , Ziqian Bi , Tianyang Wang , Yizhu Wen , Qian Niu , Xinyuan Song , Zekun Jiang , Junyu Liu , Benji Peng , Sen Zhang , Xuanhe Pan , Jiawei Xu , Jinlang Wang , Keyu Chen , Caitlyn Heqi Yin , Pohsun Feng , Ming Liu

In this paper, we aim to introduce a new perspective when comparing highly parallelized algorithms on GPU: the energy consumption of the GPU. We give an analysis of the performance of linear algebra operations, including addition of…

Numerical Analysis · Mathematics 2021-12-22 Abal-Kassim Cheik Ahamed , Alban Desmaison , Frederic Magoules

We propose a parallel graph-based data clustering algorithm using CUDA GPU, based on exact clustering of the minimum spanning tree in terms of a minimum isoperimetric criteria. We also provide a comparative performance analysis of our…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-17 Ramin Javadi , Saleh Ashkboos

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis , Marios Perlitis

An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid…

Computational Physics · Physics 2018-08-07 Duane Rosenberg , Pablo D. Mininni , Raghu Reddy , Annick Pouquet

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain since the principal components (PCs) are linear combinations of the original…

Mathematical Software · Computer Science 2013-12-24 W. Liu , H. Zhang , D. Tao , Y. Wang , K. Lu
‹ Prev 1 2 3 10 Next ›