Related papers: Equal bi-Vectorized (EBV) method to high performan…

Efficient GPU implementation of randomized SVD and its applications

Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity which…

Machine Learning · Computer Science 2024-03-13 Łukasz Struski, Paweł Morkisz, Przemysław Spurek, Samuel Rodriguez Bernabeu, Tomasz Trzciński

Parallel Sparse Matrix Solver on the GPU Applied to Simulation of Electrical Machines

Nowadays, several industrial applications are being ported to parallel architectures. In fact, these platforms allow higher performance to be achieved for system modelling and simulation. In the electric machines area, there are many problems which…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-25 Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarch, Yvonnick Le Menach, Jean-Luc Dekeyser

Randomized LU Decomposition Using Sparse Projections

A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation…

Numerical Analysis · Mathematics 2016-01-19 Yariv Aizenbud, Gil Shabat, Amir Averbuch

A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems

We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-15 Minseok Ryu, Geunyeong Byeon, Kibaek Kim

High-Throughput and Memory-Efficient Parallel Viterbi Decoder for Convolutional Codes on GPU

This paper describes a parallel implementation of the Viterbi decoding algorithm. The Viterbi decoder is widely used in many state-of-the-art wireless systems. The proposed solution optimizes both throughput and memory usage by applying…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-19 Alireza Mohammadidoost, Matin Hashemi

A Two-level GPU-Accelerated Incomplete LU Preconditioner for General Sparse Linear Systems

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…

Numerical Analysis · Mathematics 2023-03-17 Tianshi Xu, Ruipeng Li, Daniel Osei-Kuffuor

A GPU-based Multi-level Algorithm for Boundary Value Problems

A novel and scalable geometric multi-level algorithm is presented for the numerical solution of elliptic partial differential equations, specially designed to run with high occupancy of streaming processors inside Graphics Processing…

Mathematical Software · Computer Science 2017-03-22 J. T. Becerra-Sagredo, F. Mandujano, C. Malaga

A GPU-based Distributed Algorithm for Linearized Optimal Power Flow in Distribution Systems

We propose a GPU-based distributed optimization algorithm, aimed at controlling optimal power flow in multi-phase and unbalanced distribution systems. Typically, conventional distributed optimization algorithms employed in such scenarios…

Optimization and Control · Mathematics 2023-10-17 Minseok Ryu, Geunyeong Byeon, Kibaek Kim

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large;…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-09-29 Ang Li, Radu Serban, Dan Negrut

A Decomposition Based Approach for Solving a General Bilevel Linear Programming

Bilevel optimization has been widely used in decision-making processes. However, an efficient algorithm for determining an optimal solution of a bilevel optimization problem is still lacking, especially for large-size problems. To bridge the…

Optimization and Control · Mathematics 2016-05-18 Xuan Liu, Zuyi Li

An Efficient Batch Solver for the Singular Value Decomposition on GPUs

The singular value decomposition (SVD) is a powerful tool in modern numerical linear algebra, which underpins computational methods such as principal component analysis (PCA), low-rank approximations, and randomized algorithms. Many…

Mathematical Software · Computer Science 2026-04-10 Ahmad Abdelfattah, Massimiliano Fasi

Randomized LU Decomposition

We present a fast randomized algorithm that computes a low rank LU decomposition. Our algorithm uses random projections type techniques to efficiently compute a low rank approximation of large matrices. The randomized LU algorithm can be…

Numerical Analysis · Mathematics 2016-02-02 Gil Shabat, Yaniv Shmueli, Yariv Aizenbud, Amir Averbuch

GLU3.0: Fast GPU-based Parallel Sparse LU Factorization for Circuit Simulation

LU factorization for sparse matrices is the most important computing step for many engineering and scientific computing problems such as circuit simulation. But parallelizing LU factorization with Graphics Processing Units (GPUs) still…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-14 Shaoyi Peng, Sheldon X.-D. Tan

High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores

Many research works have addressed the implementation of the Viterbi decoding algorithm on GPU instead of FPGA because this platform provides considerable flexibility in addition to great performance. The recently introduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Alireza Mohammadidoost, Matin Hashemi

Parallel Triangular Solvers on GPU

In this paper, we investigate GPU based parallel triangular solvers systematically. The parallel triangular solvers are fundamental to incomplete LU factorization family preconditioners and algebraic multigrid solvers. We develop a new…

Mathematical Software · Computer Science 2016-06-03 Zhangxin Chen, Hui Liu, Bo Yang

Dielectric breakdown prediction with GPU-accelerated BEM

The prediction of a dielectric breakdown in a high-voltage device is based on criteria that evaluate the electric field along field lines. Therefore it is necessary to efficiently compute the electric field at arbitrary points in space. A…

Numerical Analysis · Mathematics 2020-11-03 Cedric Münger, Steffen Börm, Jörg Ostrowski

An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs

Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Qianxiang Ma, Rio Yokota

A Geometric Approach to Matrix Ordering

We present a recursive way to partition hypergraphs which creates and exploits hypergraph geometry and is suitable for many-core parallel architectures. Such partitionings are then used to bring sparse matrices in a recursive Bordered Block…

Data Structures and Algorithms · Computer Science 2011-05-24 B. O. Fagginger Auer, R. H. Bisseling

A Parallel Boundary Element Method for the Electromagnetic Analysis of Large Structures With Lossy Conductors

In this paper, we propose an efficient parallelization strategy for boundary element method (BEM) solvers that perform the electromagnetic analysis of structures with lossy conductors. The proposed solver is accelerated with the adaptive…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-30 Damian Marek, Shashwat Sharma, Piero Triverio

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method

Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra, widely applied in numerous matrix-related problems. However, traditional SVD approaches are hindered by slow panel factorization and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-18 Shifang Liu, Huiyuan Li, Hongjiao Sheng, Haoyuan Gui, Xiaoyu Zhang