Related papers: Equal bi-Vectorized (EBV) method to high performan…
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity which…
Nowadays, several industrial applications are being ported to parallel architectures. In fact, these platforms allow higher performance for system modelling and simulation. In the electric machines area, there are many problems which…
A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation…
We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…
This paper describes a parallel implementation of the Viterbi decoding algorithm. The Viterbi decoder is widely used in many state-of-the-art wireless systems. The proposed solution optimizes both throughput and memory usage by applying…
This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…
A novel and scalable geometric multi-level algorithm is presented for the numerical solution of elliptic partial differential equations, specially designed to run with high occupancy of streaming processors inside Graphics Processing…
We propose a GPU-based distributed optimization algorithm, aimed at controlling optimal power flow in multi-phase and unbalanced distribution systems. Typically, conventional distributed optimization algorithms employed in such scenarios…
We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large;…
Bilevel optimization has been widely used in decision-making processes. However, an efficient algorithm for determining an optimal solution of a bilevel optimization problem is still lacking, especially for large-scale problems. To bridge the…
The singular value decomposition (SVD) is a powerful tool in modern numerical linear algebra, which underpins computational methods such as principal component analysis (PCA), low-rank approximations, and randomized algorithms. Many…
We present a fast randomized algorithm that computes a low rank LU decomposition. Our algorithm uses random projections type techniques to efficiently compute a low rank approximation of large matrices. The randomized LU algorithm can be…
LU factorization for sparse matrices is the most important computing step for many engineering and scientific computing problems such as circuit simulation. However, parallelizing LU factorization on Graphics Processing Units (GPUs) still…
Many research works have implemented the Viterbi decoding algorithm on GPUs instead of FPGAs because this platform provides considerable flexibility in addition to great performance. The recently introduced…
In this paper, we systematically investigate GPU-based parallel triangular solvers. Parallel triangular solvers are fundamental to preconditioners in the incomplete LU factorization family and to algebraic multigrid solvers. We develop a new…
The prediction of a dielectric breakdown in a high-voltage device is based on criteria that evaluate the electric field along field lines. Therefore it is necessary to efficiently compute the electric field at arbitrary points in space. A…
Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from $O(N^3)$ to $O(N)$. However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…
We present a recursive way to partition hypergraphs which creates and exploits hypergraph geometry and is suitable for many-core parallel architectures. Such partitionings are then used to bring sparse matrices in a recursive Bordered Block…
In this paper, we propose an efficient parallelization strategy for boundary element method (BEM) solvers that perform the electromagnetic analysis of structures with lossy conductors. The proposed solver is accelerated with the adaptive…
Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra, widely applied in numerous matrix-related problems. However, traditional SVD approaches are hindered by slow panel factorization and…