Related papers: A Left-Looking Selected Inversion Algorithm and Ta…

PSelInv -- A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case

We describe an efficient parallel implementation of the selected inversion algorithm for distributed memory computer systems, which we call \texttt{PSelInv}. The \texttt{PSelInv} method computes selected elements of a general sparse matrix…

Numerical Analysis · Mathematics 2015-06-01 Mathias Jacquelin, Lin Lin, Chao Yang

Blockwise inversion and algorithms for inverting large partitioned matrices

Block matrix structure commonly arises in various physics and engineering applications. There are various advantages in preserving the block structure while computing the inverse of such partitioned matrices. In this context, using…

Numerical Analysis · Mathematics 2023-11-22 R. Thiru Senthil

PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L, U are…

Mathematical Software · Computer Science 2017-08-16 Mathias Jacquelin, Lin Lin, Chao Yang

Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-04-21 Mathias Jacquelin, Lin Lin, Nathan Wichmann, Chao Yang

Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…

Optimization and Control · Mathematics 2015-07-01 Duy-Khuong Nguyen, Tu-Bao Ho

Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP

We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-20 Sandra Catalán, Adrián Castelló, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

Engineering Shared-Memory Parallel Shuffling to Generate Random Permutations In-Place

Shuffling is the process of rearranging a sequence of elements into a random order such that any permutation occurs with equal probability. It is an important building block in a plethora of techniques used in virtually all scientific…

Data Structures and Algorithms · Computer Science 2023-02-08 Manuel Penschuck

Parallelization and scalability analysis of inverse factorization using the Chunks and Tasks programming model

We present three methods for distributed memory parallel inverse factorization of block-sparse Hermitian positive definite matrices. The three methods are a recursive variant of the AINV inverse Cholesky algorithm, iterative refinement, and…

Numerical Analysis · Mathematics 2024-12-20 Anton G. Artemov, Elias Rudberg, Emanuel H. Rubensson

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-22 Sandra Catalán, José R. Herrero, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez, Robert van de Geijn

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered to be today's and the future trend for improving the performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Parallel Random Search Algorithm of Constrained Pseudo-Boolean Optimization for Some Distinctive Large-Scale Problems

In this paper, we consider an approach to parallelizing the algorithms that implement the modified probability changing method with adaptation and partial rollback procedure for constrained pseudo-Boolean optimization problems. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-09-03 Lev Kazakovtsev

Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science 2010-02-23 Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

DeepPCR: Parallelizing Sequential Operations in Neural Networks

Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…

Machine Learning · Computer Science 2023-10-30 Federico Danieli, Miguel Sarabia, Xavier Suau, Pau Rodríguez, Luca Zappella

A Provable Splitting Approach for Symmetric Nonnegative Matrix Factorization

The symmetric Nonnegative Matrix Factorization (NMF), a special but important class of the general NMF, has found numerous applications in data analysis such as various clustering tasks. Unfortunately, designing fast algorithms for the…

Machine Learning · Computer Science 2023-01-26 Xiao Li, Zhihui Zhu, Qiuwei Li, Kai Liu

Memory-Usage Advantageous Block Recursive Matrix Inverse

The inversion of extremely high-order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth considering…

Numerical Analysis · Mathematics 2018-05-08 Iria C. S. Cosme, Isaac F. Fernandes, João L. de Carvalho, Samuel Xavier-de-Souza

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

LDU factorization

LU-factorization of matrices is one of the fundamental algorithms of linear algebra. The widespread use of supercomputers with distributed memory requires a review of traditional algorithms, which were based on the common memory of a…

Symbolic Computation · Computer Science 2020-11-10 Gennadi Malaschonok

Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis

We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to…

Optimization and Control · Mathematics 2020-11-30 Saverio Salzo, Silvia Villa

BatchLayout: A Batch-Parallel Force-Directed Graph Layout Algorithm in Shared Memory

Force-directed algorithms are widely used to generate aesthetically pleasing layouts of graphs or networks arising in many scientific disciplines. To visualize large-scale graphs, several parallel algorithms have been discussed in the…

Social and Information Networks · Computer Science 2020-02-26 Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad