Related papers: A Left-Looking Selected Inversion Algorithm and Ta…
We describe an efficient parallel implementation of the selected inversion algorithm for distributed memory computer systems, which we call \texttt{PSelInv}. The \texttt{PSelInv} method computes selected elements of a general sparse matrix…
Block matrix structure is commonly arising is various physics and engineering applications. There are various advantages in preserving the blocks structure while computing the inversion of such partitioned matrices. In this context, using…
This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non- symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L, U are…
We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv…
Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This…
Shuffling is the process of rearranging a sequence of elements into a random order such that any permutation occurs with equal probability. It is an important building block in a plethora of techniques used in virtually all scientific…
We present three methods for distributed memory parallel inverse factorization of block-sparse Hermitian positive definite matrices. The three methods are a recursive variant of the AINV inverse Cholesky algorithm, iterative refinement, and…
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario…
Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
In this paper, we consider an approach to the parallelizing of the algorithms realizing the modified probability changigng method with adaptation and partial rollback procedure for constrained pseudo-Boolean optimization problems. Existing…
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…
The symmetric Nonnegative Matrix Factorization (NMF), a special but important class of the general NMF, has found numerous applications in data analysis such as various clustering tasks. Unfortunately, designing fast algorithms for the…
The inversion of extremely high order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth to consider…
Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…
LU-factorization of matrices is one of the fundamental algorithms of linear algebra. The widespread use of supercomputers with distributed memory requires a review of traditional algorithms, which were based on the common memory of a…
We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to…
Force-directed algorithms are widely used to generate aesthetically pleasing layouts of graphs or networks arisen in many scientific disciplines. To visualize large-scale graphs, several parallel algorithms have been discussed in the…