Related papers: Supercomputer Environment for Recursive Matrix Alg…
The report is devoted to the concept of creating block-recursive matrix algorithms for computing on a supercomputer with distributed memory and dynamic decentralized control.
We give an overview of the theoretical results for matrix block-recursive algorithms in commutative domains and present the results of experiments that we conducted with new parallel programs based on these algorithms on a supercomputer…
Direct factorization methods for the solution of large, sparse linear systems that arise from PDE discretizations are robust, but typically show poor time and memory scalability for large systems. In this paper, we describe an efficient…
The inversion of extremely high order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth to consider…
In recent years, randomized algorithms have established themselves as fundamental tools in computational linear algebra, with applications in scientific computing, machine learning, and quantum information science. Many randomized matrix…
Recurrence equations lie at the heart of many computational paradigms including dynamic programming, graph analysis, and linear solvers. These equations are often expensive to compute and much work has gone into optimizing them for…
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…
We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing permutation runtime overhead. Rather than…
Recursive blocked algorithms have proven to be highly efficient at the numerical solution of the Sylvester matrix equation and its generalizations. In this work, we show that these algorithms extend in a seamless fashion to…
In this paper, we study the nonnegative matrix factorization problem under the separability assumption (that is, there exists a cone spanned by a small subset of the columns of the input nonnegative data matrix containing all columns),…
The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a…
Randomly pivoted Cholesky (RPCholesky) is an algorithm for constructing a low-rank approximation of a positive-semidefinite matrix using a small number of columns. This paper develops an accelerated version of RPCholesky that employs block…
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…
To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance…
This work is about rounding error analysis of randomized CholeskyQR-type algorithms for sparse matrices. We often encounter QR factorization of the sparse matrices in many real problems. In this work, we focus on some typical…
Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering…
The parallel linear equations solver capable of effectively using 1000+ processors becomes the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave-structural…
Block matrix structure is commonly arising is various physics and engineering applications. There are various advantages in preserving the blocks structure while computing the inversion of such partitioned matrices. In this context, using…
This paper presents a new algorithmic framework for computing sparse solutions to large-scale linear discrete ill-posed problems. The approach is motivated by recent perspectives on iteratively reweighted norm schemes, viewed through the…
We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algo- rithm targets not only the shared memory multicore computers, but also computers clusters or specialized…