Related papers: Parallel computation of echelon forms
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…
Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…
In this paper we describe a parallel Gaussian elimination algorithm for matrices with entries in a finite field. Unlike previous approaches, our algorithm subdivides a very large input matrix into smaller submatrices by subdividing both…
Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…
To design efficient parallel algorithms, some recent papers showed that many sequential iterative algorithms can be directly parallelized but there are still challenges in achieving work-efficiency and high-parallelism. Work-efficiency can…
This report provides an introduction to algorithms for fundamental linear algebra problems on various parallel computer architectures, with the emphasis on distributed-memory MIMD machines. To illustrate the basic concepts and key issues,…
Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques,…
The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…
Efficient solutions of large-scale, ill-conditioned and indefinite algebraic equations are ubiquitously needed in numerous computational fields, including multiphysics simulations, machine learning, and data science. Because of their…
An extremely common bottleneck encountered in statistical learning algorithms is inversion of huge covariance matrices, examples being in evaluating Gaussian likelihoods for a large number of data points. We propose general parallel…
Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…
Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological -- and indeed, linear -- algebra. However, matrices in this domain are typically large, with rows and columns…
Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms are developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous…
Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…
We report on the implementation of an algorithm for computing the set of all regular triangulations of finitely many points in Euclidean space. This algorithm, which we call down-flip reverse search, can be restricted, e.g., to computing…
We present new algorithms to detect and correct errors in the lower-upper factorization of a matrix, or the triangular linear system solution, over an arbitrary field. Our main algorithms do not require any additional information or…
Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…