Related papers: A 3D Parallel Algorithm for QR Decomposition
The QR Decomposition (QRD) of communication channel matrices is a fundamental prerequisite to several detection schemes in Multiple-Input Multiple-Output (MIMO) communication systems. Herein, the main feature of the QRD is to transform the…
This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for \proc{TRSM}" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify…
In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a $3$-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a…
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…
Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which…
An efficient decoding algorithm named `divided decoder' is proposed in this paper. Divided decoding can be combined with any decoder using QR-decomposition and offers different pairs of performance and complexity. Divided decoding provides…
We investigate iterative low-resolution message-passing algorithms for quasi-cyclic LDPC codes with horizontal and vertical layered schedules. Coarse quantization and layered scheduling are highly relevant for hardware implementations to…
This paper introduces fast R updating algorithms specifically designed for statistical applications, including regression, filtering, and model selection, where data structures change frequently. Although traditional QR decomposition is…
Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and…
We address the problem of performing message-passing-based decoding of quantum LDPC codes under hardware latency limitations. We propose a novel way to do layered decoding that suits quantum constraints and outperforms flooded scheduling,…
Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…
Communication-avoiding algorithms allow redundant computations to minimize the number of inter-process communications. In this paper, we propose to exploit this redundancy for fault-tolerance purpose. We illustrate this idea with QR…
Detection algorithms for multiple-input multiple-output (MIMO) wireless systems based on orthogonal frequency-division multiplexing (OFDM) typically require the computation of a QR decomposition for each of the data-carrying OFDM tones. The…
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…
On modern parallel architectures, the cost of synchronization among processors can often dominate the cost of floating-point computation. Several modifications of the existing methods have been proposed in order to keep the communication…
We propose a novel algorithm for using Hopfield networks to denoise QR codes. Hopfield networks have mostly been used as a noise tolerant memory or to solve difficult combinatorial problems. One of the major drawbacks in their use in noise…
While the proper orthogonal decomposition (POD) is optimal under certain norms it's also expensive to compute. For large matrix sizes, it is well known that the QR decomposition provides a tractable alternative. Under the assumption that it…
We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…