Related papers: I/O Efficient Algorithms for Matrix Computations

200 papers

This work revisits existing algorithms for the QR factorization of rectangular matrices composed of p-by-q tiles, where p >= q. Within this framework, we study the critical paths and performance of algorithms such as Sameh and Kuck, Modi…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-19 Henricus Bouwmeester, Mathias Jacquelin, Julien Langou, Yves Robert

When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that highly impacts performance and energy consumption. For a given algorithm and a…

Computational Complexity · Computer Science 2024-04-26 Lionel Eyraud-Dubois, Guillaume Iooss, Julien Langou, Fabrice Rastello

Asymptotically tight lower bounds are derived for the I/O complexity of a general class of hybrid algorithms computing the product of $n \times n$ square matrices combining "Strassen-like" fast matrix multiplication approach with…

Data Structures and Algorithms · Computer Science 2019-04-30 Lorenzo De Stefani

Matrix multiplication is a fundamental classical computing operation whose efficiency becomes a major challenge at scale, especially for machine learning applications. Quantum computing, with its inherent parallelism and exponential storage…

Quantum Physics · Physics 2026-02-10 Jiaqi Yao, Ding Liu

The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…

Numerical Analysis · Computer Science 2013-03-14 Henricus Bouwmeester

Neuromorphic computing with crossbar arrays has emerged as a promising alternative to improve computing efficiency for machine learning. Previous work has focused on implementing crossbar arrays to perform basic mathematical operations.…

Machine Learning · Computer Science 2024-11-08 W. Haensch

We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially…

Mathematical Software · Computer Science 2026-03-24 Jonas Thies, Melven Röhrig-Zöllner

This paper initiates the study of I/O algorithms (minimizing cache misses) from the perspective of fine-grained complexity (conditional polynomial lower bounds). Specifically, we aim to answer why sparse graph problems are so hard, and why…

Data Structures and Algorithms · Computer Science 2017-12-06 Erik D. Demaine, Andrea Lincoln, Quanquan C. Liu, Jayson Lynch, Virginia Vassilevska Williams

The manuscript describes efficient algorithms for the computation of the CUR and ID decompositions. The methods used are based on simple modifications to the classical truncated pivoted QR decomposition, which means that highly optimized…

Numerical Analysis · Mathematics 2016-10-20 Sergey Voronin, Per-Gunnar Martinsson

The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real…

Mathematical Software · Computer Science 2021-12-17 Mirko Myllykoski

We consider algorithms for going from a "full" matrix to a condensed "band bidiagonal" form using orthogonal transformations. We use the framework of "algorithms by tiles". Within this framework, we study: (i) the tiled bidiagonalization…

Mathematical Software · Computer Science 2016-11-23 Mathieu Faverge, Julien Langou, Yves Robert, Jack Dongarra

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science 2010-02-23 Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

We explore new approaches for finding matrix multiplication algorithms in the commutative setting by adapting the flip graph technique: a method previously shown to be effective for discovering fast algorithms in the non-commutative case.…

Symbolic Computation · Computer Science 2025-06-30 Isaac Wood

Many important applications across science, data analytics, and AI workloads depend on distributed matrix multiplication. Prior work has developed a large array of algorithms suitable for different problem sizes and partitionings including…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-13 Benjamin Brock, Renato Golin

We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to…

Symbolic Computation · Computer Science 2014-02-17 Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Ziad Sultan

Some fast algorithms for computing the eigenvalues of a block companion matrix $A = U + XY^H$, where $U\in \mathbb C^{n\times n}$ is unitary block circulant and $X, Y \in\mathbb{C}^{n \times k}$, have recently appeared in the literature.…

Numerical Analysis · Mathematics 2019-08-30 Roberto Bevilacqua, Gianna M. Del Corso, Luca Gemignani

We present in this paper two different classes of general $K$-splitting algorithms for solving finite-dimensional convex optimization problems. Under the assumption that the function being minimized has a Lipschitz continuous gradient, we…

Optimization and Control · Mathematics 2015-03-13 Donald Goldfarb, Shiqian Ma

Solving and visualizing the potential roots of complex functions is essential in both theoretical and applied domains, yet often computationally intensive. We present a hardware-accelerated algorithm for complex function roots density graph…

Mathematical Software · Computer Science 2025-12-04 Ruibai Tang, Chengbin Quan

There is a recent trend in artificial intelligence (AI) inference towards lower precision data formats down to 8 bits and less. As multiplication is the most complex operation in typical inference tasks, there is a large demand for…

Hardware Architecture · Computer Science 2024-05-06 Andreas Böttcher, Martin Kumm

An efficient decoding algorithm named "divided decoder" is proposed in this paper. Divided decoding can be combined with any decoder using QR-decomposition and offers different pairs of performance and complexity. Divided decoding provides…

Information Theory · Computer Science 2009-01-23 In Sook Park