Related papers: Tiled Algorithms for Matrix Computations on Multic…

Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science 2010-02-23 Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Parallel Tiled QR Factorization for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics 2008-08-12 Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Hierarchical QR factorization algorithms for multi-core cluster systems

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-27 Jack Dongarra, Mathieu Faverge, Thomas Herault, Julien Langou, Yves Robert

Parallel computation of echelon forms

We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to…

Symbolic Computation · Computer Science 2014-02-17 Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Ziad Sultan

Cache-aware Parallel Programming for Manycore Processors

With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-01 Ashkan Tousimojarad, Wim Vanderbauwhede

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-26 Grzegorz Kwasniewski, Marko Kabić, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler

Improving Locality in Sparse and Dense Matrix Multiplications

Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi

Distributed Matrix Tiling Using A Hypergraph Labeling Formulation

Partitioning large matrices is an important problem in distributed linear algebra computing (used in ML among others). Briefly, our goal is to perform a sequence of matrix algebra operations in a distributed manner (whenever possible) on…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-30 Avah Banerjee, Guoli Ding, Maxwell Reeser

The PRIMPing Routine -- Tiling through Proximal Alternating Linearized Minimization

Mining and exploring databases should provide users with knowledge and new insights. Tiles of data strive to unveil true underlying structure and distinguish valuable information from various kinds of noise. We propose a novel Boolean…

Artificial Intelligence · Computer Science 2019-06-25 Sibylle Hess, Katharina Morik, Nico Piatkowski

sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science 2025-01-07 Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation

Tile low rank representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low rank factorization. They offer an attractive representation for many…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-27 Wajih Boukaram, Stefano Zampini, George Turkiyyah, David Keyes

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, Julien Langou

Mapping quantum algorithms to multi-core quantum computing architectures

Current monolithic quantum computer architectures have limited scalability. One promising approach for scaling them up is to use a modular or multi-core architecture, in which different quantum processors (cores) are connected via quantum…

Quantum Physics · Physics 2023-03-29 Anabel Ovide, Santiago Rodrigo, Medina Bandic, Hans Van Someren, Sebastian Feld, Sergi Abadal, Eduard Alarcon, Carmen G. Almudever

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block…

Mathematical Software · Computer Science 2016-01-26 Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier

Make the most of what you have: Resource-efficient randomized algorithms for matrix computations

In recent years, randomized algorithms have established themselves as fundamental tools in computational linear algebra, with applications in scientific computing, machine learning, and quantum information science. Many randomized matrix…

Numerical Analysis · Mathematics 2025-12-19 Ethan N. Epperly

Parallel QR Factorization of Block Low-Rank Matrices

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR…

Numerical Analysis · Mathematics 2022-08-15 M. Ridwan Apriansyah, Rio Yokota

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching,…

Hardware Architecture · Computer Science 2023-05-05 Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

Geostatistical Modeling and Prediction Using Mixed-Precision Tile Cholesky Factorization

Geostatistics represents one of the most challenging classes of scientific applications due to the desire to incorporate an ever increasing number of geospatial locations to accurately model and predict environmental phenomena. For example,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-12 Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes

TileLang: A Composable Tiled Programming Model for AI Systems

Modern AI workloads rely heavily on optimized computing kernels for both training and inference. These AI kernels follow well-defined data-flow patterns, such as moving tiles between DRAM and SRAM and performing a sequence of computations…

Machine Learning · Computer Science 2025-04-29 Lei Wang, Yu Cheng, Yining Shi, Zhengju Tang, Zhiwen Mo, Wenhao Xie, Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Zhi Yang