English
Related papers

Related papers: Tiled Algorithms for Matrix Computations on Multic…

200 papers

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science 2010-02-23 Emmanuel Agullo , Henricus Bouwmeester , Jack Dongarra , Jakub Kurzak , Julien Langou , Lee Rosenberg

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics 2008-08-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-27 Jack Dongarra , Mathieu Faverge , Thomas Herault , Julien Langou , and Yves Robert

We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to…

Symbolic Computation · Computer Science 2014-02-17 Jean-Guillaume Dumas , Thierry Gautier , Clément Pernet , Ziad Sultan

With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-01 Ashkan Tousimojarad , Wim Vanderbauwhede

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…

Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Mohammad Mahdi Salehi Dezfuli , Kazem Cheshmi

Partitioning large matrices is an important problem in distributed linear algebra computing (used in ML among others). Briefly, our goal is to perform a sequence of matrix algebra operations in a distributed manner (whenever possible) on…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-30 Avah Banerjee , Guoli Ding , Maxwell Reeser

Mining and exploring databases should provide users with knowledge and new insights. Tiles of data strive to unveil true underlying structure and distinguish valuable information from various kinds of noise. We propose a novel Boolean…

Artificial Intelligence · Computer Science 2019-06-25 Sibylle Hess , Katharina Morik , Nico Piatkowski

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science 2025-01-07 Esmail Abdul Fattah , Hatem Ltaief , Havard Rue , David Keyes

Tile low rank representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low rank factorization. They offer an attractive representation for many…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-27 Wajih Boukaram , Stefano Zampini , George Turkiyyah , David Keyes

Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Emmanuel Agullo , Camille Coti , Jack Dongarra , Thomas Herault , Julien Langou

Current monolithic quantum computer architectures have limited scalability. One promising approach for scaling them up is to use a modular or multi-core architecture, in which different quantum processors (cores) are connected via quantum…

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block…

Mathematical Software · Computer Science 2016-01-26 Kyungjoo Kim , Sivasankaran Rajamanickam , George Stelle , H. Carter Edwards , Stephen L. Olivier

In recent years, randomized algorithms have established themselves as fundamental tools in computational linear algebra, with applications in scientific computing, machine learning, and quantum information science. Many randomized matrix…

Numerical Analysis · Mathematics 2025-12-19 Ethan N. Epperly

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR…

Numerical Analysis · Mathematics 2022-08-15 M. Ridwan Apriansyah , Rio Yokota

Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching,…

Hardware Architecture · Computer Science 2023-05-05 Marcelo Orenes-Vera , Esin Tureci , David Wentzlaff , Margaret Martonosi

Geostatistics represents one of the most challenging classes of scientific applications due to the desire to incorporate an ever increasing number of geospatial locations to accurately model and predict environmental phenomena. For example,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-12 Sameh Abdulah , Hatem Ltaief , Ying Sun , Marc G. Genton , David E. Keyes

Modern AI workloads rely heavily on optimized computing kernels for both training and inference. These AI kernels follow well-defined data-flow patterns, such as moving tiles between DRAM and SRAM and performing a sequence of computations…

Machine Learning · Computer Science 2025-04-29 Lei Wang , Yu Cheng , Yining Shi , Zhengju Tang , Zhiwen Mo , Wenhao Xie , Lingxiao Ma , Yuqing Xia , Jilong Xue , Fan Yang , Zhi Yang
‹ Prev 1 2 3 10 Next ›