Related papers: Parallel Tiled QR Factorization for Multicore Arch…

200 papers

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science · 2008-06-12 · Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-08-27 · Jack Dongarra, Mathieu Faverge, Thomas Herault, Julien Langou, Yves Robert

The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…

Numerical Analysis · Computer Science · 2013-03-14 · Henricus Bouwmeester

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science · 2010-02-23 · Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-12 · Soumyajit Chatterjee, Rahul Utkoor, Uppu Eshwar, Sathya Peri, V. Krishna Nandivada

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR…

Numerical Analysis · Mathematics · 2022-08-15 · M. Ridwan Apriansyah, Rio Yokota

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-08 · M. Maronas, K. Sala, S. Mateo, E. Ayguadé, V. Beltran (Barcelona Supercomputing Center)

Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-15 · Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, Julien Langou

We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms…

Numerical Analysis · Mathematics · 2008-09-16 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

Computation of a signal's estimated covariance matrix is an important building block in signal processing, e.g., for spectral estimation. Each matrix element is a sum of products of elements in the input matrix taken over a sliding window.…

Data Structures and Algorithms · Computer Science · 2013-03-12 · Oded Green, Lior David, Ami Galperin, Yitzhak Birk

We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to…

Symbolic Computation · Computer Science · 2014-02-17 · Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Ziad Sultan

There are two intertwined factors that affect performance of concurrent data structures: the ability of processes to access the data in parallel and the cost of synchronization. It has been observed that for a large class of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-05-10 · Vitaly Aksenov, Petr Kuznetsov

Graph clustering has many important applications in computing, but due to growing sizes of graphs, even traditionally fast clustering methods such as spectral partitioning can be computationally expensive for real-world graphs of interest.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-06-11 · Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis, Michael W. Mahoney

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny…

Numerical Analysis · Computer Science · 2008-08-29 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-01 · Jovan Blanuša, Paolo Ienne, Kubilay Atasu

Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2011-02-28 · Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanimire Tomov

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science · 2015-11-06 · Immanuel Trummer, Christoph Koch

The sheer sizes of modern datasets are forcing data-structure designers to consider seriously both parallel construction and compactness. To achieve those goals we need to design a parallel algorithm with good scalability and with low…

Data Structures and Algorithms · Computer Science · 2017-05-02 · Leo Ferres, José Fuentes-Sepúlveda, Travis Gagie, Meng He, Gonzalo Navarro

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…

Numerical Analysis · Mathematics · 2008-08-21 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-06-18 · Edward Hutter, Edgar Solomonik