English
Related papers

Related papers: Exploiting nested task-parallelism in the $\mathca…

200 papers

Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce…

Numerical Analysis · Mathematics 2023-11-03 Sameer Deshmukh , Qinxiang Ma , Rio Yokota , George Bosilca

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 M. Maronas , K. Sala , S. Mateo , E. Ayguadé , V. Beltran Barcelona Supercomputing Center

We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-20 Sandra Catalán , Adrián Castelló , Francisco D. Igual , Rafael Rodríguez-Sánchez , Enrique S. Quintana-Ortí

Nested parallelism exists in scientific codes that are searching multi-dimensional spaces. However, implementations of nested parallelism often have overhead and load balance issues. The Orbital Analysis code we present exhibits a sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-01 Benjamin James Gaska , Neha Jothi , Mahdi Soltan Mohammadi , Kat Volk , Michelle Mills Strout

Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-18 David Álvarez , Kevin Sala , Marcos Maroñas , Aleix Roca , Vicenç Beltran

Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Qianxiang Ma , Rio Yokota

Several methods exist today to accelerate Machine Learning(ML) or Deep-Learning(DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science 2023-08-23 Srinjoy Das , Lawrence Rauchwerger

Unstructured meshes are characterized by data points irregularly distributed in the Euclidian space. Due to the irregular nature of these data, computing connectivity information between the mesh elements requires much more time and memory…

Data Structures and Algorithms · Computer Science 2025-04-03 Guoxi Liu , Federico Iuricich

MPI applications matter. However, with the advent of many-core processors, traditional MPI applications are challenged to achieve satisfactory performance. This is due to the inability of these applications to respond to load imbalances, to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-20 Jan Ciesko , Pedro J. Martínez-Ferrer , Raúl Peñacoba Veigas , Xavier Teruel , Vicenç Beltran

The hierarchical matrix framework partitions matrices into subblocks that are either small or of low numerical rank, enabling linear storage complexity and efficient matrix-vector multiplication. This work focuses on the $H^2$-matrix format…

Numerical Analysis · Mathematics 2026-02-02 Anna Yesypenko , Per-Gunnar Martinsson

In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-07 Weidong Wang , Haoran Zhu

Factorization of large dense matrices are ubiquitous in engineering and data science applications, e.g. preconditioners for iterative boundary integral solvers, frontal matrices in sparse multifrontal solvers, and computing the determinant…

Numerical Analysis · Mathematics 2022-08-24 Qianxiang Ma , Sameer Deshmukh , Rio Yokota

The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed…

Data Structures and Algorithms · Computer Science 2016-10-31 Riley Murray , Samir Khuller , Megan Chao

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra

Randomized sampling has recently been proven a highly efficient technique for computing approximate factorizations of matrices that have low numerical rank. This paper describes an extension of such techniques to a wider class of matrices…

Numerical Analysis · Mathematics 2015-03-25 Per-Gunnar Martinsson

Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-16 Hervé Yviquel , Marcio Pereira , Emílio Francesquini , Guilherme Valarini , Gustavo Leite , Pedro Rosso , Rodrigo Ceccato , Carla Cusihualpa , Vitoria Dias , Sandro Rigo , Alan Souza , Guido Araujo

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which…

Mathematical Software · Computer Science 2015-02-27 Pieter Ghysels , Xiaoye S. Li , Francois-Henry Rouet , Samuel Williams , Artem Napov

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block…

Mathematical Software · Computer Science 2016-01-26 Kyungjoo Kim , Sivasankaran Rajamanickam , George Stelle , H. Carter Edwards , Stephen L. Olivier

We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-22 Sandra Catalán , José R. Herrero , Enrique S. Quintana-Ortí , Rafael Rodríguez-Sánchez , Robert van de Geijn

Spatial decomposition is a popular basis for parallelising code. Cast in the frame of task parallelism, calculations on a spatial domain can be treated as a task. If neighbouring domains interact and share results, access to the specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-20 Christoph Niethammer , Colin W. Glass , Jose Gracia
‹ Prev 1 2 3 10 Next ›