Related papers: Programming Parallel Dense Matrix Factorizations w…

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad , Wim Vanderbauwhede

Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…

Optimization and Control · Mathematics 2015-07-01 Duy-Khuong Nguyen , Tu-Bao Ho

OpenMP Parallelization of Dynamic Programming and Greedy Algorithms

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Claude Tadonki

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK)…

Mathematical Software · Computer Science 2015-11-09 Sandra Catalán , José R. Herrero , Francisco D. Igual , Rafael Rodríguez-Sánchez , Enrique S. Quintana-Ortí

Parallel Cholesky Factorization for Banded Matrices using OpenMP Tasks

Cholesky factorization is a widely used method for solving linear systems involving symmetric, positive-definite matrices, and can be an attractive choice in applications where a high degree of numerical stability is needed. One such…

Numerical Analysis · Mathematics 2023-05-09 Felix Liu , Albin Fredriksson , Stefano Markidis

Parallel Tiled QR Factorization for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics 2008-08-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

Several methods exist today to accelerate Machine Learning(ML) or Deep-Learning(DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science 2023-08-23 Srinjoy Das , Lawrence Rauchwerger

Dynamic Loop Parallelisation

Regions of nested loops are a common feature of High Performance Computing (HPC) codes. In shared memory programming models, such as OpenMP, these structure are the most common source of parallelism. Parallelising these structures requires…

Programming Languages · Computer Science 2012-05-14 Adrian Jackson , Orestis Agathokleous

Analysis of Distributed Algorithms for Big-data

The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit , K R Chowdhary , S D Purohit

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are…

Mathematical Software · Computer Science 2021-11-04 Jan Hückelheim , Laurent Hascoët

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

Given a sparse matrix $A$, the selected inversion algorithm is an efficient method for computing certain selected elements of $A^{-1}$. These selected elements correspond to all or some nonzero elements of the LU factors of $A$. In many…

Mathematical Software · Computer Science 2016-04-12 Mathias Jacquelin , Lin Lin , Weile Jia , Yonghua Zhao , Chao Yang

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan , Liangliang Cao , Liana Fong

Advising OpenMP Parallelization via a Graph-Based Approach with Transformers

There is an ever-present need for shared memory parallelization schemes to exploit the full potential of multi-core architectures. The most common parallelization API addressing this need today is OpenMP. Nevertheless, writing parallel code…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-23 Tal Kadosh , Nadav Schneider , Niranjan Hasabnis , Timothy Mattson , Yuval Pinter , Gal Oren

A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming

The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-29 Demetrios Coutinho , Felipe O. Lins e Silva , Daniel Aloise , Samuel , Xavier-de-Souza

OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation

Manual parallelization of code remains a significant challenge due to the complexities of modern software systems and the widespread adoption of multi-core architectures. This paper introduces OMPar, an AI-driven tool designed to automate…

Computation and Language · Computer Science 2024-09-24 Tal Kadosh , Niranjan Hasabnis , Prema Soundararajan , Vy A. Vo , Mihai Capota , Nesreen Ahmed , Yuval Pinter , Gal Oren

Co-Design of the Dense Linear AlgebravSoftware Stack for Multicore Processors

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-01 Héctor Martínez , Sandra Catalán , Francisco D. Igual , José R. Herrero , Rafael Rodríguez-Sánchez , Enrique S. Quintana-Ortí

Parallel MMF: a Multiresolution Approach to Matrix Computation

Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF…

Numerical Analysis · Computer Science 2015-07-17 Risi Kondor , Nedelina Teneva , Pramod K. Mudrakarta

Parallel Strategies for Direct Multisearch

Direct Multisearch (DMS) is a Derivative-free Optimization class of algorithms suited for computing approximations to the complete Pareto front of a given Multiobjective Optimization problem. It has a well-supported convergence analysis and…

Optimization and Control · Mathematics 2021-05-10 S. Tavares , C. P. Brás , A. L. Custódio , V. Duarte , P. Medeiros

Exploiting nested task-parallelism in the $\mathcal{H}-LU$ factorization

We address the parallelization of the LU factorization of hierarchical matrices ($\mathcal{H}$-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, which…

Mathematical Software · Computer Science 2024-09-23 Rocío Carratalá-Sáez , Sven Christophersen , José I. Aliaga , Vicenç Beltran , Steffen Börm , Enrique S. Quintana-Ortí