Related papers:

Getting More From Your Multicore: Exploiting OpenM…

We introduce SLIRP, a module generator for the S-Lang numerical scripting language, with a focus on its vectorization capabilities. We demonstrate how both SLIRP and S-Lang were easily adapted to exploit the inherent parallelism of…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-06-28 Michael S. Noble

Using the Parallel Virtual Machine for Everyday Analysis

A review of the literature reveals that while parallel computing is sometimes employed by astronomers for custom, large-scale calculations, no package fosters the routine application of parallel methods to standard problems in astronomical…

Astrophysics · Physics 2007-05-23 M. S. Noble, J. C. Houck, J. E. Davis, A. Young, M. Nowak

Parallel Astronomical Data Processing with Python: Recipes for multicore machines

High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore…

Instrumentation and Methods for Astrophysics · Physics 2013-07-30 Navtej Singh, Lisa-Marie Browne, Ray Butler

Interfacing Interpreted and Compiled Languages to Support Applications on a Massively Parallel Network of Workstations (MP-NOW)

Astronomers are increasingly using Massively Parallel Network of Workstations (MP-NOW) to address their most challenging computing problems. Fully exploiting these systems is made more difficult as more and more modeling and data analysis…

Astrophysics · Physics 2015-05-26 Jeremy Kepner, Maya Gokhale, Ron Minnich, Aaron Marks, John DeGood

Exploiting VSIPL and OpenMP for Parallel Image Processing

VSIPL and OpenMP are two open standards for portable high performance computing. VSIPL delivers optimized single processor performance while OpenMP provides a low overhead mechanism for executing thread based parallelism on shared memory…

Astrophysics · Physics 2015-05-26 Jeremy Kepner

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors

Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Juan Carlos Saez, Fernando Castro, Manuel Prieto-Matias

OpenMP Parallelization of Dynamic Programming and Greedy Algorithms

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Claude Tadonki

Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-13 Michele Weiland, Lawrence Mitchell, Gerard Gorman, Stephan Kramer, Mark Parsons, James Southern

Handling Nested Parallelism and Extreme Load Imbalance in an Orbital Analysis Code

Nested parallelism exists in scientific codes that are searching multi-dimensional spaces. However, implementations of nested parallelism often have overhead and load balance issues. The Orbital Analysis code we present exhibits a sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-01 Benjamin James Gaska, Neha Jothi, Mahdi Soltan Mohammadi, Kat Volk, Michelle Mills Strout

Parallel Logic Programming: A Sequel

Multi-core and highly-connected architectures have become ubiquitous, and this has brought renewed interest in language-based approaches to the exploitation of parallelism. Since its inception, logic programming has been recognized as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-25 Agostino Dovier, Andrea Formisano, Gopal Gupta, Manuel V. Hermenegildo, Enrico Pontelli, Ricardo Rocha

Parallel Computing With R: A Brief Review

Parallel computing has established itself as another standard method for applied research and data analysis. The R system, being internally constrained to mostly singly-threaded operations, can nevertheless be used along with different…

Computation · Statistics 2020-04-07 Dirk Eddelbuettel

OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization

In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-07 Weidong Wang, Haoran Zhu

Cimple: Instruction and Memory Level Parallelism

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science 2018-07-05 Vladimir Kiriansky, Haoran Xu, Martin Rinard, Saman Amarasinghe

RAMSES-yOMP: Performance Optimizations for the Astrophysical Hydrodynamic Simulation Code RAMSES

Developing an efficient code for large, multiscale astrophysical simulations is crucial in preparing the upcoming era of exascale computing. RAMSES is an astrophysical simulation code that employs parallel processing based on the Message…

Instrumentation and Methods for Astrophysics · Physics 2024-11-25 San Han, Yohan Dubois, Jaehyun Lee, Juhan Kim, Corentin Cadiou, Sukyoung K. Yi

Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution

While modern parallel computing systems provide high performance resources, utilizing them to the highest extent requires advanced programming expertise. Programming for parallel computing systems is much more difficult than programming for…

Programming Languages · Computer Science 2017-04-06 Adrian Calvo Chozas, Suejb Memeti, Sabri Pllana

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-03 Denis Los, Igor Petushkov

Towards Autotuning of OpenMP Applications on Multicore Architectures

In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-17 Jakub Katarzyński, Maciej Cytowski

Analysing Astronomy Algorithms for GPUs and Beyond

Astronomy depends on ever increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the…

Instrumentation and Methods for Astrophysics · Physics 2015-05-19 Benjamin R. Barsdell, David G. Barnes, Christopher J. Fluke

Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP

We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-20 Sandra Catalán, Adrián Castelló, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí