Related papers: Getting More From Your Multicore: Exploiting OpenM…
We introduce SLIRP, a module generator for the S-Lang numerical scripting language, with a focus on its vectorization capabilities. We demonstrate how both SLIRP and S-Lang were easily adapted to exploit the inherent parallelism of…
A review of the literature reveals that while parallel computing is sometimes employed by astronomers for custom, large-scale calculations, no package fosters the routine application of parallel methods to standard problems in astronomical…
High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore…
Astronomers are increasingly using Massively Parallel Network of Workstations (MP-NOW) to address their most challenging computing problems. Fully exploiting these systems is made more difficult as more and more modeling and data analysis…
VSIPL and OpenMP are two open standards for portable high performance computing. VSIPL delivers optimized single processor performance while OpenMP provides a low overhead mechanism for executing thread based parallelism on shared memory…
Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…
Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…
Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…
With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of…
Nested parallelism exists in scientific codes that are searching multi-dimensional spaces. However, implementations of nested parallelism often have overhead and load balance issues. The Orbital Analysis code we present exhibits a sparse…
Multi-core and highly-connected architectures have become ubiquitous, and this has brought renewed interest in language-based approaches to the exploitation of parallelism. Since its inception, logic programming has been recognized as a…
Parallel computing has established itself as another standard method for applied research and data analysis. The R system, being internally constrained to mostly singly-threaded operations, can nevertheless be used along with different…
In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of…
Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…
Developing an efficient code for large, multiscale astrophysical simulations is crucial in preparing the upcoming era of exascale computing. RAMSES is an astrophysical simulation code that employs parallel processing based on the Message…
While modern parallel computing systems provide high performance resources, utilizing them to the highest extent requires advanced programming expertise. Programming for parallel computing systems is much more difficult than programming for…
Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…
In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…
Astronomy depends on ever increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the…
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This…