Related papers: Generalizing Hierarchical Parallelism

Dynamic Loop Parallelisation

Regions of nested loops are a common feature of High Performance Computing (HPC) codes. In shared memory programming models, such as OpenMP, these structures are the most common source of parallelism. Parallelising these structures requires…

Programming Languages · Computer Science 2012-05-14 Adrian Jackson, Orestis Agathokleous

An Efficient OpenMP Runtime System for Hierarchical Arch

Exploiting the full computational power of always deeper hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. The emergence of multi-core chips and NUMA…

Programming Languages · Computer Science 2007-06-15 Samuel Thibault, François Broquedis, Brice Goglin, Raymond Namyst, Pierre-André Wacrenier

Towards High Performance Computing (Hpc) Through Parallel Programming Paradigms and Their Principles

Nowadays, we need to find solutions to huge computing problems very rapidly. This brings in the idea of parallel computing, in which several machines or processors work cooperatively on computational tasks. In the past decades, there are a…

Programming Languages · Computer Science 2014-02-07 Brijender Kahanwal

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing are becoming the de facto industry standard, and a large part of current research is targeted at how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit, K R Chowdhary, S D Purohit

OpenMP parallelization of multiple precision Taylor series method

OpenMP parallelization of multiple precision Taylor series method is proposed. A very good parallel performance scalability and parallel efficiency inside one computation node of a CPU-cluster is observed. We explain the details of the…

Mathematical Software · Computer Science 2019-08-27 S. Dimova, I. Hristov, R. Hristova, I. Puzynin, T. Puzynina, Z. Sharipov, N. Shegunov, Z. Tukhliev

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered today's and the future trend for improving the performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

OpenMP Parallelization of Dynamic Programming and Greedy Algorithms

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Claude Tadonki

Parallel Algorithm for Longest Common Subsequence in a String

In the area of Pattern Recognition and Matching, finding a Longest Common Subsequence plays an important role. In this paper, we have proposed one algorithm based on parallel computation. We have used OpenMP API package as middleware to…

Data Structures and Algorithms · Computer Science 2013-06-20 Tirtharaj Dash, Tanistha Nayak

A Survey on Hardware and Software Support for Thread Level Parallelism

To support growing massive parallelism, functional components and also the capabilities of current processors are changing and continue to do so. Today's computers are built upon multiple processing cores and run applications consisting of a…

Programming Languages · Computer Science 2016-04-07 Somnath Mazumdar, Roberto Giorgi

Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-13 Michele Weiland, Lawrence Mitchell, Gerard Gorman, Stephan Kramer, Mark Parsons, James Southern

Frustrated with MPI+Threads? Try MPIxThreads!

MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Hui Zhou, Ken Raffenetti, Junchao Zhang, Yanfei Guo, Rajeev Thakur

The Multi-Core Era - Trends and Challenges

Since the very beginning of hardware development, computer processors were invented with ever-increasing clock frequencies and sophisticated built-in optimization strategies. Due to physical limitations, this 'free lunch' of speedup has…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-10-31 Peter Tröger

On the Design and Analysis of Parallel and Distributed Algorithms

The arrival of multicore systems has enforced a new scenario in computing: parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit, K R Chowdhary, S D Purohit

OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization

In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-07 Weidong Wang, Haoran Zhu

AutOMP: An Automatic OpenMP Parallelization Generator for Variable-Oriented High-Performance Scientific Codes

OpenMP is a cross-platform API that extends C, C++ and Fortran and provides a shared-memory parallelism platform for those languages. The use of many cores and HPC technologies for scientific computing has spread since the 1990s, and now…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-25 Gal Oren, Yehuda Ganan, Guy Malamud

Toward parallel intelligence: an interdisciplinary solution for complex systems

The growing complexity of real-world systems necessitates interdisciplinary solutions to confront myriad challenges in modeling, analysis, management, and control. To meet these demands, the parallel systems method rooted in Artificial…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-26 Yong Zhao, Zhengqiu Zhu, Bin Chen, Sihang Qiu, Jincai Huang, Xin Lu, Weiyi Yang, Chuan Ai, Kuihua Huang, Cheng He, Yucheng Jin, Zhong Liu, Fei-Yue Wang

Effect of Thread Level Parallelism on the Performance of Optimum Architecture for Embedded Applications

With the increasing complexity of network applications and internet traffic, network processors, as a subset of embedded processors, have to process more computation-intensive tasks. By scaling down the feature size and emersion of chip…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour, Hojjat Taghdisi

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need for parallelism. While there are several programming models to express parallelism, the thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta, A. B. Mutiara

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are…

Mathematical Software · Computer Science 2021-11-04 Jan Hückelheim, Laurent Hascoët