Related papers: AutoParallel: A Python module for automatic parall…

Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing

This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. Our approach enables AOT source-to-source…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-15 Jun Shirako, Akihiro Hayashi, Sri Raj Paul, Alexey Tumanov, Vivek Sarkar

Simplifying Parallelization of Scientific Codes by a Function-Centric Approach in Python

The purpose of this paper is to show how existing scientific software can be parallelized using a separate thin layer of Python code where all parallel communication is implemented. We provide specific examples on such layers of code, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-18 Jon K. Nilsen, Xing Cai, Bjorn Hoyland, Hans Petter Langtangen

Asynchronous Execution of Python Code on Task Based Runtime Systems

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and…

Programming Languages · Computer Science 2019-03-08 R. Tohid, Bibek Wagle, Shahrzad Shirzad, Patrick Diehl, Adrian Serio, Alireza Kheirkhahan, Parsa Amini, Katy Williams, Kate Isaacs, Kevin Huck, Steven Brandt, Hartmut Kaiser

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming

We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized…

Programming Languages · Computer Science 2021-04-13 Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, Dougal Maclaurin

The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, the best results involving hybrid static-dynamic…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-30 Riyadh Baghdadi, Albert Cohen, Cedric Bastoul, Louis-Noel Pouchet, Lawrence Rauchwerger

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

Extended Abstract: Productive Parallel Programming with Parsl

Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-05 Kyle Chard, Yadu Babuji, Anna Woodard, Ben Clifford, Zhuozhao Li, Mihael Hategan, Ian Foster, Mike Wilde, Daniel S. Katz

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-19 Stephen Mell, David Mell, Konstantinos Kallas, Steve Zdancewic, Osbert Bastani

Automated Synthesis of Divide and Conquer Parallelism

This paper focuses on automated synthesis of divide-and-conquer parallelism, which is a common parallel programming skeleton supported by many cross-platform multithreaded libraries. The challenges of producing (manually or automatically) a…

Programming Languages · Computer Science 2017-01-31 Azadeh Farzan, Victor Nicolet

Modular Synthesis of Divide-and-Conquer Parallelism for Nested Loops (Extended Version)

We propose a methodology for automatic generation of divide-and-conquer parallel implementations of sequential nested loops. We focus on a class of loops that traverse read-only multidimensional collections (lists or arrays) and compute a…

Programming Languages · Computer Science 2019-04-03 Azadeh Farzan, Victor Nicolet

Parsl: Pervasive Parallel Programming in Python

High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-21 Yadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde, Kyle Chard

Automatic Identification of Parallelizable Loops Using Transformer-Based Source Code Representations

Automatic parallelization remains a challenging problem in software engineering, particularly in identifying code regions where loops can be safely executed in parallel on modern multi-core architectures. Traditional static analysis…

Software Engineering · Computer Science 2026-04-01 Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira

Comparing Parallel Functional Array Languages: Programming and Performance

Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and…

Programming Languages · Computer Science 2025-05-15 David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas

Approaches to the Parallelization of Merge Sort in Python

The theory of divide-and-conquer parallelization has been well-studied in the past, providing a solid basis upon which to explore different approaches to the parallelization of merge sort in Python. Python's simplicity and extensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-30 Alexandra Yang

Framework for the hybrid parallelisation of simulation codes

Writing efficient hybrid parallel code is tedious, error-prone, and requires good knowledge of both parallel programming and multithreading such as MPI and OpenMP, resp. Therefore, we present a framework which is based on a job model that…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-03 Ralf-Peter Mundani, Marko Ljucović, Ernst Rank

Automatic task-based parallelization of C++ applications by source-to-source transformations

Currently, multi/many-core CPUs are considered standard in most types of computers, including mobile phones, PCs, and supercomputers. However, the parallelization of applications as well as refactoring/design of applications for efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-25 Garip Kusoglu, Berenger Bramas, Stephane Genaud

Affine Transformations of Loop Nests for Parallel Execution and Distribution of Data over Processors

The paper is devoted to the problem of mapping affine loop nests onto distributed memory parallel computers. A method to find affine transformations of loop nests for parallel execution and distribution of data over processors is presented.…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 E. V. Adutskevich, S. V. Bakhanovich, N. A. Likhoded

Automated Synthesis of Asynchronizations

Asynchronous programming is widely adopted for building responsive and efficient software, and modern languages such as C# provide async/await primitives to simplify the use of asynchrony. In this paper, we propose an approach for…

Programming Languages · Computer Science 2022-09-15 Sidi Mohamed Beillahi, Ahmed Bouajjani, Constantin Enea, Shuvendu Lahiri

AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism

Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast interconnects, limiting their…

Machine Learning · Computer Science 2026-02-02 Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Hadi Mohaghegh Dolatabadi, Chamin P Hewa Koneputugodage, Violetta Shevchenko, Yan Zuo, Alexander Long