Related papers: Automatic Parallelization of Python Programs for D…

200 papers

The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory…

Programming Languages · Computer Science 2018-10-29 Cristian Ramon-Cortes , Ramon Amela , Jorge Ejarque , Philippe Clauss , Rosa M. Badia

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies – data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She , Bowen Pang , Kai Li , Zehua Liu , Tao Zhong

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-10 Peng Zhang , Jianbin Fang , Canqun Yang , Chun Huang , Tao Tang , Zheng Wang

Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-17 Nimish Shah , Wannes Meert , Marian Verhelst

This paper describes Plumbing for Optimization with Asynchronous Parallelism (POAP) and the Python Surrogate Optimization Toolbox (pySOT). POAP is an event-driven framework for building and combining asynchronous optimization strategies,…

Optimization and Control · Mathematics 2019-08-02 David Eriksson , David Bindel , Christine A. Shoemaker

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft , Amos Waterland , Daniel Y Fu , Anitha Gollamudi , Shai Szulanski , Margo Seltzer

PARyOpt is a python based implementation of the Bayesian optimization routine designed for remote and asynchronous function evaluations. Bayesian optimization is especially attractive for computational optimization due to its low cost…

Optimization and Control · Mathematics 2018-09-14 Balaji Sesha Sarath Pokuri , Alec Lofquist , Chad M Risko , Baskar Ganapathysubramanian

The aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-16 Christophe Cérin , Jean-Christophe Dubacq , Jean-Louis Roch , the SafeScale Collaboration

The rapid adoption of large language models and multimodal foundation models has made multimodal data preparation pipelines critical AI infrastructure. These pipelines interleave CPU-heavy preprocessing with accelerator-backed (GPU/NPU/TPU)…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-03 Ding Pan , Zhuangzhuang Zhou , Long Qian , Binhang Yuan

With growing deployment of Internet of Things (IoT) and machine learning (ML) applications, which need to leverage computation on edge and cloud resources, it is important to develop algorithms and tools to place these distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-30 Xiangchen Zhao , Diyi Hu , Bhaskar Krishnamachari

Today's software contains billions of lines of sequential code that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca , Bruno Cabral , João Rafael , Ivo Correia

The rapid growth of large language models (LLMs) and the continuous release of new GPU products have significantly increased the demand for distributed training across heterogeneous GPU environments. In this paper, we present a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-25 Yuxiao Wang , Yuedong Xu , Qingyang Duan , Yuxuan Liu , Lei Jiao , Yinghao Yu , Jun Wu

Randomized parallel algorithms for many fundamental problems achieve optimal linear work in expectation, but upgrading this guarantee to hold with high probability (whp) remains a recurring theoretical challenge. In this paper, we address…

Data Structures and Algorithms · Computer Science 2026-03-03 Chase Hutton , Adam Melrod

Heterogeneous computing is becoming mainstream in all scopes. This new era in computer architecture brings a new paradigm called Accelerator Level Parallelism (ALP). In ALP, accelerators are used concurrently to provide unprecedented levels…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-22 Pablo Antonio Martínez , Gregorio Bernabé , Jose Manuel García

Detecting parallelizable code regions is a challenging task, even for experienced developers. Numerous recent studies have explored the use of machine learning for code analysis and program synthesis, including parallelization, in light of…

Machine Learning · Computer Science 2024-11-25 Le Chen , Quazi Ishtiaque Mahmud , Hung Phan , Nesreen K. Ahmed , Ali Jannesari

We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critical parameters, including insertion sort…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-05 Shashank Raj , Kalyanmoy Deb

In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-27 Berenger Bramas , Quentin Bramas

We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping.…

Programming Languages · Computer Science 2021-11-17 Ningning Xie , Tamara Norman , Dominik Grewe , Dimitrios Vytiniotis

Manual parallelization of code remains a significant challenge due to the complexities of modern software systems and the widespread adoption of multi-core architectures. This paper introduces OMPar, an AI-driven tool designed to automate…

Computation and Language · Computer Science 2024-09-24 Tal Kadosh , Niranjan Hasabnis , Prema Soundararajan , Vy A. Vo , Mihai Capota , Nesreen Ahmed , Yuval Pinter , Gal Oren

Purpose: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional (2D) imaging models. One important reason is because 3D image…

Medical Physics · Physics 2013-04-09 Kun Wang , Chao Huang , Yu-Jiun Kao , Cheng-Ying Chou , Alexander A. Oraevsky , Mark A. Anastasio