Related papers: An MPI-Based Python Framework for Distributed Trai…

Distributed Machine Learning for Computational Engineering using MPI

We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-11-25 · Kailai Xu, Weiqiang Zhu, Eric Darve

Theano-MPI: a Theano-based Distributed Training Framework

We develop a scalable and extendable training framework that can utilize GPUs across nodes in a cluster and accelerate the training of deep learning models based on data parallelism. Both synchronous and asynchronous training are…

Machine Learning · Computer Science · 2016-05-27 · He Ma, Fei Mao, Graham W. Taylor

pPython Performance Study

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-03 · Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-05-23 · N. T. Karonis, B. Toonen, I. Foster

Machine Learning Interatomic Potentials with Keras API

A neural network is used to train, predict, and evaluate a model to calculate the energies of 3-dimensional systems composed of Ti and O atoms. Python classes are implemented to quantify atomic interactions through symmetry functions and to…

Computational Physics · Physics · 2024-04-30 · James Paolo Rili

MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers

Message Passing Interface (MPI) plays a crucial role in distributed memory parallelization across multiple nodes. However, parallelizing MPI code manually, and specifically, performing domain decomposition, is a challenging, error-prone…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-08-31 · Nadav Schneider, Tal Kadosh, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren

Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics

The effective utilization at scale of complex machine learning (ML) techniques for HEP use cases poses several technological challenges, most importantly on the actual implementation of dedicated end-to-end data pipelines. A solution to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-06-17 · Matteo Migliorini, Riccardo Castellotti, Luca Canali, Marco Zanetti

Data-parallel distributed training of very large models beyond GPU capacity

GPUs have limited memory and it is difficult to train wide and/or deep models that cause the training process to go out of memory. It is shown in this paper how an open source tool called Large Model Support (LMS) can utilize a high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-11-30 · Samuel Matzek, Max Grossman, Minsik Cho, Anar Yusifov, Bryant Nelson, Amit Juneja

pPython for Parallel Python Programming

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-12-12 · Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Kurt Keville, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

BAGUA: Scaling up Distributed Learning with System Relaxations

Recent years have witnessed a growing list of systems for distributed data-parallel training. Existing systems largely fit into two paradigms, i.e., parameter server and MPI-style collective operations. On the algorithmic side, researchers…

Machine Learning · Computer Science · 2021-11-29 · Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, Hongmei Shi, Shengzhuo Zhang, Xianghong Li, Tengxu Sun, Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, Ce Zhang

MPI Advance : Open-Source Message Passing Optimizations

The large variety of production implementations of the message passing interface (MPI) each provide unique and varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-09-15 · Amanda Bienz, Derek Schafer, Anthony Skjellum

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-06-18 · James A. Ross, David A. Richie, Song J. Park, Dale R. Shires

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning

Existing Deep Learning frameworks exclusively use either Parameter Server(PS) approach or MPI parallelism. In this paper, we discuss the drawbacks of such approaches and propose a generic framework supporting both PS and MPI programming…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-01-12 · Amith R Mamidala, Georgios Kollias, Chris Ward, Fausto Artico

Characterizing and Understanding Distributed GNN Training on GPUs

Graph neural network (GNN) has been demonstrated to be a powerful model in many domains for its effectiveness in learning over graphs. To scale GNN training for large graphs, a widely adopted approach is distributed training which…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-19 · Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

MPI-Q: A Message Communication Library for Large-Scale Classical-Quantum Heterogeneous Hybrid Distributed Computing

The classical-quantum system heterogeneity (different data characteristics, execution paradigms and synchronization mechanism etc.) renders existing distributed communication mechanisms (e.g. MPI, NCCL etc.) inadequate. This bottleneck…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-03 · Feng Wang, Junchao Wang, Zeyuan Wang, Lei Li, Hang Lian, Yangyang Fei, Jinyang Yao, Xuyan Qi, Fudong Liu, Yifan Hou, Shibo Liang, Zheng Shan

Empowering Distributed Training with Sparsity-driven Data Synchronization

Distributed training is the de facto standard to scale up the training of deep learning models with multiple GPUs. Its performance bottleneck lies in communications for gradient synchronization. Although high tensor sparsity is widely…

Machine Learning · Computer Science · 2024-12-17 · Zhuang Wang, Zhaozhuo Xu, Jingyi Xi, Yuke Wang, Anshumali Shrivastava, T. S. Eugene Ng

MicroPython Testbed for Federated Learning Algorithms

Recently, Python Testbed for Federated Learning Algorithms emerged as a low code and generative large language models amenable framework for developing decentralized and distributed applications, primarily targeting edge systems, by…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-23 · Miroslav Popovic, Marko Popovic, Ivan Kastelan, Miodrag Djukic, Ilija Basicevic

Understanding and Optimizing the Performance of Distributed Machine Learning Applications on Apache Spark

In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-21 · Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis

A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark

Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics · 2017-08-22 · Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

MPI-based Evaluation of Coordinator Election Algorithms

In this paper, we detail how two types of distributed coordinator election algorithms can be compared in terms of performance based on an evaluation on the High Performance Computing (HPC) infrastructure. An experimental approach based on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-11-09 · Filip De Turck