Related papers: An MPI-Based Python Framework for Distributed Trai…

200 papers

We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-25 Kailai Xu, Weiqiang Zhu, Eric Darve

We develop a scalable and extendable training framework that can utilize GPUs across nodes in a cluster and accelerate the training of deep learning models based on data parallelism. Both synchronous and asynchronous training are…

Machine Learning · Computer Science 2016-05-27 He Ma, Fei Mao, Graham W. Taylor

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…

Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues,…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 N. T. Karonis, B. Toonen, I. Foster

A neural network is used to train, predict, and evaluate a model to calculate the energies of 3-dimensional systems composed of Ti and O atoms. Python classes are implemented to quantify atomic interactions through symmetry functions and to…

Computational Physics · Physics 2024-04-30 James Paolo Rili

Message Passing Interface (MPI) plays a crucial role in distributed memory parallelization across multiple nodes. However, parallelizing MPI code manually, and specifically, performing domain decomposition, is a challenging, error-prone…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-31 Nadav Schneider, Tal Kadosh, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren

The effective utilization at scale of complex machine learning (ML) techniques for HEP use cases poses several technological challenges, most importantly on the actual implementation of dedicated end-to-end data pipelines. A solution to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-17 Matteo Migliorini, Riccardo Castellotti, Luca Canali, Marco Zanetti

GPUs have limited memory and it is difficult to train wide and/or deep models that cause the training process to go out of memory. It is shown in this paper how an open source tool called Large Model Support (LMS) can utilize a high…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-30 Samuel Matzek, Max Grossman, Minsik Cho, Anar Yusifov, Bryant Nelson, Amit Juneja

Recent years have witnessed a growing list of systems for distributed data-parallel training. Existing systems largely fit into two paradigms, i.e., parameter server and MPI-style collective operations. On the algorithmic side, researchers…

The large variety of production implementations of the message passing interface (MPI) each provide unique and varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-15 Amanda Bienz, Derek Schafer, Anthony Skjellum

The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-18 James A. Ross, David A. Richie, Song J. Park, Dale R. Shires

Existing Deep Learning frameworks exclusively use either Parameter Server (PS) approach or MPI parallelism. In this paper, we discuss the drawbacks of such approaches and propose a generic framework supporting both PS and MPI programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-12 Amith R Mamidala, Georgios Kollias, Chris Ward, Fausto Artico

Graph neural network (GNN) has been demonstrated to be a powerful model in many domains for its effectiveness in learning over graphs. To scale GNN training for large graphs, a widely adopted approach is distributed training which…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-19 Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

The classical-quantum system heterogeneity (different data characteristics, execution paradigms and synchronization mechanism etc.) renders existing distributed communication mechanisms (e.g. MPI, NCCL etc.) inadequate. This bottleneck…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-03 Feng Wang, Junchao Wang, Zeyuan Wang, Lei Li, Hang Lian, Yangyang Fei, Jinyang Yao, Xuyan Qi, Fudong Liu, Yifan Hou, Shibo Liang, Zheng Shan

Distributed training is the de facto standard to scale up the training of deep learning models with multiple GPUs. Its performance bottleneck lies in communications for gradient synchronization. Although high tensor sparsity is widely…

Machine Learning · Computer Science 2024-12-17 Zhuang Wang, Zhaozhuo Xu, Jingyi Xi, Yuke Wang, Anshumali Shrivastava, T. S. Eugene Ng

Recently, Python Testbed for Federated Learning Algorithms emerged as a low code and generative large language models amenable framework for developing decentralized and distributed applications, primarily targeting edge systems, by…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-23 Miroslav Popovic, Marko Popovic, Ivan Kastelan, Miodrag Djukic, Ilija Basicevic

In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-21 Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis

Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics 2017-08-22 Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

In this paper, we detail how two types of distributed coordinator election algorithms can be compared in terms of performance based on an evaluation on the High Performance Computing (HPC) infrastructure. An experimental approach based on…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-09 Filip De Turck