
Related papers: Distributed Optimization using Heterogeneous Compu…

200 papers

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-29 · Homa Esfahanizadeh, Alejandro Cohen, Muriel Medard

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability…

Machine Learning · Computer Science · 2020-02-20 · Chih-Chieh Yang, Guojing Cong

Deep learning systems are optimized for clusters with homogeneous resources. However, heterogeneity is prevalent in computing infrastructure across edge, cloud and HPC. When training neural networks using stochastic gradient descent…

Machine Learning · Computer Science · 2025-03-25 · Sahil Tyagi, Prateek Sharma

CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems from mobile devices to supercomputers. Maximizing the throughput for multi-programmed workloads on such systems is indispensable as one single…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-08 · Issa Saba, Eishi Arima, Dai Liu, Martin Schulz

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some…

Machine Learning · Computer Science · 2025-11-03 · Matin Ansaripour, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh

Deep learning models are yielding increasingly better performance thanks to multiple factors. To be successful, models may have a large number of parameters or complex architectures and be trained on large datasets. This leads to large…

Machine Learning · Computer Science · 2022-12-20 · Jean-Roch Vlimant, Junqi Yin

Access to parallel and distributed computation has enabled researchers and developers to improve algorithms and performance in many applications. Recent research has focused on next generation special purpose systems with multiple kinds of…

Machine Learning · Computer Science · 2019-06-11 · Tegg Taekyong Sung, Valliappa Chockalingam, Alex Yahja, Bo Ryu

We consider a distributed system, consisting of a heterogeneous set of devices, ranging from low-end to high-end. These devices have different profiles, e.g., different energy budgets, or different hardware specifications, determining their…

Machine Learning · Computer Science · 2020-06-11 · Martin Rapp, Ramin Khalili, Jörg Henkel

Distributed training is a novel approach to accelerating Deep Neural Network (DNN) training, but common training libraries fall short of addressing the distributed cases with heterogeneous processors or the cases where the processing nodes…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-17 · Ali HeydariGorji, Siavash Rezaei, Mahdi Torabzadehkashi, Hossein Bobarshad, Vladimir Alves, Pai H. Chou

Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat however is the substantial amount of compute needed to train these deep learning models. Training a benchmark dataset like ImageNet on a…

Machine Learning · Computer Science · 2018-10-30 · Karanbir Chahal, Manraj Singh Grover, Kuntal Dey

The demand for large-scale deep learning is increasing, and distributed training is the current mainstream solution. Ring AllReduce is widely used as a data parallel decentralized algorithm. However, in a heterogeneous environment, each…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-11-17 · Yongyue Chao, Mingxue Liao, Jiaxin Gao

Distributed optimization and learning algorithms are designed to operate over large scale networks enabling processing of vast amounts of data effectively and efficiently. One of the main challenges for ensuring a smooth learning process in…

Systems and Control · Electrical Eng. & Systems · 2026-01-21 · Apostolos I. Rikos, Nicola Bastianello, Themistoklis Charalambous, Karl H. Johansson

In the rapidly evolving research on artificial intelligence (AI), the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science · 2025-10-30 · Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

The world needs diverse and unbiased data to train deep learning models. Currently, data comes from a variety of sources that are unmoderated to a large extent. The outcomes of training neural networks with unverified data yield biased…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-11-27 · Vaibhav Mathur, Karanbir Chahal

A coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…

Information Theory · Computer Science · 2019-05-21 · Yuxuan Sun, Junlin Zhao, Sheng Zhou, Deniz Gündüz

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-08 · Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each…

Machine Learning · Computer Science · 2023-09-26 · Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry

Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different numbers of time slots. The…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-21 · Weijia Chen, Yuedong Xu, Xiaofeng Wu

Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions…

Numerical Analysis · Mathematics · 2023-10-31 · Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, Hayden Schaeffer

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science · 2025-09-09 · Viet Hoang Pham, Hyo-Sung Ahn