Related papers: Distributed Optimization over Block-Cyclic Data

200 papers

Distributed machine learning has been widely studied in the literature to scale up machine learning model training in the presence of an ever-increasing amount of data. We study distributed machine learning from another perspective, where…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-16 Yaochen Hu , Di Niu , Jianming Yang , Shengping Zhou

One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…

Machine Learning · Computer Science 2025-08-22 Soumya Sarkar , Shweta Jain

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky , Robert Busa-Fekete

Many popular distributed optimization methods for training machine learning models fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs…

Machine Learning · Computer Science 2019-06-05 Konstantin Mishchenko , Filip Hanzely , Peter Richtárik

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal…

Machine Learning · Computer Science 2017-08-17 Cheng Daning , Li Shigang , Zhang Yunquan

The convergence of SGD based distributed training algorithms is tied to the data distribution across workers. Standard partitioning techniques try to achieve equal-sized partitions with per-class population distribution in proportion to the…

Machine Learning · Computer Science 2023-09-20 Haoze He , Parijat Dube

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2017-04-18 Alexandros Nathan , Diego Klabjan

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

Recently, there is a growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms include signSGD with majority vote, an effective approach for communication reduction via…

Machine Learning · Computer Science 2019-06-07 Xiangyi Chen , Tiancong Chen , Haoran Sun , Zhiwei Steven Wu , Mingyi Hong

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar

Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data are distributed from cloud-centric to edge nodes, a big challenge…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-03 Chengjie Li , Ruixuan Li , Haozhao Wang , Yuhua Li , Pan Zhou , Song Guo , Keqin Li

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…

Machine Learning · Computer Science 2022-04-29 Yunfei Teng , Wenbo Gao , Francois Chalus , Anna Choromanska , Donald Goldfarb , Adrian Weller

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in…

Databases · Computer Science 2018-02-27 Yujing Ma , Florin Rusu , Martin Torres

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda , Ramakrishnan Kannan

With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang , Shuai Zheng , James T. Kwok

Distributed learning is essential to train machine learning algorithms across heterogeneous agents while maintaining data privacy. We conduct an asymptotic analysis of Unified Distributed SGD (UD-SGD), exploring a variety of communication…

Machine Learning · Computer Science 2024-10-30 Jie Hu , Yi-Ting Ma , Do Young Eun

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou , Panayotis Mertikopoulos , Nicholas Bambos , Peter W. Glynn , Yinyu Ye

We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of nodes, but the goal remains to…

Machine Learning · Computer Science 2015-11-12 Jakub Konečný , Brendan McMahan , Daniel Ramage

We consider convex SGD updates with a block-cyclic structure, i.e. where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in…

Machine Learning · Computer Science 2019-04-24 Hubert Eichner , Tomer Koren , H. Brendan McMahan , Nathan Srebro , Kunal Talwar

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without…

Machine Learning · Computer Science 2017-08-28 Shun Zheng , Jialei Wang , Fen Xia , Wei Xu , Tong Zhang