Related papers: Faster Adaptive Decentralized Learning Algorithms
Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced…
Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced…
Federated learning is a popular distributed and privacy-preserving learning paradigm in machine learning. Recently, some federated learning algorithms have been proposed to solve the distributed minimax problems. However, these federated…
In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes. We propose the decentralized and asynchronous algorithm ADFS to tackle the case when local functions are themselves…
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform…
Federated Learning is a popular distributed learning paradigm in machine learning. Meanwhile, composition optimization is an effective hierarchical learning model, which appears in many machine learning applications such as meta learning…
In this paper, we consider nonconvex decentralised optimisation and learning over a network of distributed agents. We develop an ADMM algorithm based on the Randomised Block Coordinate Douglas-Rachford splitting method which enables agents…
Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spawning interest in the community to design faster methods for problems where gradients are not accessible. While some attention has been given…
We develop a new consensus-based distributed algorithm for solving learning problems with feature partitioning and non-smooth convex objective functions. Such learning problems are not separable, i.e., the associated objective functions…
We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearly-optimal convergence rates for both…
Bilevel optimization is a popular hierarchical model in machine learning, and has been widely applied to many machine learning tasks such as meta learning, hyperparameter learning and policy optimization. Although many bilevel optimization…
This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation…
Adaptive gradient-based optimization methods such as \textsc{Adagrad}, \textsc{Rmsprop}, and \textsc{Adam} are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in…
We address distributed learning problems, both nonconvex and convex, over undirected networks. In particular, we design a novel algorithm based on the distributed Alternating Direction Method of Multipliers (ADMM) to address the challenges…
We propose a computationally-friendly adaptive learning rate schedule, "AdaLoss", which directly uses the information of the loss function to adjust the stepsize in gradient descent methods. We prove that this schedule enjoys linear…
Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence…
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as…
Adaptive gradient methods have become popular in optimizing deep neural networks; recent examples include AdaGrad and Adam. Although Adam usually converges faster, variations of Adam, for instance, the AdaBelief algorithm, have been…
Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a…
This paper addresses the problem of efficiently classifying high-dimensional data over decentralized networks. Penalized support vector machines (SVMs) are widely used for high-dimensional classification tasks. However, the double…