Related papers: A Distributed Second-Order Algorithm You Can Trust
This paper considers distributed optimization problems, where each agent cooperatively minimizes the sum of local objective functions through the communication with its neighbors. The widely adopted distributed gradient method in solving…
Due to the rapidly growing scale and heterogeneity of wireless networks, the design of distributed cross-layer optimization algorithms have received significant interest from the networking research community. So far, the standard…
Although the field of distributed optimization is well-developed, relevant literature focused on the application of distributed optimization to multi-robot problems is limited. This survey constitutes the second part of a two-part series on…
An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second order optimization…
Distributed training of $l_1$ regularized classifiers has received great attention recently. Most existing methods approach this problem by taking steps obtained from approximating the objective by a quadratic approximation that is…
We study distributed algorithms for expected loss minimization where the datasets are large and have to be stored on different machines. Often we deal with minimizing the average of a set of convex functions where each function is the…
Distributed optimization is widely used in large-scale and privacy-preserving machine learning, where each agent stores a local objective and communicates only with its neighbors in a connected network. We study decentralized second-order…
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…
Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…
Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some…
Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…
Training structured prediction models is time-consuming. However, most existing approaches only use a single machine, thus, the advantage of computing power and the capacity for larger data sets of multiple machines have not been exploited.…
This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and…
Scalable and efficient distributed learning is one of the main driving forces behind the recent rapid advancement of machine learning and artificial intelligence. One prominent feature of this topic is that recent progresses have been made…
Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but…
A stochastic second-order trust region method is proposed, which can be viewed as a second-order extension of the trust-region-ish (TRish) algorithm proposed by Curtis et al. (INFORMS J. Optim. 1(3) 200-220, 2019). In each iteration, a…