Related papers: Accelerated Parallel Optimization Methods for Larg…

Parallel Coordinate Descent for L1-Regularized Loss Minimization

We propose Shotgun, a parallel coordinate descent algorithm for minimizing L1-regularized losses. Though coordinate descent seems inherently sequential, we prove convergence bounds for Shotgun which predict linear speedups, up to a…

Machine Learning · Computer Science 2011-05-27 Joseph K. Bradley, Aapo Kyrola, Danny Bickson, Carlos Guestrin

Parallel Coordinate Descent Methods for Big Data Optimization

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex…

Optimization and Control · Mathematics 2013-11-27 Peter Richtárik, Martin Takáč

A Universal Catalyst for First-Order Optimization

We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm. Our approach consists of minimizing a convex objective…

Optimization and Control · Mathematics 2015-10-27 Hongzhou Lin, Julien Mairal, Zaid Harchaoui

Accelerating Proximal Gradient-type Algorithms using Damped Anderson Acceleration with Restarts and Nesterov Initialization

Despite their frequent slow convergence, proximal gradient schemes are widely used in large-scale optimization tasks due to their tremendous stability, scalability, and ease of computation. In this paper, we develop and investigate a…

Computation · Statistics 2025-08-19 Nicholas C. Henderson, Ravi Varadhan

Nestrov's Acceleration For Second Order Method

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation

Asynchronous algorithms have attracted much attention recently due to the crucial demands on solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we…

Optimization and Control · Mathematics 2018-02-28 Cong Fang, Yameng Huang, Zhouchen Lin

An Analysis of Asynchronous Stochastic Accelerated Coordinate Descent

Gradient descent, and coordinate descent in particular, are core tools in machine learning and elsewhere. Large problem instances are common. To help solve them, two orthogonal approaches are known: acceleration and parallelism. In this…

Optimization and Control · Mathematics 2018-08-16 Richard Cole, Yixin Tao

Nesterov's Acceleration For Approximate Newton

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

Faster Convergence of a Randomized Coordinate Descent Method for Linearly Constrained Optimization Problems

The problem of minimizing a separable convex function under linearly coupled constraints arises from various application domains such as economic systems, distributed control, and network flow. The main challenge for solving this problem is…

Optimization and Control · Mathematics 2017-09-05 Qin Fan, Min Xu, Yiming Ying

Gradient descent with momentum: to accelerate or to super-accelerate?

We consider gradient descent with 'momentum', a widely used method for loss function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current…

Machine Learning · Computer Science 2020-01-20 Goran Nakerst, John Brennan, Masudul Haque

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing

Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is…

Machine Learning · Computer Science 2022-08-16 Hongyu Fu, Yijing Yang, Yuhuai Liu, Joseph Lin, Ethan Harrison, Vinod K. Mishra, C.-C. Jay Kuo

Decentralization and Acceleration Enables Large-Scale Bundle Adjustment

Scaling to arbitrarily large bundle adjustment problems requires data and compute to be distributed across multiple devices. Centralized methods in prior works are only able to solve small or medium size problems due to overhead in…

Computer Vision and Pattern Recognition · Computer Science 2023-08-10 Taosha Fan, Joseph Ortiz, Ming Hsiao, Maurizio Monge, Jing Dong, Todd Murphey, Mustafa Mukadam

On a Combination of Alternating Minimization and Nesterov's Momentum

Alternating minimization (AM) procedures are practically efficient in many applications for solving convex and non-convex optimization problems. On the other hand, Nesterov's accelerated gradient is a theoretically optimal first-order method…

Optimization and Control · Mathematics 2021-09-16 Sergey Guminov, Pavel Dvurechensky, Nazarii Tupitsa, Alexander Gasnikov

Nesterov Method for Asynchronous Pipeline Parallel Optimization

Pipeline Parallelism (PP) enables large neural network training on small, interconnected devices by splitting the model into multiple stages. To maximize pipeline utilization, asynchronous optimization is appealing as it offers 100%…

Machine Learning · Computer Science 2025-05-05 Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, Alexander Long

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without…

Machine Learning · Computer Science 2017-08-28 Shun Zheng, Jialei Wang, Fen Xia, Wei Xu, Tong Zhang

Fast Margin Maximization via Dual Acceleration

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of…

Machine Learning · Computer Science 2021-08-24 Ziwei Ji, Nathan Srebro, Matus Telgarsky

On the Acceleration of Proximal Bundle Methods

The proximal bundle method (PBM) is a fundamental and computationally effective algorithm for solving nonsmooth optimization problems. In this paper, we present the first variant of the PBM for smooth objectives, achieving an accelerated…

Optimization and Control · Mathematics 2025-04-30 David Fersztand, Xu Andy Sun

A Robust Accelerated Optimization Algorithm for Strongly Convex Functions

This work proposes an accelerated first-order algorithm we call the Robust Momentum Method for optimizing smooth strongly convex functions. The algorithm has a single scalar parameter that can be tuned to trade off robustness to gradient…

Optimization and Control · Mathematics 2018-02-27 Saman Cyrus, Bin Hu, Bryan Van Scoy, Laurent Lessard

Partitioning Algorithms for Improving Efficiency of Topic Modeling Parallelization

Topic modeling is a very powerful technique in data analysis and data mining but it is generally slow. Many parallelization approaches have been proposed to speed up the learning process. However, they are usually not very efficient because…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-24 Hung Nghiep Tran, Atsuhiro Takasu