Related papers: Beyond adaptive gradient: Fast-Controlled Minibatc…

CMA Light: a novel Minibatch Algorithm for large-scale non convex finite sum optimization

The supervised training of a deep neural network on a given dataset consists in the unconstrained minimization of the finite sum of continuously differentiable functions, commonly referred to as loss with respect to the samples. These…

Optimization and Control · Mathematics 2024-05-24 Corrado Coppola, Giampaolo Liuzzi, Laura Palagi

Dynamic Batch Adaptation

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…

Machine Learning · Computer Science 2022-08-02 Cristian Simionescu, George Stoica, Robert Herscovici

Better Mini-Batch Algorithms via Accelerated Gradient Methods

Mini-batch algorithms have been proposed as a way to speed up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard…

Machine Learning · Computer Science 2011-06-24 Andrew Cotter, Ohad Shamir, Nathan Srebro, Karthik Sridharan

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In the paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solving the nonconvex-strongly-concave minimax problems by using the unified adaptive matrices, which include almost all existing coordinate-wise…

Optimization and Control · Mathematics 2023-02-22 Feihu Huang, Xidong Wu, Zhengmian Hu

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Fast Gradient Method for Model Predictive Control with Input Rate and Amplitude Constraints

This paper is concerned with the computing efficiency of model predictive control (MPC) problems for dynamical systems with both rate and amplitude constraints on the inputs. Instead of augmenting the decision variables of the underlying…

Optimization and Control · Mathematics 2020-03-13 Idris Kempf, Paul Goulart, Stephen Duncan

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…

Signal Processing · Electrical Eng. & Systems 2020-07-10 Zhan Gao, Alec Koppel, Alejandro Ribeiro

BADM: Batch ADMM for Deep Learning

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

Faster Optimization-Based Meta-Learning Adaptation Phase

Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is…

Machine Learning · Computer Science 2022-06-14 Kostiantyn Khabarlak

PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation

Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Using unlabeled test data, continuous test-time adaptation (CTTA) directly adjusts a pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

We study here a fixed mini-batch gradient descent (FMGD) algorithm to solve optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once the partitions are formed, they are…

Computation · Statistics 2023-04-17 Haobo Qi, Feifei Wang, Hansheng Wang

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of…

Optimization and Control · Mathematics 2020-06-15 Xunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, Lei Li

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in…

Machine Learning · Statistics 2013-05-14 Shai Shalev-Shwartz, Tong Zhang

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce…

Machine Learning · Computer Science 2024-06-18 Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

Fast Adaptation with Kernel and Gradient based Meta Leaning

Model Agnostic Meta Learning or MAML has become the standard for few-shot learning as a meta-learning problem. MAML is simple and can be applied to any model, as its name suggests. However, it often suffers from instability and…

Machine Learning · Computer Science 2024-11-04 JuneYoung Park, MinJae Kang

AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning

Deep neural networks have seen great success in recent years; however, training a deep model is often challenging as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter…

Machine Learning · Computer Science 2022-03-17 Krishnateja Killamsetty, Guttu Sai Abhishek, Aakriti, Alexandre V. Evfimievski, Lucian Popa, Ganesh Ramakrishnan, Rishabh Iyer

Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning

Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive…

Computer Vision and Pattern Recognition · Computer Science 2025-04-24 Eric Brouwer, Jan Erik van Woerden, Gertjan Burghouts, Matias Valdenegro-Toro, Marco Zullich

A Fixed-Point of View on Gradient Methods for Big Data

Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in…

Machine Learning · Statistics 2017-08-16 Alexander Jung