English
Related papers

Related papers: Beyond adaptive gradient: Fast-Controlled Minibatc…

200 papers

The supervised training of a deep neural network on a given dataset consists in the unconstrained minimization of the finite sum of continuously differentiable functions, commonly referred to as loss with respect to the samples. These…

Optimization and Control · Mathematics 2024-05-24 Corrado Coppola , Giampaolo Liuzzi , Laura Palagi

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…

Machine Learning · Computer Science 2022-08-02 Cristian Simionescu , George Stoica , Robert Herscovici

Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard…

Machine Learning · Computer Science 2011-06-24 Andrew Cotter , Ohad Shamir , Nathan Srebro , Karthik Sridharan

In the paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solving the nonconvex-strongly-concave minimax problems by using the unified adaptive matrices, which include almost all existing coordinate-wise…

Optimization and Control · Mathematics 2023-02-22 Feihu Huang , Xidong Wu , Zhengmian Hu

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen , Yian Wang , Hari Sundaram

This paper is concerned with the computing efficiency of model predictive control (MPC) problems for dynamical systems with both rate and amplitude constraints on the inputs. Instead of augmenting the decision variables of the underlying…

Optimization and Control · Mathematics 2020-03-13 Idris Kempf , Paul Goulart , Stephen Duncan

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…

Signal Processing · Electrical Eng. & Systems 2020-07-10 Zhan Gao , Alec Koppel , Alejandro Ribeiro

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang , Shenglong Zhou , Geoffrey Ye Li

Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is…

Machine Learning · Computer Science 2022-06-14 Kostiantyn Khabarlak

Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Using unlabeled test data, continuous test-time adaptation (CTTA) directly adjusts a pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Sarthak Kumar Maharana , Baoming Zhang , Yunhui Guo

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen , Bo Liu , Lizhang Chen , Qiang Liu

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles , Javier Romero , Philipp Hennig

We study here a fixed mini-batch gradient decent (FMGD) algorithm to solve optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once the partitions are formed, they are…

Computation · Statistics 2023-04-17 Haobo Qi , Feifei Wang , Hansheng Wang

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of…

Optimization and Control · Mathematics 2020-06-15 Xunpeng Huang , Runxin Xu , Hao Zhou , Zhe Wang , Zhengyang Liu , Lei Li

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in…

Machine Learning · Statistics 2013-05-14 Shai Shalev-Shwartz , Tong Zhang

Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce…

Machine Learning · Computer Science 2024-06-18 Kaan Ozkara , Can Karakus , Parameswaran Raman , Mingyi Hong , Shoham Sabach , Branislav Kveton , Volkan Cevher

Model Agnostic Meta Learning or MAML has become the standard for few-shot learning as a meta-learning problem. MAML is simple and can be applied to any model, as its name suggests. However, it often suffers from instability and…

Machine Learning · Computer Science 2024-11-04 JuneYoung Park , MinJae Kang

Deep neural networks have seen great success in recent years; however, training a deep model is often challenging as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter…

Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive…

Computer Vision and Pattern Recognition · Computer Science 2025-04-24 Eric Brouwer , Jan Erik van Woerden , Gertjan Burghouts , Matias Valdenegro-Toro , Marco Zullich

Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in…

Machine Learning · Statistics 2017-08-16 Alexander Jung
‹ Prev 1 2 3 10 Next ›