Related papers: Beyond adaptive gradient: Fast-Controlled Minibatc…
The supervised training of a deep neural network on a given dataset consists in the unconstrained minimization of the finite sum of continuously differentiable functions, commonly referred to as loss with respect to the samples. These…
Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…
Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard…
In the paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solving the nonconvex-strongly-concave minimax problems by using the unified adaptive matrices, which include almost all existing coordinate-wise…
The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…
This paper is concerned with the computing efficiency of model predictive control (MPC) problems for dynamical systems with both rate and amplitude constraints on the inputs. Instead of augmenting the decision variables of the underlying…
Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…
Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…
Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is…
Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Using unlabeled test data, continuous test-time adaptation (CTTA) directly adjusts a pre-trained…
Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…
We study here a fixed mini-batch gradient decent (FMGD) algorithm to solve optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once the partitions are formed, they are…
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of…
Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in…
Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce…
Model Agnostic Meta Learning or MAML has become the standard for few-shot learning as a meta-learning problem. MAML is simple and can be applied to any model, as its name suggests. However, it often suffers from instability and…
Deep neural networks have seen great success in recent years; however, training a deep model is often challenging as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter…
Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive…
Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in…