Related papers: Training Aware Sigmoidal Optimizer

Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize…

Machine Learning · Computer Science 2022-02-09 Daniel Coquelin, Charlotte Debus, Markus Götz, Fabrice von der Lehr, James Kahn, Martin Siggel, Achim Streit

A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs

Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. Especially for the cases of adversarial training or for training a newly synthesized model, one would not know…

Machine Learning · Computer Science 2019-10-28 Koyel Mukherjee, Alind Khare, Ashish Verma

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

The sharpness-aware minimization (SAM) optimizer has been extensively explored, as it can generalize better when training deep neural networks by introducing extra perturbation steps to flatten the landscape of deep learning models. Integrating…

Machine Learning · Computer Science 2023-03-02 Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao

A Trainable Optimizer

The concept of learning to optimize involves utilizing a trainable optimization strategy rather than relying on manually defined full gradient estimations such as ADAM. We present a framework that jointly trains the full gradient estimator…

Machine Learning · Computer Science 2026-01-30 Ruiqi Wang, Diego Klabjan

Learning to Optimize Quasi-Newton Methods

Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step,…

Machine Learning · Computer Science 2023-09-12 Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster, Marin Soljačić

TADS: Task-Aware Data Selection for Multi-Task Multimodal Pre-Training

Large-scale multimodal pre-trained models like CLIP rely heavily on high-quality training data, yet raw web-crawled datasets are often noisy, misaligned, and redundant, leading to inefficient training and suboptimal generalization. Existing…

Machine Learning · Computer Science 2026-02-06 Guanjie Cheng, Boyi Li, Lingyu Sun, Mengying Zhu, Yangyang Wu, Xinkui Zhao, Shuiguang Deng

Dynamics of Learning: Generative Schedules from Latent ODEs

The learning rate schedule is one of the most impactful aspects of neural network optimization, yet most schedules either follow simple parametric functions or react only to short-term training signals. None of them are supported by a…

Machine Learning · Computer Science 2025-09-30 Matt L. Sampson, Peter Melchior

Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale

We present Amos, a stochastic gradient-based optimizer designed for training deep neural networks. It can be viewed as an Adam optimizer with theoretically supported, adaptive learning-rate decay and weight decay. A key insight behind Amos…

Machine Learning · Computer Science 2022-11-22 Ran Tian, Ankur P. Parikh

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the…

Machine Learning · Computer Science 2023-10-18 Zhao Song, Chiwun Yang

Stress-Aware Resilient Neural Training

This paper introduces Stress-Aware Learning, a resilient neural training paradigm in which deep neural networks dynamically adjust their optimization behavior - whether under stable training regimes or in settings with uncertain dynamics -…

Machine Learning · Computer Science 2025-08-04 Ashkan Shakarami, Yousef Yeganeh, Azade Farshad, Lorenzo Nicole, Stefano Ghidoni, Nassir Navab

Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN

Up to now, the training processes of typical Generative Adversarial Networks (GANs) are still particularly sensitive to data properties and hyperparameters, which may lead to severe oscillations, difficulties in convergence, or even…

Machine Learning · Computer Science 2025-04-22 Lin Wang, Xiancheng Wang, Rui Wang, Zhibo Zhang, Minghang Zhao

A Fast Saddle-Point Dynamical System Approach to Robust Deep Learning

Recent focus on robustness to adversarial attacks for deep neural networks produced a large variety of algorithms for training robust models. Most of the effective algorithms involve solving the min-max optimization problem for training…

Machine Learning · Computer Science 2021-03-03 Yasaman Esfandiari, Aditya Balu, Keivan Ebrahimi, Umesh Vaidya, Nicola Elia, Soumik Sarkar

AdamZ: An Enhanced Optimisation Method for Neural Network Training

AdamZ is an advanced variant of the Adam optimiser, developed to enhance convergence efficiency in neural network training. This optimiser dynamically adjusts the learning rate by incorporating mechanisms to address overshooting and…

Machine Learning · Computer Science 2024-11-26 Ilia Zaznov, Atta Badii, Alfonso Dufour, Julian Kunkel

Layer-Specific Adaptive Learning Rates for Deep Networks

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these…

Machine Learning · Computer Science 2025-07-29 Yue Hu, Zanxia Cao, Yingchao Liu

TACO: Temporal Consensus Optimization for Continual Neural Mapping

Neural implicit mapping has emerged as a powerful paradigm for robotic navigation and scene understanding. However, real-world robotic deployment requires continual adaptation to changing environments under strict memory and computation…

Robotics · Computer Science 2026-05-29 Xunlan Zhou, Hongrui Zhao, Negar Mehr

Topology Aware Deep Learning for Wireless Network Optimization

Data-driven machine learning approaches have recently been proposed to facilitate wireless network optimization by learning latent knowledge from historical optimization instances. However, existing methods do not well handle the topology…

Networking and Internet Architecture · Computer Science 2021-01-06 Shuai Zhang, Bo Yin, Yu Cheng

VeLO: Training Versatile Learned Optimizers by Scaling Up

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to…

Machine Learning · Computer Science 2022-11-18 Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein

TransBO: Hyperparameter Optimization via Two-Phase Transfer Learning

With the extensive applications of machine learning models, automatic hyperparameter optimization (HPO) has become increasingly important. Motivated by the tuning behaviors of human experts, it is intuitive to leverage auxiliary knowledge…

Machine Learning · Computer Science 2022-06-07 Yang Li, Yu Shen, Huaijun Jiang, Wentao Zhang, Zhi Yang, Ce Zhang, Bin Cui

Optimal Learning Rate Schedule for Balancing Effort and Performance

Learning how to learn efficiently is a fundamental challenge for biological agents and a growing concern for artificial ones. To learn effectively, an agent must regulate its learning speed, balancing the benefits of rapid improvement…

Machine Learning · Computer Science 2026-01-13 Valentina Njaradi, Rodrigo Carrasco-Davis, Peter E. Latham, Andrew Saxe