Related papers: A Hessian-informed hyperparameter optimization for…

One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their…

Machine Learning · Computer Science 2026-05-28 Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu

Towards hyperparameter-free optimization with differential privacy

Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. Critically, the performance of models is determined by the training hyperparameters, especially those of the…

Machine Learning · Computer Science 2025-03-04 Zhiqi Bu, Ruixuan Liu

Improving Neural Network Learning Through Dual Variable Learning Rates

This paper introduces and evaluates a novel training method for neural networks: Dual Variable Learning Rates (DVLR). Building on insights from behavioral psychology, the dual learning rates are used to emphasize correct and incorrect…

Machine Learning · Computer Science 2021-02-11 Elizabeth Liner, Risto Miikkulainen

TempoRL: laser pulse temporal shape optimization with Deep Reinforcement Learning

High Power Laser's (HPL) optimal performance is essential for the success of a wide variety of experimental tasks related to light-matter interactions. Traditionally, HPL parameters are optimised in an automated fashion relying on black-box…

Optics · Physics 2023-04-25 Francesco Capuano, Davorin Peceli, Gabriele Tiboni, Raffaello Camoriano, Bedřich Rus

Learning with Random Learning Rates

Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization…

Machine Learning · Computer Science 2019-01-30 Léonard Blier, Pierre Wolinski, Yann Ollivier

Deep Reinforcement Learning using Cyclical Learning Rates

Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-08-05 Ralf Gulde, Marc Tuscher, Akos Csiszar, Oliver Riedel, Alexander Verl

Forget the Learning Rate, Decay Loss

In the usual deep neural network optimization process, the learning rate is the most important hyperparameter, which greatly affects the final convergence effect. The purpose of the learning rate is to control the step size and gradually reduce…

Machine Learning · Computer Science 2019-05-02 Jiakai Wei

Revisiting Learning Rate Control

The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to…

Machine Learning · Computer Science 2025-07-03 Micha Henheik, Theresa Eimer, Marius Lindauer

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient descent trains neural networks with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification

Training neural networks can be challenging, especially as the complexity of the problem increases. Despite using wider or deeper networks, training them can be a tedious process, especially if a wrong choice of the hyperparameter is made.…

Computational Engineering, Finance, and Science · Computer Science 2025-07-30 D. Veerababu, Ashwin A. Raikar, Prasanta K. Ghosh

HypeRL: Hypernetwork-Based Reinforcement Learning for Control of Parametrized Dynamical Systems

In this work, we devise a new, general-purpose reinforcement learning strategy for the optimal control of parametric dynamical systems. Such problems frequently arise in applied sciences and engineering and entail a significant complexity…

Machine Learning · Computer Science 2026-02-12 Nicolò Botteghi, Stefania Fresca, Mengwu Guo, Andrea Manzoni

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a…

Machine Learning · Computer Science 2021-03-01 Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks

Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is non-trivial to choose a good constant value for training a DNN.…

Machine Learning · Computer Science 2019-10-29 Yanzhao Wu, Ling Liu, Juhyun Bae, Ka-Ho Chow, Arun Iyengar, Calton Pu, Wenqi Wei, Lei Yu, Qi Zhang

LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…

Machine Learning · Computer Science 2020-08-04 Rahul Yedida, Snehanshu Saha, Tejas Prashanth

Adaptive pruning-based Newton's method for distributed learning

Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning…

Machine Learning · Computer Science 2024-12-18 Shuzhen Chen, Yuan Yuan, Youming Tao, Tianzhu Wang, Zhipeng Cai, Dongxiao Yu

Learning Rate Dropout

The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent…

Computer Vision and Pattern Recognition · Computer Science 2019-12-06 Huangxing Lin, Weihong Zeng, Xinghao Ding, Yue Huang, Chenxi Huang, John Paisley

Layer-Specific Adaptive Learning Rates for Deep Networks

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

HLoRA: Efficient Federated Learning System for LLM Heterogeneous Fine-Tuning

Federated learning systems have been identified as an efficient approach to scaling distributed model training with a large amount of participants or data owners while guaranteeing data privacy. To apply the current most popular pre-trained…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Qianli Liu, Zhaorui Zhang, Xin Yao, Benben Liu

Full Kullback-Leibler-Divergence Loss for Hyperparameter-free Label Distribution Learning

The concept of Label Distribution Learning (LDL) is a technique to stabilize classification and regression problems with ambiguous and/or imbalanced labels. A prototypical use-case of LDL is human age estimation based on profile images.…

Machine Learning · Computer Science 2022-09-07 Maurice Günder, Nico Piatkowski, Christian Bauckhage

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies, architectural modifications, and…

Machine Learning · Computer Science 2026-05-20 Yu-Ang Lee, Ching-Yun Ko, Pin-Yu Chen, Mi-Yen Yeh