Related papers: A Hessian-informed hyperparameter optimization for…
Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their…
Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. Critically, the performance of models is determined by the training hyperparameters, especially those of the…
This paper introduces and evaluates a novel training method for neural networks: Dual Variable Learning Rates (DVLR). Building on insights from behavioral psychology, the dual learning rates are used to emphasize correct and incorrect…
Optimal performance of High Power Lasers (HPL) is essential for the success of a wide variety of experimental tasks related to light-matter interactions. Traditionally, HPL parameters are optimised in an automated fashion relying on black-box…
Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization…
Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD)…
In the usual deep neural network optimization process, the learning rate is the most important hyperparameter, and it greatly affects final convergence. The purpose of the learning rate is to control the step size and gradually reduce…
The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to…
Learning rate adaptation is a popular topic in machine learning. Gradient descent trains neural networks with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…
Training neural networks can be challenging, especially as the complexity of the problem increases. Despite using wider or deeper networks, training them can be a tedious process, especially if a wrong choice of the hyperparameter is made.…
In this work, we devise a new, general-purpose reinforcement learning strategy for the optimal control of parametric dynamical systems. Such problems frequently arise in applied sciences and engineering and entail a significant complexity…
Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a…
Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is non-trivial to choose a good constant value for training a DNN.…
Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…
Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning…
The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent…
The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…
Federated learning systems have been identified as an efficient approach to scaling distributed model training with a large number of participants or data owners while guaranteeing data privacy. To apply the current most popular pre-trained…
The concept of Label Distribution Learning (LDL) is a technique to stabilize classification and regression problems with ambiguous and/or imbalanced labels. A prototypical use-case of LDL is human age estimation based on profile images.…
Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies, architectural modifications, and…