Related papers: A Hessian-informed hyperparameter optimization for…

200 papers

Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their…

Machine Learning · Computer Science 2026-05-28 Di He , Songjun Tu , Keyu Wang , Lu Yin , Shiwei Liu

Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. Critically, the performance of models is determined by the training hyperparameters, especially those of the…

Machine Learning · Computer Science 2025-03-04 Zhiqi Bu , Ruixuan Liu

This paper introduces and evaluates a novel training method for neural networks: Dual Variable Learning Rates (DVLR). Building on insights from behavioral psychology, the dual learning rates are used to emphasize correct and incorrect…

Machine Learning · Computer Science 2021-02-11 Elizabeth Liner , Risto Miikkulainen

High Power Laser's (HPL) optimal performance is essential for the success of a wide variety of experimental tasks related to light-matter interactions. Traditionally, HPL parameters are optimised in an automated fashion relying on black-box…

Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization…

Machine Learning · Computer Science 2019-01-30 Léonard Blier , Pierre Wolinski , Yann Ollivier

Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-08-05 Ralf Gulde , Marc Tuscher , Akos Csiszar , Oliver Riedel , Alexander Verl

In the usual deep neural network optimization process, the learning rate is the most important hyperparameter, which greatly affects the final convergence effect. The purpose of the learning rate is to control the step size and gradually reduce…

Machine Learning · Computer Science 2019-05-02 Jiakai Wei

The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to…

Machine Learning · Computer Science 2025-07-03 Micha Henheik , Theresa Eimer , Marius Lindauer

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains a neural network with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen , Hongzhi Wang , Chenmin Ba

Training neural networks can be challenging, especially as the complexity of the problem increases. Despite using wider or deeper networks, training them can be a tedious process, especially if a wrong choice of the hyperparameter is made.…

Computational Engineering, Finance, and Science · Computer Science 2025-07-30 D. Veerababu , Ashwin A. Raikar , Prasanta K. Ghosh

In this work, we devise a new, general-purpose reinforcement learning strategy for the optimal control of parametric dynamical systems. Such problems frequently arise in applied sciences and engineering and entail a significant complexity…

Machine Learning · Computer Science 2026-02-12 Nicolò Botteghi , Stefania Fresca , Mengwu Guo , Andrea Manzoni

Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a…

Machine Learning · Computer Science 2021-03-01 Baohe Zhang , Raghu Rajan , Luis Pineda , Nathan Lambert , André Biedenkapp , Kurtland Chua , Frank Hutter , Roberto Calandra

Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is non-trivial to choose a good constant value for training a DNN.…

Machine Learning · Computer Science 2019-10-29 Yanzhao Wu , Ling Liu , Juhyun Bae , Ka-Ho Chow , Arun Iyengar , Calton Pu , Wenqi Wei , Lei Yu , Qi Zhang

Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…

Machine Learning · Computer Science 2020-08-04 Rahul Yedida , Snehanshu Saha , Tejas Prashanth

Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning…

Machine Learning · Computer Science 2024-12-18 Shuzhen Chen , Yuan Yuan , Youming Tao , Tianzhu Wang , Zhipeng Cai , Dongxiao Yu

The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent…

Computer Vision and Pattern Recognition · Computer Science 2019-12-06 Huangxing Lin , Weihong Zeng , Xinghao Ding , Yue Huang , Chenxi Huang , John Paisley

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh , Soham De , Yangmuzi Zhang , Thomas Goldstein , Gavin Taylor

Federated learning systems have been identified as an efficient approach to scaling distributed model training with a large amount of participants or data owners while guaranteeing data privacy. To apply the current most popular pre-trained…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Qianli Liu , Zhaorui Zhang , Xin Yao , Benben Liu

The concept of Label Distribution Learning (LDL) is a technique to stabilize classification and regression problems with ambiguous and/or imbalanced labels. A prototypical use-case of LDL is human age estimation based on profile images.…

Machine Learning · Computer Science 2022-09-07 Maurice Günder , Nico Piatkowski , Christian Bauckhage

Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies, architectural modifications, and…

Machine Learning · Computer Science 2026-05-20 Yu-Ang Lee , Ching-Yun Ko , Pin-Yu Chen , Mi-Yen Yeh