English
Related papers

Related papers: Scaling Laws for Hyperparameter Optimization

200 papers

Deep learning techniques play an increasingly important role in industrial and research environments due to their outstanding results. However, the large number of hyper-parameters to be set may lead to errors if they are set manually. The…

Machine Learning · Computer Science 2020-06-04 Michele Fraccaroli , Evelina Lamma , Fabrizio Riguzzi

The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the…

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can…

Computation and Language · Computer Science 2022-10-19 Maor Ivgi , Yair Carmon , Jonathan Berant

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this…

Machine Learning · Computer Science 2020-03-13 Tong Yu , Hong Zhu

Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine…

Machine Learning · Computer Science 2022-10-06 Li Yang , Abdallah Shami

Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find…

In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to…

Machine Learning · Computer Science 2024-05-24 Sotiris Anagnostidis , Gregor Bachmann , Imanol Schlag , Thomas Hofmann

Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements,…

Machine Learning · Computer Science 2026-05-21 Xingyu Xie , Kuangyu Ding , Shuicheng Yan , Kim-Chuan Toh , Tianwen Wei

Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on transfer between model sizes, with transfer across batch sizes and training horizons often…

Machine Learning · Computer Science 2026-03-18 Egor Shulgin , Dimitri von Rütte , Tianyue H. Zhang , Niccolò Ajroldi , Bernhard Schölkopf , Antonio Orvieto

Reinforcement learning (RL) applications, where an agent can simply learn optimal behaviors by interacting with the environment, are quickly gaining tremendous success in a wide variety of applications from controlling simple pendulums to…

Machine Learning · Computer Science 2022-01-28 Mariam Kiran , Melis Ozyildirim

We present two novel hyperparameter optimization strategies for optimization of deep learning models with a modular architecture constructed of multiple subnetworks. As complex networks with multiple subnetworks become more frequently…

Machine Learning · Computer Science 2022-02-25 Alex H. Treacher , Albert Montillo

This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet their optimization often relies on…

Machine Learning · Computer Science 2024-11-12 Michael R. Zhang , Nishkrit Desai , Juhan Bae , Jonathan Lorraine , Jimmy Ba

Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a…

Machine Learning · Computer Science 2021-03-01 Baohe Zhang , Raghu Rajan , Luis Pineda , Nathan Lambert , André Biedenkapp , Kurtland Chua , Frank Hutter , Roberto Calandra

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs…

Hyperparameter optimization is both a practical issue and an interesting theoretical problem in training of deep architectures. Despite many recent advances the most commonly used methods almost universally involve training multiple and…

Machine Learning · Computer Science 2019-09-10 Vlad Pushkarov , Jonathan Efroni , Mykola Maksymenko , Maciej Koch-Janusz

In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies…

Machine Learning · Computer Science 2023-06-05 Theresa Eimer , Marius Lindauer , Roberta Raileanu

As a result of the ever increasing complexity of configuring and fine-tuning machine learning models, the field of automated machine learning (AutoML) has emerged over the past decade. However, software implementations like Auto-WEKA and…

Machine Learning · Computer Science 2022-11-09 Dimitrios Iliadis , Marcel Wever , Bernard De Baets , Willem Waegeman

Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law…

Computation and Language · Computer Science 2025-05-28 Ayan Sengupta , Yash Goel , Tanmoy Chakraborty

Predicting material properties is crucial for designing better batteries, semiconductors, and medical devices. Deep learning helps scientists quickly find promising materials by predicting their energy, forces, and stresses. Companies scale…

Machine Learning · Computer Science 2025-09-29 Akshay Trikha , Kyle Chu , Advait Gosai , Parker Szachta , Eric Weiner

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more…

Machine Learning · Computer Science 2022-11-02 Ibrahim Alabdulmohsin , Behnam Neyshabur , Xiaohua Zhai
‹ Prev 1 2 3 10 Next ›