Related papers: Stage-based Hyper-parameter Optimization for Deep …
Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. A hyper-parameter optimization job, referred to as a study, involves numerous trials of training a model using different training…
Hyperparameter selection in continual learning scenarios is a challenging and underexplored aspect, especially in practical non-stationary environments. Traditional approaches, such as grid searches with held-out validation data from all…
In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the pro-posed strategy shares an (almost) identical network structure and…
In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…
Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find…
Hyperparameter optimization (HPO) is a critical component of machine learning pipelines, significantly affecting model robustness, stability, and generalization. However, HPO is often a time-consuming and computationally intensive task.…
Hyperparameter optimization (HPO) plays a central role in the automated machine learning (AutoML). It is a challenging task as the response surfaces of hyperparameters are generally unknown, hence essentially a global optimization problem.…
Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine…
A sequential training method for large-scale feedforward neural networks is presented. Each layer of the neural network is decoupled and trained separately. After the training is completed for each layer, they are combined together. The…
It is typical for a machine learning system to have numerous hyperparameters that affect its learning rate and prediction quality. Finding a good combination of the hyperparameters is, however, a challenging job. This is mainly because…
Most learning algorithms require the practitioner to manually set the values of many hyperparameters before the learning process can begin. However, with modern algorithms, the evaluation of a given hyperparameter setting can take a…
We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large…
Hyperparameter optimization is both a practical issue and an interesting theoretical problem in training of deep architectures. Despite many recent advances the most commonly used methods almost universally involve training multiple and…
Hyperparameter optimization (HPO) is an important step in machine learning (ML) model development, but common practices are archaic -- primarily relying on manual or grid searches. This is partly because adopting advanced HPO algorithms…
Recent trends towards training ever-larger language models have substantially improved machine learning performance across linguistic tasks. However, the huge cost of training larger models can make tuning them prohibitively expensive,…
We present two novel hyperparameter optimization strategies for optimization of deep learning models with a modular architecture constructed of multiple subnetworks. As complex networks with multiple subnetworks become more frequently…
Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…
Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used…
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this…
Deep learning models are yielding increasingly better performances thanks to multiple factors. To be successful, model may have large number of parameters or complex architectures and be trained on large dataset. This leads to large…