Related papers: Transformer-Based Learned Optimization

Narrowing the Focus: Learned Optimizers for Pretrained Models

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers…

Machine Learning · Computer Science 2020-09-24 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Investigation into the Training Dynamics of Learned Optimizers

Optimization is an integral part of modern deep learning. Recently, the concept of learned optimizers has emerged as a way to accelerate this optimization process by replacing traditional, hand-crafted algorithms with meta-learned…

Machine Learning · Computer Science 2023-12-13 Jan Sobotka, Petr Šimánek, Daniel Vašata

Reverse engineering learned optimizers reveals known and novel mechanisms

Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned…

Machine Learning · Computer Science 2021-12-09 Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Greedy Learning to Optimize with Convergence Guarantees

Learning to optimize is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parametrize the update step and learn optimal parameters. Although L2O has shown…

Optimization and Control · Mathematics 2025-07-15 Patrick Fahy, Mohammad Golbabaee, Matthias J. Ehrhardt

Learned Optimizers for Analytic Continuation

Traditional maximum entropy and sparsity-based algorithms for analytic continuation often suffer from the ill-posed kernel matrix or demand tremendous computation time for parameter tuning. Here we propose a neural network method by convex…

Machine Learning · Computer Science 2022-02-07 Dongchen Huang, Yi-feng Yang

Learning to Learn with Generative Models of Neural Network Checkpoints

We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer…

Machine Learning · Computer Science 2022-09-27 William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

From Learning to Optimize to Learning Optimization Algorithms

Towards designing learned optimization algorithms that are usable beyond their training setting, we identify key principles that classical algorithms obey, but have up to now, not been used for Learning to Optimize (L2O). Following these…

Machine Learning · Computer Science 2025-09-19 Camille Castera, Peter Ochs

Deep Optimized Priors for 3D Shape Modeling and Reconstruction

Many learning-based approaches have difficulty scaling to unseen data, as the generality of its learned prior is limited to the scale and variations of the training samples. This holds particularly true with 3D learning tasks, given the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Mingyue Yang, Yuxin Wen, Weikai Chen, Yongwei Chen, Kui Jia

Simple Linear Neuron Boosting

Given a differentiable network architecture and loss function, we revisit optimizing the network's neurons in function space using Boosted Backpropagation (Grubb & Bagnell, 2010), in contrast to optimizing in parameter space. From this…

Machine Learning · Computer Science 2025-02-04 Daniel Munoz

Learned Optimizers that Scale and Generalize

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We…

Machine Learning · Computer Science 2017-09-11 Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

Learning to Optimize Neural Nets

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning. In this paper, we explore learning an optimization algorithm for training shallow neural nets. Such high-dimensional…

Machine Learning · Computer Science 2017-12-01 Ke Li, Jitendra Malik

Re-parameterizing Your Optimizers rather than Architectures

The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this…

Machine Learning · Computer Science 2023-02-10 Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

Understanding and correcting pathologies in the training of learned optimizers

Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks. Analogously, this suggests that learned optimizers may similarly outperform current hand-designed optimizers, especially…

Neural and Evolutionary Computing · Computer Science 2019-06-11 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

A Generalizable Approach to Learning Optimizers

A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer…

Machine Learning · Computer Science 2021-06-09 Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba

Neural Optimizer Search with Reinforcement Learning

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that…

Artificial Intelligence · Computer Science 2017-09-25 Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

Efficient Reconstruction of Neural Mass Dynamics Modeled by Linear-Threshold Networks

This paper studies the data-driven reconstruction of firing rate dynamics of brain activity described by linear-threshold network models. Identifying the system parameters directly leads to a large number of variables and a highly…

Systems and Control · Electrical Eng. & Systems 2023-08-29 Xuan Wang, Jorge Cortes

Practical tradeoffs between memory, compute, and performance in learned optimizers

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric…

Machine Learning · Computer Science 2022-07-19 Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization

We introduce a machine-learning framework to learn the hyperparameter sequence of first-order methods (e.g., the step sizes in gradient descent) to quickly solve parametric convex optimization problems. Our computational architecture…

Optimization and Control · Mathematics 2024-12-23 Rajiv Sambharya, Bartolomeo Stellato

Transformers learn to implement preconditioned gradient descent for in-context learning

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of…

Machine Learning · Computer Science 2023-11-13 Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra