Related papers: Learning Regularizers: Learning Optimizers that ca…

$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (meta-generalize), especially when training networks wider than those…

Machine Learning · Computer Science 2026-03-20 Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

Narrowing the Focus: Learned Optimizers for Pretrained Models

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, batch/layer normalization to converge faster and…

Machine Learning · Computer Science 2025-05-23 Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Can Learned Optimization Make Reinforcement Learning Less Difficult?

While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from…

Machine Learning · Computer Science 2025-04-16 Alexander David Goldie, Chris Lu, Matthew Thomas Jackson, Shimon Whiteson, Jakob Nicolaus Foerster

Reverse engineering learned optimizers reveals known and novel mechanisms

Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned…

Machine Learning · Computer Science 2021-12-09 Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Learning Optimal Linear Regularizers

We present algorithms for efficiently learning regularizers that improve generalization. Our approach is based on the insight that regularizers can be viewed as upper bounds on the generalization gap, and that reducing the slack in the…

Machine Learning · Computer Science 2019-02-25 Matthew Streeter

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Learned optimizers -- neural networks that are trained to act as optimizers -- have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across thousands of tasks at huge…

Machine Learning · Computer Science 2022-09-23 James Harrison, Luke Metz, Jascha Sohl-Dickstein

Learning to Generalize Provably in Learning to Optimize

Learning to optimize (L2O) has gained increasing popularity, which automates the design of optimizers by data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two folds: (i)…

Machine Learning · Computer Science 2023-03-29 Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, Dacheng Tao, Yingbin Liang, Zhangyang Wang

Self-Regularized Learning Methods

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations…

Machine Learning · Statistics 2026-03-19 Max Schölpple, Liu Fanghui, Ingo Steinwart

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers…

Machine Learning · Computer Science 2020-09-24 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Investigation into the Training Dynamics of Learned Optimizers

Optimization is an integral part of modern deep learning. Recently, the concept of learned optimizers has emerged as a way to accelerate this optimization process by replacing traditional, hand-crafted algorithms with meta-learned…

Machine Learning · Computer Science 2023-12-13 Jan Sobotka, Petr Šimánek, Daniel Vašata

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones.…

Optimization and Control · Mathematics 2017-02-14 Quanming Yao, James T. Kwok

Regularization Matters in Policy Optimization

Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$…

Machine Learning · Computer Science 2021-11-30 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

An Analysis of Regularized Approaches for Constrained Machine Learning

Regularization-based approaches for injecting constraints in Machine Learning (ML) were introduced to improve a predictive model via expert knowledge. We tackle the issue of finding the right balance between the loss (the accuracy of the…

Machine Learning · Computer Science 2020-05-22 Michele Lombardi, Federico Baldo, Andrea Borghesi, Michela Milano

Learning Games and Rademacher Observations Losses

It has recently been shown that supervised learning with the popular logistic loss is equivalent to optimizing the exponential loss over sufficient statistics about the class: Rademacher observations (rados). We first show that this…

Machine Learning · Computer Science 2016-02-16 Richard Nock

Self-Paced Learning: an Implicit Regularization Perspective

Self-paced learning (SPL) mimics the cognitive mechanism of humans and animals that gradually learns from easy to hard samples. One key issue in SPL is to obtain better weighting strategy that is determined by minimizer function. Existing…

Machine Learning · Computer Science 2016-09-20 Yanbo Fan, Ran He, Jian Liang, Bao-Gang Hu

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv, Shunhua Jiang, Jian Li

Learning to Optimize for Reinforcement Learning

In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL) is…

Machine Learning · Computer Science 2024-06-05 Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu

Learning Filter Functions in Regularisers by Minimising Quotients

Learning approaches have recently become very popular in the field of inverse problems. A large variety of methods has been established in recent years, ranging from bi-level learning to high-dimensional machine learning techniques. Most…

Optimization and Control · Mathematics 2017-04-05 Martin Benning, Guy Gilboa, Joana Sarah Grah, Carola-Bibiane Schönlieb

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly