Related papers: Efficient Bilevel Optimization with KFAC-Based Hyp…

Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization

Bilevel optimization is a powerful tool for many machine learning problems, such as hyperparameter optimization and meta-learning. Estimating hypergradients (also known as implicit gradients) is crucial for developing gradient-based methods…

Optimization and Control · Mathematics 2025-05-06 Youran Dong, Junfeng Yang, Wei Yao, Jin Zhang

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has…

Machine Learning · Computer Science 2024-12-25 Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO…

Machine Learning · Computer Science 2022-09-20 Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization

Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful…

Machine Learning · Computer Science 2022-06-13 Frederik Benzing

qNBO: quasi-Newton Meets Bilevel Optimization

Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. The practical implementation of the gradient descent method to bilevel optimization encounters computational…

Machine Learning · Computer Science 2025-02-04 Sheng Fang, Yong-Jin Liu, Wei Yao, Chengming Yu, Jin Zhang

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's…

Machine Learning · Computer Science 2020-06-09 James Martens, Roger Grosse

Scalable Thermodynamic Second-order Optimization

Many hardware proposals have aimed to accelerate inference in AI workloads. Less attention has been paid to hardware acceleration of training, despite the enormous societal impact of rapid training of AI models. Physics-based computers,…

Emerging Technologies · Computer Science 2025-02-13 Kaelan Donatella, Samuel Duffield, Denis Melanson, Maxwell Aifer, Phoebe Klett, Rajath Salegame, Zach Belateche, Gavin Crooks, Antonio J. Martinez, Patrick J. Coles

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Second-order optimization methods have the ability to accelerate convergence by modifying the gradient through the curvature matrix. There have been many attempts to use second-order optimization methods for training deep neural networks.…

Machine Learning · Computer Science 2020-11-24 Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of loss landscape. However, it comes at the expense of high computational burden. In this work, we…

Machine Learning · Computer Science 2025-11-12 Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

UFO-BLO: Unbiased First-Order Bilevel Optimization

Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and…

Machine Learning · Computer Science 2021-06-08 Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Two-Level K-FAC Preconditioning for Deep Learning

In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Stochastic Gradient Descent. In particular, starting with Adagrad, a seemingly endless line of research…

Machine Learning · Computer Science 2020-12-08 Nikolaos Tselepidis, Jonas Kohler, Antonio Orvieto

Learning Theory for Kernel Bilevel Optimization

Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. While prior works have primarily focused on…

Machine Learning · Computer Science 2025-11-18 Fares El Khoury, Edouard Pauwels, Samuel Vaiter, Michael Arbel

Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems

Bilevel Optimization has witnessed notable progress recently with new emerging efficient algorithms. However, its application in the Federated Learning setting remains relatively underexplored, and the impact of Federated Learning's…

Machine Learning · Computer Science 2024-02-28 Junyi Li, Feihu Huang, Heng Huang

A New Way: Kronecker-Factored Approximate Curvature Deep Hedging and its Benefits

This paper advances the computational efficiency of Deep Hedging frameworks through the novel integration of Kronecker-Factored Approximate Curvature (K-FAC) optimization. While recent literature has established Deep Hedging as a…

Statistical Finance · Quantitative Finance 2024-11-25 Tsogt-Ochir Enkhbayar

Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis

Bilevel optimization is an important formulation for many machine learning problems. Current bilevel optimization algorithms assume that the gradient of the upper-level function is Lipschitz. However, recent studies reveal that certain…

Machine Learning · Computer Science 2024-01-19 Jie Hao, Xiaochuan Gong, Mingrui Liu

Implicit Bilevel Optimization: Differentiating through Bilevel Optimization Programming

Bilevel Optimization Programming is used to model complex and conflicting interactions between agents, for example in Robust AI or Privacy-preserving AI. Integrating bilevel mathematical programming within deep learning is thus an essential…

Machine Learning · Computer Science 2023-03-01 Francesco Alesiani

Bilevel Learning via Inexact Stochastic Gradient Descent

Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-adaptive regularizers and optimizing forward…

Optimization and Control · Mathematics 2025-11-11 Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt

LancBiO: dynamic Lanczos-aided bilevel optimization via Krylov subspace

Bilevel optimization, with broad applications in machine learning, has an intricate hierarchical structure. Gradient-based methods have emerged as a common approach to large-scale bilevel problems. However, the computation of the…

Optimization and Control · Mathematics 2025-02-27 Yan Yang, Bin Gao, Ya-xiang Yuan

Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as…

Optimization and Control · Mathematics 2024-09-04 Wanli Shi, Yi Chang, Bin Gu

Riemannian Bilevel Optimization with Gradient Aggregation

Bilevel optimization (BLO) offers a principled framework for hierarchical decision-making and has been widely applied in machine learning tasks such as hyperparameter optimization and meta-learning. While existing BLO methods are mostly…

Optimization and Control · Mathematics 2025-10-20 Zhuo Chen, Xinjian Xu, Shihui Ying, Tieyong Zeng