Related papers: Second-Order Sensitivity Analysis for Bilevel Opti…

Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization

Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute…

Machine Learning · Computer Science · 2024-02-27 · Zhenzhang Ye, Gabriel Peyré, Daniel Cremers, Pierre Ablin

Oracle Complexity of Second-Order Methods for Finite-Sum Problems

Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in second-order methods, which rely…

Optimization and Control · Mathematics · 2017-03-09 · Yossi Arjevani, Ohad Shamir

The bilinear Hessian for large scale optimization

Second order information is useful in many ways in smooth optimization problems, including for the design of step size rules and descent directions, or the analysis of the local properties of the objective functional. However, the…

Optimization and Control · Mathematics · 2025-02-06 · Marcus Carlsson, Viktor Nikitin, Erik Troedsson, Herwig Wendt

On the Convergence Theory for Hessian-Free Bilevel Algorithms

Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and…

Machine Learning · Computer Science · 2022-06-07 · Daouda Sow, Kaiyi Ji, Yingbin Liang
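
The abstract above notes that gradient-based bilevel methods need second-order quantities such as Jacobian- or Hessian-vector products. As an illustration only (not the algorithm analysed in the paper), a Hessian-vector product can be approximated without forming the Hessian by a central finite difference of two gradient evaluations. The sketch assumes a mean logistic loss, a step `eps = 1e-5`, synthetic data, and hypothetical helper names (`loss_grad`, `fd_hvp`); the explicit Hessian is built only to check the approximation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, X, y):
    """Gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i * x_i^T w))."""
    return -X.T @ (y * sigmoid(-y * (X @ w))) / X.shape[0]

def fd_hvp(grad, w, v, eps=1e-5):
    """Central-difference estimate of H(w) @ v from two gradient calls (no Hessian formed)."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
y = np.where(rng.standard_normal(n) > 0, 1.0, -1.0)
w = 0.1 * rng.standard_normal(d)
v = rng.standard_normal(d)

hv_approx = fd_hvp(lambda u: loss_grad(u, X, y), w, v)

# Reference check only: explicit Hessian X^T diag(s * (1 - s)) X / n.
s = sigmoid(X @ w)
H = X.T @ (X * (s * (1.0 - s))[:, None]) / n
hv_exact = H @ v
print(np.max(np.abs(hv_approx - hv_exact)))
```

Each product costs two gradient calls regardless of the Hessian's size; the trade-off is a truncation error of order `eps**2` plus round-off that grows as `eps` shrinks.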

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics · 2016-02-29 · Farbod Roosta-Khorasani, Michael W. Mahoney
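
The abstract mentions sub-sampling as a way to speed up algorithms for large finite sums. A minimal sketch of a generic sub-sampled Newton iteration, assuming L2-regularized logistic regression: the gradient uses all samples, the Hessian is averaged over a random subset, and a backtracking (Armijo) line search guards global convergence. The sample size, regularization weight, line-search constants, synthetic data and function names are illustrative assumptions, not the settings or exact variants studied in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, X, y, lam):
    """Mean logistic loss plus 0.5*lam*||w||^2; logaddexp(0, t) = log(1 + e^t), computed stably."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam * (w @ w)

def full_grad(w, X, y, lam):
    """Exact gradient over all n samples."""
    return -X.T @ (y * sigmoid(-y * (X @ w))) / X.shape[0] + lam * w

def subsampled_hessian(w, X, lam, idx):
    """Hessian of the data term averaged over the rows in idx, plus the exact ridge term."""
    Xs = X[idx]
    s = sigmoid(Xs @ w)
    return Xs.T @ (Xs * (s * (1.0 - s))[:, None]) / len(idx) + lam * np.eye(X.shape[1])

def subsampled_newton(X, y, lam=1e-2, sample_size=100, iters=50, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        g = full_grad(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            break
        idx = rng.choice(n, size=sample_size, replace=False)
        # The sampled Hessian is positive definite (lam > 0), so p is a descent direction.
        p = -np.linalg.solve(subsampled_hessian(w, X, lam, idx), g)
        # Backtracking (Armijo) line search: halve t until sufficient decrease holds.
        t, f0 = 1.0, objective(w, X, y, lam)
        while t > 1e-10 and objective(w + t * p, X, y, lam) > f0 + 1e-4 * t * (g @ p):
            t *= 0.5
        w = w + t * p
    return w

# Synthetic, non-separable data drawn from a logistic model (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
w_true = rng.standard_normal(5)
y = np.where(rng.random(500) < sigmoid(X @ w_true), 1.0, -1.0)

w_hat = subsampled_newton(X, y)
print(np.linalg.norm(full_grad(w_hat, X, y, 1e-2)))
```

Sampling only the Hessian keeps the per-iteration linear-algebra cost tied to `sample_size` rather than `n`, while the full gradient keeps the fixed point exact.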

Penalty Method for Inversion-Free Deep Bilevel Optimization

Solving a bilevel optimization problem is at the core of several machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Different from simultaneous or…

Machine Learning · Computer Science · 2021-10-07 · Akshay Mehra, Jihun Hamm

Optimizing Millions of Hyperparameters by Implicit Differentiation

We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT…

Machine Learning · Computer Science · 2019-11-11 · Jonathan Lorraine, Paul Vicol, David Duvenaud
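
The abstract combines the implicit function theorem (IFT) with inverse Hessian approximations. For an inner objective g(w, λ) with solution w*(λ), inner Hessian H = ∇²_w g(w*, λ) and outer loss f(w), the IFT gives dF/dλ = -∇_w f(w*)ᵀ H⁻¹ ∂²g/∂w∂λ. A minimal sketch of that computation on a toy problem, assuming ridge regression with a single regularization weight `lam` as the hyperparameter and a truncated Neumann series as the inverse Hessian approximation (one common choice); the step `alpha`, the number of terms `K`, the synthetic data and all function names are hypothetical.

```python
import numpy as np

def inner_solution(lam, X, y):
    """w*(lam) = argmin_w 0.5*||X w - y||^2 + 0.5*lam*||w||^2 (closed-form ridge)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def outer_loss(w, Xv, yv):
    """Validation loss f(w) = 0.5*||Xv w - yv||^2."""
    r = Xv @ w - yv
    return 0.5 * (r @ r)

def neumann_inverse_hvp(hvp, v, alpha, K):
    """Approximate H^{-1} v by alpha * sum_{j=0}^{K} (I - alpha*H)^j v.
    Uses only Hessian-vector products; converges when 0 < alpha < 2 / lambda_max(H)."""
    p = v.copy()
    acc = v.copy()
    for _ in range(K):
        p = p - alpha * hvp(p)   # p now holds the next term (I - alpha*H)^j v
        acc = acc + p
    return alpha * acc

def hypergradient(lam, X, y, Xv, yv, alpha, K):
    w = inner_solution(lam, X, y)
    grad_f = Xv.T @ (Xv @ w - yv)                    # grad_w f at w*(lam)
    hvp = lambda u: X.T @ (X @ u) + lam * u          # inner Hessian H applied to u
    v = neumann_inverse_hvp(hvp, grad_f, alpha, K)   # v ~ H^{-1} grad_w f
    # Mixed derivative d^2 g / (dw dlam) equals w* for this inner objective.
    return -(v @ w)

rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
X = rng.standard_normal((100, 5))
y = X @ w_true + 0.5 * rng.standard_normal(100)
Xv = rng.standard_normal((50, 5))
yv = Xv @ w_true + 0.5 * rng.standard_normal(50)
lam = 1.0

H = X.T @ X + lam * np.eye(5)
alpha = 1.0 / np.linalg.eigvalsh(H).max()   # exact only because H is tiny here
g_neumann = hypergradient(lam, X, y, Xv, yv, alpha, K=200)

# Reference values: exact linear solve and a central finite difference of F(lam).
w_star = inner_solution(lam, X, y)
g_exact = -(np.linalg.solve(H, Xv.T @ (Xv @ w_star - yv)) @ w_star)
h = 1e-5
g_fd = (outer_loss(inner_solution(lam + h, X, y), Xv, yv)
        - outer_loss(inner_solution(lam - h, X, y), Xv, yv)) / (2 * h)
print(g_neumann, g_exact, g_fd)
```

Setting `alpha` from the exact largest eigenvalue is affordable only because H is 5×5 here; at scale a bound or a tuned step would be needed, and `K` trades accuracy against the number of Hessian-vector products.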

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics · 2026-05-11 · Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Second-Order Stochastic Optimization for Machine Learning in Linear Time

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics · 2017-12-01 · Naman Agarwal, Brian Bullins, Elad Hazan

Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation

This paper reviews gradient-based techniques to solve bilevel optimization problems. Bilevel optimization is a general way to frame the learning of systems that are implicitly defined through a quantity that they minimize. This…

Machine Learning · Computer Science · 2023-05-26 · Nicolas Zucchet, João Sacramento

A Fully First-Order Layer for Differentiable Optimization

Differentiable optimization layers enable learning systems to make decisions by solving embedded optimization problems. However, computing gradients via implicit differentiation requires solving a linear system with Hessian terms, which is…

Machine Learning · Computer Science · 2025-12-03 · Zihao Zhao, Kai-Chia Mo, Shing-Hei Ho, Brandon Amos, Kai Wang
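
The abstract notes that implicit differentiation through an optimization layer requires solving a linear system with Hessian terms, which is the cost the paper aims to remove. For comparison, a minimal sketch of the standard matrix-free way to perform that solve: conjugate gradient driven only by Hessian-vector products. It assumes a hypothetical quadratic layer z*(θ) = argmin_z 0.5 zᵀQz - θᵀz with Q symmetric positive definite and a downstream loss ℓ(z) = 0.5‖z - target‖²; here ∂²g/∂z∂θ = -I, so the backward pass reduces to ∇_θ ℓ = Q⁻¹ ∇_z ℓ(z*). This is not the first-order method proposed in the paper.

```python
import numpy as np

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=100):
    """Solve H x = b for symmetric positive definite H using only products hvp(u) = H @ u."""
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()            # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = hvp(p)
        step = rs / (p @ Hp)
        x = x + step * p
        r = r - step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical quadratic layer: z*(theta) = argmin_z 0.5 z^T Q z - theta^T z, i.e. Q z* = theta.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
Q = M @ M.T + 8.0 * np.eye(8)             # symmetric positive definite Hessian
theta = rng.standard_normal(8)
target = rng.standard_normal(8)
hvp = lambda u: Q @ u                     # stands in for an autodiff Hessian-vector product

z_star = conjugate_gradient(hvp, theta)   # forward pass
# Backward pass for loss 0.5*||z* - target||^2: grad_theta = Q^{-1} (z* - target).
grad_theta = conjugate_gradient(hvp, z_star - target)
print(grad_theta)
```

In a real layer `hvp` would come from automatic differentiation rather than an explicit matrix; each CG iteration then costs one Hessian-vector product, which is the per-step expense a fully first-order scheme avoids.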

Gathering and Exploiting Higher-Order Information when Training Large Structured Models

When training large models, such as neural networks, the full derivatives of order 2 and beyond are usually inaccessible, due to their computational cost. Therefore, among the second-order optimization methods, it is common to bypass the…

Machine Learning · Computer Science · 2025-10-01 · Pierre Wolinski

Bilevel learning

Bilevel learning refers to machine learning problems that can be formulated as bilevel optimization models, where decisions are organized in a hierarchical structure. This paradigm has recently gained considerable attention in machine…

Optimization and Control · Mathematics · 2026-05-05 · Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Alain Zemkoho

Achieving optimal complexity guarantees for a class of bilevel convex optimization problems

We design and analyze a novel accelerated gradient-based algorithm for a class of bilevel optimization problems. These problems have various applications arising from machine learning and image processing, where optimal solutions of the two…

Optimization and Control · Mathematics · 2023-11-20 · Sepideh Samadi, Daniel Burbano, Farzad Yousefian

A Fully First-Order Method for Stochastic Bilevel Optimization

We consider stochastic unconstrained bilevel optimization problems when only the first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend…

Optimization and Control · Mathematics · 2023-01-27 · Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO…

Machine Learning · Computer Science · 2022-09-20 · Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Bilevel Learning via Inexact Stochastic Gradient Descent

Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-adaptive regularizers and optimizing forward…

Optimization and Control · Mathematics · 2025-11-11 · Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt

A sensitivity-based method for bilevel optimization problems: Theoretical analysis and computational performance

Bilevel optimization provides a powerful framework for modelling hierarchical decision-making systems. This work presents a sensitivity-based algorithm that addresses the bilevel structure directly by treating the lower-level optimal…

Optimization and Control · Mathematics · 2026-05-28 · Eduardo Nolasco, Ross D. King, Vassilios S. Vassiliadis

A Doubly Stochastically Perturbed Algorithm for Linearly Constrained Bilevel Optimization

In this work, we develop analysis and algorithms for a class of (stochastic) bilevel optimization problems whose lower-level (LL) problem is strongly convex and linearly constrained. Most existing approaches for solving such problems rely…

Optimization and Control · Mathematics · 2025-04-08 · Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Sijia Liu, Mingyi Hong

Implicit differentiation with second-order derivatives and benchmarks in finite-element-based differentiable physics

Differentiable programming is revolutionizing computational science by enabling automatic differentiation (AD) of numerical simulations. While first-order gradients are well-established, second-order derivatives (Hessians) for implicit…

Computational Engineering, Finance, and Science · Computer Science · 2025-05-20 · Tianju Xue