Related papers: Alternating Differentiation for Optimization Layer…
Recent advances in neural-network architecture allow for seamless integration of convex optimization problems as differentiable layers in an end-to-end trainable neural network. Integrating medium and large scale quadratic programs into a…
This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin), that is useful for many problems for which alternating minimization (AltMin) is a popular solution. AltMin…
Quantization of the parameters of machine learning models, such as deep neural networks, requires solving constrained optimization problems, where the constraint set is formed by the Cartesian product of many simple discrete sets. For such…
This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a…
Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently,…
We study a class of structured convex optimization problems, which have a two-block separable objective and nonlinear functional constraints as well as affine constraints that couple the two block variables. Such problems naturally arise…
The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge…
As a well-known optimization framework, the Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications. Recently, it has attracted the attention of deep learning…
This paper proposes a provably convergent multiblock ADMM for nonconvex optimization with nonlinear dynamics constraints, overcoming the divergence issue in classical extensions. We consider a class of optimization problems that arise from…
Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but…
It has been well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with prohibitive increases in compute cost and inference latency. We…
This paper presents a real-time computational framework for multi-node distributed optimization by extending the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm. Our approach integrates adjoint sequential…
This is a tutorial and survey paper on Karush-Kuhn-Tucker (KKT) conditions, first-order and second-order numerical optimization, and distributed optimization. After a brief review of history of optimization, we start with some preliminaries…
Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, the…
This paper aims to develop distributed algorithms for nonconvex optimization problems with complicated constraints associated with a network. The network can be a physical one, such as an electric power network, where the constraints are…
Alternating Direction Method of Multipliers (ADMM) has been used successfully in many conventional machine learning applications and is considered to be a useful alternative to Stochastic Gradient Descent (SGD) as a deep learning optimizer.…
Decentralized optimization algorithms are important in different contexts, such as distributed optimal power flow or distributed model predictive control, as they avoid central coordination and enable decomposition of large-scale problems.…
For some typical and widely used non-convex half-quadratic regularization models and the Ambrosio-Tortorelli approximate Mumford-Shah model, based on the Kurdyka-\L ojasiewicz analysis and the recent nonconvex proximal algorithms, we…
This paper considers a nonconvex optimization problem that evolves over time, and addresses the synthesis and analysis of regularized primal-dual gradient methods to track a Karush-Kuhn-Tucker (KKT) trajectory. The proposed regularized…
Alternating Minimization is a widely used and empirically successful heuristic for matrix completion and related low-rank optimization problems. Theoretical guarantees for Alternating Minimization have been hard to come by and are still…