Related papers: Provably Correct Automatic Subdifferentiation for …
How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating…
Simplex gradients are an essential feature of many derivative free optimization algorithms, and can be employed, for example, as part of the process of defining a direction of search, or as part of a termination criterion. The calculation…
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the…
Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate,…
We investigate the computation of the gradient of the value function in parametric convex optimization problems. We derive general expression for the gradient of the value function in terms of the cost function, constraints and Lagrange…
In this paper we propose two proximal gradient algorithms for fractional programming problems in real Hilbert spaces, where the numerator is a proper, convex and lower semicontinuous function and the denominator is a smooth function, either…
This work investigates finite differences and the use of interpolation models to obtain approximations to the first and second derivatives of a function. Here, it is shown that if a particular set of points is used in the interpolation…
We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration and the subsampling ratio explicitly controls…
The paper addresses the study and applications of a broad class of extended-real-valued functions, known as optimal value or marginal functions, which are frequently appeared in variational analysis, parametric optimization, and a variety…
Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space.…
Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…
Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…
Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor…
Finding gradients is a crucial step in training machine learning models. For quantum neural networks, computing gradients using the parameter-shift rule requires calculating the cost function twice for each adjustable parameter in the…
Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…
Multivariate functions encountered in high-dimensional uncertainty quantification problems often vary most strongly along a few dominant directions in the input parameter space. We propose a gradient-based method for detecting these…
In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…
Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…
We consider gradient descent like algorithms for Support Vector Machine (SVM) training when the data is in relational form. The gradient of the SVM objective can not be efficiently computed by known techniques as it suffers from the…
How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more…