Related papers: QN Optimization with Hessian Sample
Quasi-Newton (QN) methods provide an efficient alternative to second-order methods for minimizing smooth unconstrained problems. While QN methods generally compose a Hessian estimate based on one secant interpolation per iteration,…
We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods. We unify these…
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…
We propose a novel algorithm, termed soft quasi-Newton (soft QN), for optimization in the presence of bounded noise. Traditional quasi-Newton algorithms are vulnerable to such perturbations. To develop a more robust quasi-Newton method, we…
Stochastic gradient descent and other first-order variants, such as Adam and AdaGrad, are commonly used in the field of deep learning due to their computational efficiency and low-storage memory requirements. However, these methods do not…
Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex…
Training deep neural network is a high dimensional and a highly non-convex optimization problem. Stochastic gradient descent (SGD) algorithm and it's variations are the current state-of-the-art solvers for this task. However, due to…
In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and convex. Our bid to solving such model…
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. In our method, the estimate of the inverse Hessian matrix that is maintained by it, is…
The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi- Newton updating techniques for deterministic optimization leads to noisy curvature…
In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true…
We develop error-tolerant quantum state discrimination(QSD) strategies that maintain reliable performance under moderate noise. Two complementary approaches are proposed: CrossQSD, which generalizes unambiguous discrimination with tunable…
We consider minimizing finite-sum and expectation objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. The recent…
We consider a class of distributed optimization problem where the objective function consists of a sum of strongly convex and smooth functions and a (possibly nonsmooth) convex regularizer. A multi-agent network is assumed, where each agent…
In this paper, we introduce the Quasi-Quadratic Gradient (QQG), a novel search direction designed to accelerate the BFGS method within the quasi-Newton framework. By defining the QQG as the product of the inverse Hessian approximation and…
Near-term quantum devices provide only finite-shot measurements and prepare imperfect, contaminated states. This motivates algorithms that convert samples into reliable low-energy estimates without full tomography or exhaustive…
We analyze the convergence rate of the randomized Newton-like method introduced by Qu et. al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $\bM$ instead of the true…
Second-order optimization methods are among the most widely used optimization approaches for convex optimization problems, and have recently been used to optimize non-convex optimization problems such as deep learning models. The widely…