Related papers: Approximate Newton Methods
Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are…
Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…
Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newtons method is its lack of global convergence and high iteration cost.…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…
Many machine learning models depend on solving a large scale optimization problem. Recently, sub-sampled Newton methods have emerged to attract much attention for optimization due to their efficiency at each iteration, rectified a weakness…
Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex…
Inspired by multigrid methods for linear systems of equations, multilevel optimization methods have been proposed to solve structured optimization problems. Multilevel methods make more assumptions regarding the structure of the…
We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data…
In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic…
Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning…
We consider minimization of a sum of convex objective functions where the components of the objective are available at different nodes of a network and nodes are allowed to only communicate with their neighbors. The use of distributed…
While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of…
The problem of minimizing a sum of local convex objective functions over a networked system captures many important applications and has received much attention in the distributed optimization field. Most of existing work focuses on…
Superlinear convergence has been an elusive goal for black-box nonsmooth optimization. Even in the convex case, the subgradient method is very slow, and while some cutting plane algorithms, including traditional bundle methods, are popular…
In this paper, we propose new proximal Newton-type methods for convex optimization problems in composite form. The applications include model predictive control (MPC) and embedded MPC. Our new methods are computationally attractive since…
First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…
We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods…
Dual descent methods are commonly used to solve network flow optimization problems, since their implementation can be distributed over the network. These algorithms, however, often exhibit slow convergence rates. Approximate Newton methods…
We develop a principled approach to obtain exact computer-aided worst-case guarantees on the performance of second-order optimization methods on classes of univariate functions. We first present a generic technique to derive interpolation…