Related papers: Incremental Without Replacement Sampling in Noncon…
Recent stochastic gradient methods that have appeared in the literature base their efficiency and global convergence properties on a suitable control of the variance of the gradient batch estimate. This control is typically achieved by…
Sampling is a fundamental technique, and sampling without replacement is often desirable when duplicate samples are not beneficial. Within machine learning, sampling is useful for generating diverse outputs from a trained model. We present…
We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to…
Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled \emph{with} replacement. In practice, however, sampling \emph{without} replacement is very common, easier to…
This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this…
Bilevel Optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel…
We present novel minibatch stochastic optimization methods for empirical risk minimization problems, the methods efficiently leverage variance reduced first-order and sub-sampled higher-order information to accelerate the convergence speed.…
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem…
We provide the first importance sampling variants of variance reduced algorithms for empirical risk minimization with non-convex loss functions. In particular, we analyze non-convex versions of SVRG, SAGA and SARAH. Our methods have the…
Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems. In this thesis we present the development of data sampling strategies…
Sparse learning is a very important tool for mining useful information and patterns from high dimensional data. Non-convex non-smooth regularized learning problems play essential roles in sparse learning, and have drawn extensive attentions…
Incremental methods are widely utilized for solving finite-sum optimization problems in machine learning and signal processing. In this paper, we study a family of incremental methods -- including incremental subgradient, incremental…
We consider the problem of unconstrained minimization of a smooth objective function in $\R^n$ in a setting where only function evaluations are possible. While importance sampling is one of the most popular techniques used by machine…
In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches is becoming a golden standard in the…
We consider the problem of minimizing the composition of a smooth (nonconvex) function and a smooth vector mapping, where the inner mapping is in the form of an expectation over some random variable or a finite sum. We propose a stochastic…
This work explores a novel perspective on solving nonconvex and nonsmooth optimization problems by leveraging sampling based methods. Instead of treating the objective function purely through traditional (often deterministic) optimization…
In this paper we address the convergence of stochastic approximation when the functions to be minimized are not convex and nonsmooth. We show that the "mean-limit" approach to the convergence which leads, for smooth problems, to the ODE…
We focus on solving constrained convex optimization problems using mini-batch stochastic gradient descent. Dynamic sample size rules are presented which ensure a descent direction with high probability. Empirical results from two…
When applying a stochastic algorithm, one must choose an order to draw samples. The practical choices are without-replacement sampling orders, which are empirically faster and more cache-friendly than uniform-iid-sampling but often have…
Risk minimization for nonsmooth nonconvex problems naturally leads to first-order sampling or, by an abuse of terminology, to stochastic subgradient descent. We establish the convergence of this method in the path-differentiable case and…