Related papers: HAMSI: A Parallel Incremental Optimization Algorit…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
Current algorithms for large-scale industrial optimization problems typically face a trade-off: they either require exponential time to reach optimal solutions, or employ problem-specific heuristics. To overcome these limitations, we…
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we…
Inspired by the developments in quantum computing, building domain-specific classical hardware to solve computationally hard problems has received increasing attention. Here, by introducing systematic sparsification techniques, we…
We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We…
Recently several methods were proposed for sparse optimization which make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian…
In this work, we consider convex optimization problems with smooth objective function and nonsmooth functional constraints. We propose a new stochastic gradient algorithm, called Stochastic Halfspace Approximation Method (SHAM), to solve…
Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…
An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…
Numerous practical medical problems often involve data that possess a combination of both sparse and non-sparse structures. Traditional penalized regularizations techniques, primarily designed for promoting sparsity, are inadequate to…
The growing interest for high dimensional and functional data analysis led in the last decade to an important research developing a consequent amount of techniques. Parallelized algorithms, which consist in distributing and treat the data…
We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior…
Joint diagonalization, the process of finding a shared set of approximate eigenvectors for a collection of matrices, arises in diverse applications such as multidimensional harmonic analysis or quantum information theory. This task is…
In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and…
We propose an inexact variable-metric proximal point algorithm to accelerate gradient-based optimization algorithms. The proposed scheme, called QNing can be notably applied to incremental first-order methods such as the stochastic…
We describe an approach to parallel graph partitioning that scales to hundreds of processors and produces a high solution quality. For example, for many instances from Walshaw's benchmark collection we improve the best known partitioning.…
We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods. We unify these…
In this paper, we consider the problem of stochastic optimization, where the objective function is in terms of the expectation of a (possibly non-convex) cost function that is parametrized by a random variable. While the convergence speed…
Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory,…
We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an $\ell_0$-norm constraint. Through decomposing the feasible set of the given sparsity…