Related papers: Implicit Regularization in Over-Parameterized Supp…
In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we…
Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…
Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from…
In nonsmooth optimization, a negative subgradient is not necessarily a descent direction, making the design of convergent descent methods based on zeroth-order and first-order information a challenging task. The well-studied bundle methods…
We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric…
In machine learning and statistical data analysis, we often run into objective function that is a summation: the number of terms in the summation possibly is equal to the sample size, which can be enormous. In such a setting, the stochastic…
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with…
Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined…
We propose a general learning based framework for solving nonsmooth and nonconvex image reconstruction problems. We model the regularization function as the composition of the $l_{2,1}$ norm and a smooth but nonconvex feature mapping…
Optimization algorithms for solving nonconvex inverse problem have attracted significant interests recently. However, existing methods require the nonconvex regularization to be smooth or simple to ensure convergence. In this paper, we…
When equipped with efficient optimization algorithms, the over-parameterized neural networks have demonstrated high level of performance even though the loss function is non-convex and non-smooth. While many works have been focusing on…
Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods often require careful parameter tuning or prior knowledge of the sparsity of the…
Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and…
In this thesis we develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic. Focusing on quadratic functions we are able to examine optimization algorithms as a recursive…
This paper addresses the problem of efficiently classifying high-dimensional data over decentralized networks. Penalized support vector machines (SVMs) are widely used for high-dimensional classification tasks. However, the double…
The Support Vector Machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve in particular for regularized…
In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient…
We consider stochastic gradient methods under the interpolation regime where a perfect fit can be obtained (minimum loss at each observation). While previous work highlighted the implicit regularization of such algorithms, we consider an…
We present two stochastic descent algorithms that apply to unconstrained optimization and are particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained…
Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Despite explicit regularization strategies are used for practitioners to avoid over-fitting, the impacts are often small. Some…