Related papers: Loss Gradient Gaussian Width based Generalization …
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. The purpose of this work is to propose a modern view and a general mathematical…
We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the…
An influential line of recent work has focused on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification with exponentially-tailed loss functions. The ability of such…
This paper addresses the study of derivative-free smooth optimization problems, where the gradient information on the objective function is unavailable. Two novel general derivative-free methods are proposed and developed for minimizing…
Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization…
We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies global Kurdyka-Lojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy mild…
We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss…
We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is an upper bound on…
Many statistical $M$-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite…
Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that generalization guarantees of…
We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in…
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the…
Policy gradients methods apply to complex, poorly understood, control problems by performing stochastic gradient descent over a parameterized class of polices. Unfortunately, even for simple control problems solvable by standard dynamic…
We study the problem of non-convex optimization using Stochastic Gradient Langevin Dynamics (SGLD). SGLD is a natural and popular variation of stochastic gradient descent where at each step, appropriately scaled Gaussian noise is added. To…
Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive…
We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics,…
We address the problem of distributed convex unconstrained optimization over networks characterized by asynchronous and possibly lossy communications. We analyze the case where the global cost function is the sum of locally coupled local…
Optimization is widely used in statistics, and often efficiently delivers point estimates on useful spaces involving structural constraints or combinatorial structure. To quantify uncertainty, Gibbs posterior exponentiates the negative loss…
A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we…
The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied…