Related papers: Loss Gradient Gaussian Width based Generalization …

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. The purpose of this work is to propose a modern view and a general mathematical…

Machine Learning · Computer Science · 2021-05-28 · Chaoyue Liu, Libin Zhu, Mikhail Belkin

Stability and Generalization of Learning Algorithms that Converge to Global Optima

We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the…

Machine Learning · Statistics · 2017-10-25 · Zachary Charles, Dimitris Papailiopoulos

Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond

An influential line of recent work has focused on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification with exponentially-tailed loss functions. The ability of such…

Machine Learning · Computer Science · 2022-06-24 · Matan Schliserman, Tomer Koren

General Derivative-Free Optimization Methods under Global and Local Lipschitz Continuity of Gradients

This paper addresses the study of derivative-free smooth optimization problems, where the gradient information on the objective function is unavailable. Two novel general derivative-free methods are proposed and developed for minimizing…

Optimization and Control · Mathematics · 2023-11-29 · Pham Duy Khanh, Boris S. Mordukhovich, Dat Ba Tran

From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization…

Machine Learning · Statistics · 2022-10-13 · Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li

Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality

We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies global Kurdyka-Lojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy mild…

Optimization and Control · Mathematics · 2022-10-05 · Ilyas Fatkhullin, Jalal Etesami, Niao He, Negar Kiyavash

Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss…

Machine Learning · Computer Science · 2021-10-18 · Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho
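The Polyak step size named in this entry sets the learning rate from the current suboptimality gap. Below is a minimal sketch of the classical rule, eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2. It assumes the optimal value f* is known. It is not the authors' implementation, and the paper's variant and its statistical analysis (population vs. empirical loss) may differ. The function `polyak_gd` and the least-squares test problem are hypothetical illustrations.

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, n_steps=100, eps=1e-12):
    """Gradient descent with the classical Polyak step size (sketch).

    At iterate x_t the step is eta_t = (f(x_t) - f_star) / ||grad f(x_t)||^2.
    Assumes f_star (the optimal value) is known in advance.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)
        g_norm_sq = float(g @ g)
        if g_norm_sq < eps:  # (near-)stationary point: stop early
            break
        eta = (f(x) - f_star) / g_norm_sq  # adaptive step, no tuned learning rate
        x = x - eta * g
    return x

# Hypothetical problem: f(x) = 0.5 * ||A x - b||^2 with an exact solution x* = (1, 3),
# so the optimal value is f_star = 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
f = lambda x: 0.5 * float(np.sum((A @ x - b) ** 2))
grad = lambda x: A.T @ (A @ x - b)

x_hat = polyak_gd(f, grad, x0=np.zeros(2), f_star=0.0, n_steps=2000)
print(x_hat)  # approximately [1. 3.]
```

The appeal of this rule is that no learning rate needs tuning. The price is that f* must be known or estimated.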

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is an upper bound on…

Machine Learning · Statistics · 2023-02-13 · Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias

Fast global convergence of gradient methods for high-dimensional statistical recovery

Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite…

Machine Learning · Statistics · 2012-07-26 · Alekh Agarwal, Sahand N. Negahban, Martin J. Wainwright

Universal generalization guarantees for Wasserstein distributionally robust models

Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that generalization guarantees of…

Optimization and Control · Mathematics · 2025-01-28 · Tam Le, Jérôme Malick

Statistical guarantees for the EM algorithm: From population to sample-based analysis

We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in…

Statistics Theory · Mathematics · 2014-08-12 · Sivaraman Balakrishnan, Martin J. Wainwright, Bin Yu

Generalization Error Bounds for Noisy, Iterative Algorithms

In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the…

Machine Learning · Computer Science · 2018-01-16 · Ankit Pensia, Varun Jog, Po-Ling Loh

Global Optimality Guarantees For Policy Gradient Methods

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic…

Machine Learning · Computer Science · 2022-06-22 · Jalaj Bhandari, Daniel Russo

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

We study the problem of non-convex optimization using Stochastic Gradient Langevin Dynamics (SGLD). SGLD is a natural and popular variation of stochastic gradient descent where at each step, appropriately scaled Gaussian noise is added. To…

Machine Learning · Computer Science · 2024-07-08 · August Y. Chen, Ayush Sekhari, Karthik Sridharan
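This entry describes SGLD as stochastic gradient descent with appropriately scaled Gaussian noise added at each step. Below is a minimal sketch of that update. It assumes the common parameterization theta_{k+1} = theta_k - eta * g_k + sqrt(2 * eta / beta) * xi_k, with xi_k ~ N(0, I) and inverse temperature beta. The paper's exact scaling and assumptions may differ. The function `sgld`, the quadratic objective, and the noisy-gradient oracle are hypothetical stand-ins.

```python
import numpy as np

def sgld(grad, theta0, step=1e-2, beta=1e3, n_steps=5000, seed=0):
    """Stochastic Gradient Langevin Dynamics (sketch).

    Update: theta <- theta - step * g + sqrt(2 * step / beta) * xi,
    where g = grad(theta) may be a noisy (minibatch) gradient estimate
    and xi ~ N(0, I). Returns every iterate, shape (n_steps, dim).
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    noise_scale = np.sqrt(2.0 * step / beta)  # scale of the injected Gaussian noise
    trajectory = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        xi = rng.standard_normal(theta.size)
        theta = theta - step * grad(theta) + noise_scale * xi
        trajectory[k] = theta
    return trajectory

# Hypothetical objective f(theta) = 0.5 * ||theta - c||^2, minimized at c.
c = np.array([1.0, -2.0])
grad_rng = np.random.default_rng(1)

def noisy_grad(theta):
    # Exact gradient (theta - c) plus small Gaussian noise, standing in for a minibatch estimate.
    return (theta - c) + 0.1 * grad_rng.standard_normal(theta.size)

traj = sgld(noisy_grad, theta0=np.zeros(2))
print(traj[-1000:].mean(axis=0))  # close to c when beta is large
```

With a large beta the injected noise is small, and the iterates concentrate near the minimizer. A smaller beta spreads them out, so the chain roughly approximates sampling from a density proportional to exp(-beta * f).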

Generalization Bounds for Gradient Methods via Discrete and Continuous Prior

Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive…

Machine Learning · Computer Science · 2022-10-12 · Xuanyuan Luo, Luo Bei, Jian Li

Optimal sampling for stochastic and natural gradient descent

We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics,…

Optimization and Control · Mathematics · 2026-02-02 · Robert Gruhlke, Anthony Nouy, Philipp Trunschke

Generalized gradient optimization over lossy networks for partition-based estimation

We address the problem of distributed convex unconstrained optimization over networks characterized by asynchronous and possibly lossy communications. We analyze the case where the global cost function is the sum of locally coupled local…

Optimization and Control · Mathematics · 2020-10-06 · Marco Todescato, Nicoletta Bof, Guido Cavraro, Ruggero Carli, Luca Schenato

Bridged Posterior: Optimization, Profile Likelihood and a New Approach to Generalized Bayes

Optimization is widely used in statistics, and often efficiently delivers point estimates on useful spaces involving structural constraints or combinatorial structure. To quantify uncertainty, Gibbs posterior exponentiates the negative loss…

Methodology · Statistics · 2025-07-23 · Cheng Zeng, Eleni Dilma, Jason Xu, Leo L Duan

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we…

Machine Learning · Statistics · 2021-02-03 · Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied…

Machine Learning · Statistics · 2019-03-08 · Yi Zhou, Yingbin Liang, Huishuai Zhang