Related papers: Towards Optimal Problem Dependent Generalization E…

Learning with Statistical Equality Constraints

As machine learning applications grow increasingly ubiquitous and complex, they face an increasing set of requirements beyond accuracy. The prevalent approach to handle this challenge is to aggregate a weighted combination of requirement…

Machine Learning · Computer Science 2026-01-07 Aneesh Barthakur, Luiz F. O. Chamon

Learning Hard Optimization Problems: A Data Generation Perspective

Optimization problems are ubiquitous in our societies and are present in almost every segment of the economy. Most of these optimization problems are NP-hard and computationally demanding, often requiring approximate solutions for…

Optimization and Control · Mathematics 2021-06-23 James Kotary, Ferdinando Fioretto, Pascal Van Hentenryck

On improving generalization in a class of learning problems with the method of small parameters for weakly-controlled optimal gradient systems

In this paper, we provide a mathematical framework for improving generalization in a class of learning problems which is related to point estimations for modeling of high-dimensional nonlinear functions. In particular, we consider a…

Optimization and Control · Mathematics 2024-12-13 Getachew K. Befekadu

Stability, Complexity and Data-Dependent Worst-Case Generalization Bounds

Providing generalization guarantees for stochastic optimization algorithms remains a key challenge in learning theory. Recently, numerous works demonstrated the impact of the geometric properties of optimization trajectories on…

Machine Learning · Computer Science 2026-01-23 Mario Tuci, Lennart Bastian, Benjamin Dupuis, Nassir Navab, Tolga Birdal, Umut Şimşekli

Stochastic Optimization Using a Trust-Region Method and Random Models

In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function $f(x)$, obtained from stochastic…

Optimization and Control · Mathematics 2016-09-26 Ruobing Chen, Matt Menickelly, Katya Scheinberg

A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization

Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based…

Optimization and Control · Mathematics 2019-10-29 Sulaiman A. Alghunaim, Kun Yuan, Ali H. Sayed

Fast Rate Information-theoretic Bounds on Generalization Errors

The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been…

Information Theory · Computer Science 2025-06-24 Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu

Fast learning rates in statistical inference through aggregation

We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set $\mathcal{G}$ up to the smallest possible additive term, called the convergence rate. When the…

Statistics Theory · Mathematics 2009-09-09 Jean-Yves Audibert

Fast learning rates in statistical inference through aggregation

We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate. When the reference set…

Statistics Theory · Mathematics 2008-03-04 Jean-Yves Audibert

Training With Data Dependent Dynamic Learning Rates

Recently many first and second order variants of SGD have been proposed to facilitate training of Deep Neural Networks (DNNs). A common limitation of these works stems from the fact that they use the same learning rate across all instances…

Machine Learning · Computer Science 2021-05-31 Shreyas Saxena, Nidhi Vyas, Dennis DeCoste

Optimal Learning via Moderate Deviations Theory

This paper proposes a statistically optimal approach for learning a function value using a confidence interval in a wide range of models, including general non-parametric estimation of an expected loss described as a stochastic programming…

Machine Learning · Statistics 2025-08-07 Arnab Ganguly, Tobias Sutter

Embedding generalization within the learning dynamics: An approach based-on sample path large deviation theory

We consider a typical learning problem of point estimations for modeling of nonlinear functions or dynamical systems in which generalization, i.e., verifying a given learned model, can be embedded as an integral part of the learning process…

Optimization and Control · Mathematics 2024-08-06 Getachew K. Befekadu

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical…

Machine Learning · Computer Science 2020-03-03 Jian Li, Xuanyuan Luo, Mingda Qiao

Self-Regularized Learning Methods

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations…

Machine Learning · Statistics 2026-03-19 Max Schölpple, Liu Fanghui, Ingo Steinwart

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As a class of algorithms we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Data-Dependent Stability of Stochastic Gradient Descent

We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for…

Machine Learning · Computer Science 2018-02-19 Ilja Kuzborskij, Christoph H. Lampert

On the Convergence and Complexity of the Stochastic Central Finite-Difference Based Gradient Estimation Methods

This paper presents an algorithmic framework for solving unconstrained stochastic optimization problems using only stochastic function evaluations. We employ central finite-difference based gradient estimation methods to approximate the…

Optimization and Control · Mathematics 2025-01-14 Raghu Bollapragada, Cem Karamanli

Time-Delay Momentum: A Regularization Perspective on the Convergence and Generalization of Stochastic Momentum for Deep Learning

In this paper we study the problem of convergence and generalization error bound of stochastic momentum for deep learning from the perspective of regularization. To do so, we first interpret momentum as solving an $\ell_2$-regularized…

Machine Learning · Computer Science 2019-06-04 Ziming Zhang, Wenju Xu, Alan Sullivan

Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions

Performance of optimization on quadratic problems sensitively depends on the low-lying part of the spectrum. For large (effectively infinite-dimensional) problems, this part of the spectrum can often be naturally represented or approximated…

Optimization and Control · Mathematics 2024-03-26 Maksim Velikanov, Dmitry Yarotsky

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$,…

Machine Learning · Computer Science 2020-10-28 Raphaël Berthier, Francis Bach, Pierre Gaillard