Related papers: Simultaneous Model Selection and Optimization thro…

Stochastic Gradient Descent for Nonparametric Additive Regression

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…

Machine Learning · Statistics 2026-01-01 Xin Chen, Jason M. Klusowski

Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics 2026-03-05 Xiaotong Liu, Yunwen Lei, Xiangyu Chang, Shao-Bo Lin

Training Deep Networks without Learning Rates Through Coin Betting

Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning…

Machine Learning · Computer Science 2017-11-07 Francesco Orabona, Tatiana Tommasi

Bolstering Stochastic Gradient Descent with Model Building

The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are…

Machine Learning · Computer Science 2024-03-14 S. Ilker Birbil, Ozgur Martin, Gonenc Onay, Figen Oztoprak

Towards Learning Stochastic Population Models by Gradient Descent

Increasing effort is put into the development of methods for learning mechanistic models from data. This task entails not only the accurate estimation of parameters but also a suitable model structure. Recent work on the discovery of…

Machine Learning · Computer Science 2024-07-01 Justin N. Kreikemeyer, Philipp Andelfinger, Adelinde M. Uhrmacher

Stochastic Low-Rank Kernel Learning for Regression

We present a novel approach to learn a kernel-based regression function. It is based on the use of conical combinations of data-based parameterized kernels and on a new stochastic convex optimization procedure of which we establish…

Machine Learning · Computer Science 2012-01-13 Pierre Machart, Thomas Peel, Liva Ralaivola, Sandrine Anthoine, Hervé Glotin
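The entry above builds on conical combinations of kernels, i.e. sums $\sum_m w_m K_m$ with nonnegative weights $w_m \ge 0$. A minimal sketch, assuming Gaussian base kernels with hand-picked bandwidths and fixed weights (all hypothetical choices; the paper learns the combination with its own stochastic procedure, which is not reproduced here), shows why such a combination is still a valid kernel: the combined Gram matrix stays symmetric positive semidefinite.

```python
import numpy as np

def gaussian_gram(X, bandwidth):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def conical_combination(grams, weights):
    """Nonnegative (conical) combination sum_m w_m * K_m of Gram matrices."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("conical combinations need nonnegative weights")
    return sum(w * K for w, K in zip(weights, grams))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
# Assumed base kernels: Gaussian kernels at three illustrative bandwidths.
grams = [gaussian_gram(X, s) for s in (0.5, 1.0, 2.0)]
K = conical_combination(grams, [0.2, 0.5, 0.3])
print(np.linalg.eigvalsh(K).min())  # nonnegative up to rounding error
```

A negative weight can break positive semidefiniteness, which is why the helper rejects it.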

Stochastic gradient descent with random learning rate

We propose to optimize neural networks with a uniformly-distributed random learning rate. The associated stochastic gradient descent algorithm can be approximated by continuous stochastic equations and analyzed within the Fokker-Planck…

Machine Learning · Computer Science 2020-10-13 Daniele Musso
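A minimal sketch of plain SGD in which every step draws its learning rate uniformly at random, as the entry above describes. The interval $[0, 2\bar\eta]$ (chosen so the mean rate is $\bar\eta$), the one-dimensional quadratic objective, and the helper names are assumptions for illustration; the paper's neural-network setting and Fokker-Planck analysis are not reproduced.

```python
import random

def sgd_random_lr(grad, x0, eta_mean, steps, seed=0):
    """SGD where each step uses a learning rate drawn from U(0, 2 * eta_mean).

    Hypothetical helper: the uniform range is an illustrative assumption
    chosen so that the expected learning rate equals eta_mean.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        eta = rng.uniform(0.0, 2.0 * eta_mean)  # fresh random learning rate
        x = x - eta * grad(x, rng)              # stochastic gradient step
    return x

# Assumed toy problem: minimize E[(x - z)^2 / 2] with z ~ N(3, 1).
# The minimizer is x = 3 and a stochastic gradient is x - z.
def noisy_grad(x, rng):
    return x - rng.gauss(3.0, 1.0)

x_final = sgd_random_lr(noisy_grad, x0=0.0, eta_mean=0.05, steps=5000)
print(round(x_final, 1))
```

The final iterate fluctuates around the minimizer 3 rather than settling on it exactly, because both the gradients and the step sizes are random.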

Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

Subgradient algorithms for training support vector machines have been quite successful for solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper…

Machine Learning · Computer Science 2011-11-04 Sangkyun Lee, Stephen J. Wright

Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling

Stochastic gradient methods enable learning probabilistic models from large amounts of data. While large step-sizes (learning rates) have been shown to be best for least-squares (e.g., Gaussian noise) once combined with parameter averaging,…

Machine Learning · Statistics 2018-11-22 Dmitry Babichev, Francis Bach
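The entry above refers to the least-squares setting, where a large constant step size combined with averaging of the iterates works well. A generic sketch of that baseline: single-pass, constant-step SGD on a synthetic Gaussian-noise regression problem that also tracks the running (Polyak-Ruppert) average of the iterates. The data, step size and function name are assumptions, and the paper's extension to other probabilistic models is not shown.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step, seed=0):
    """One pass of constant-step SGD on 0.5 * (y_i - <w, x_i>)^2.

    Returns both the last iterate and the running average of all iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t, i in enumerate(rng.permutation(n), start=1):
        residual = X[i] @ w - y[i]
        w = w - step * residual * X[i]  # constant step size, no decay
        w_avg += (w - w_avg) / t        # running average of the iterates
    return w, w_avg

# Synthetic regression problem with Gaussian noise (assumed for illustration).
rng = np.random.default_rng(1)
n, d = 20000, 5
X = rng.standard_normal((n, d))
w_true = np.arange(1.0, d + 1.0)
y = X @ w_true + 0.5 * rng.standard_normal(n)

w_last, w_bar = averaged_sgd_least_squares(X, y, step=0.05)
print(np.linalg.norm(w_last - w_true), np.linalg.norm(w_bar - w_true))
```

With a constant step the last iterate keeps fluctuating around the solution at a level set by the step size, while the averaged iterate typically lands much closer; the two printed distances make this visible.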

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

No More Pesky Learning Rates

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul, Sixin Zhang, Yann LeCun

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Alignment Based Kernel Learning with a Continuous Set of Base Kernels

The success of kernel-based learning methods depends on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce…

Machine Learning · Computer Science 2011-12-21 Arash Afkanpour, Csaba Szepesvari, Michael Bowling

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels $d$ to be combined is very large, multiple kernel…

Machine Learning · Computer Science 2015-03-20 Arash Afkanpour, András György, Csaba Szepesvári, Michael Bowling

Model selection of polynomial kernel regression

Polynomial kernel regression is one of the standard and state-of-the-art learning strategies. However, as is well known, the choices of the degree of the polynomial kernel and of the regularization parameter are still open in the realm of model…

Machine Learning · Computer Science 2023-06-14 Shaobo Lin, Xingping Sun, Zongben Xu, Jinshan Zeng

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$,…

Machine Learning · Computer Science 2020-10-28 Raphaël Berthier, Francis Bach, Pierre Gaillard
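Under the noiseless model $Y = \langle \theta_*, \Phi(U) \rangle$ there is no label noise to average out, so in a finite-dimensional toy case SGD on the squared loss with a small constant step drives the error to zero. A minimal sketch, assuming a hypothetical Fourier feature map $\Phi(U) = (\cos kU, \sin kU)_{k=1,\dots,5}$ with $U$ uniform on $[0, 2\pi)$ and an arbitrary $\theta_*$; it illustrates the setting only, not the paper's nonparametric convergence rates.

```python
import numpy as np

def sgd_noiseless(features, theta_star, step, n_steps, seed=0):
    """Single-sample SGD on 0.5 * (Y - <theta, Phi(U)>)^2 with Y = <theta_star, Phi(U)>.

    `features(rng)` draws one random feature vector Phi(U) (hypothetical helper).
    Returns the final iterate and the squared error after every step.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(theta_star)
    errors = []
    for _ in range(n_steps):
        phi = features(rng)
        y = phi @ theta_star  # noiseless output
        theta = theta - step * (phi @ theta - y) * phi
        errors.append(float(np.sum((theta - theta_star) ** 2)))
    return theta, errors

# Assumed feature map: Fourier features of a uniform angle U.
def fourier_features(rng, k_max=5):
    u = rng.uniform(0.0, 2.0 * np.pi)
    k = np.arange(1, k_max + 1)
    return np.concatenate([np.cos(k * u), np.sin(k * u)])

theta_star = np.linspace(1.0, 0.1, 10)  # arbitrary target parameter
theta, errors = sgd_noiseless(fourier_features, theta_star, step=0.1, n_steps=5000)
print(errors[0], errors[-1])
```

Here $\|\Phi(U)\|^2 = 5$ for every draw, so the step 0.1 keeps each update a contraction and the error shrinks until it reaches floating-point precision.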

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. It has become easier than before with the help of big data, but new challenges also come along.…

Machine Learning · Statistics 2020-10-16 Haoyu Chen, Wenbin Lu, Rui Song

Truncated Kernel Stochastic Gradient Descent with General Losses and Spherical Radial Basis Functions

In this paper, we propose a novel kernel stochastic gradient descent (SGD) algorithm for large-scale supervised learning with general losses. Compared to traditional kernel SGD, our algorithm improves efficiency and scalability through an…

Machine Learning · Computer Science 2026-04-28 Jinhui Bai, Andreas Christmann, Lei Shi

Differentiable Calibration of Inexact Stochastic Simulation Models via Kernel Score Minimization

Stochastic simulation models are generative models that mimic complex systems to help with decision-making. The reliability of these models heavily depends on well-calibrated input model parameters. However, in many practical scenarios,…

Methodology · Statistics 2024-11-11 Ziwei Su, Diego Klabjan

Deep Clustered Convolutional Kernels

Deep neural networks have recently achieved state-of-the-art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture…

Machine Learning · Computer Science 2016-03-04 Minyoung Kim, Luca Rigazio