English
Related papers

Related papers: Online Learning Under A Separable Stochastic Appro…

200 papers

Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time…

Machine Learning · Computer Science 2016-03-16 Guillaume Bouchard , Théo Trouillon , Julien Perez , Adrien Gaidon

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky , Robert Busa-Fekete

Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional…

Machine Learning · Statistics 2017-05-10 Yuting Ma , Tian Zheng

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…

Machine Learning · Computer Science 2022-04-29 Yunfei Teng , Wenbo Gao , Francois Chalus , Anna Choromanska , Donald Goldfarb , Adrian Weller

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight. Popular algorithms in this…

Machine Learning · Computer Science 2019-02-21 Michał Kempka , Wojciech Kotłowski , Manfred K. Warmuth

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul , Yann LeCun

We prove local convergence of several notable gradient descent algorithms used in machine learning, for which standard stochastic gradient descent theory does not apply directly. This includes, first, online algorithms for recurrent models…

Dynamical Systems · Mathematics 2021-01-11 Pierre-Yves Massé , Yann Ollivier

We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that…

Machine Learning · Computer Science 2026-01-06 Hao Ma , Melanie Zeilinger , Michael Muehlebach

Modern machine learning is trained by stochastic gradient descent (SGD), whose performance critically depends on how the learning rate (LR) is adjusted and decreased over time. Yet existing LR regimes may be intricate, or need to tune one…

Machine Learning · Computer Science 2025-08-20 Zhuang Yang

Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they…

Machine Learning · Computer Science 2021-09-08 Chunyuan Zhang , Qi Song , Hui Zhou , Yigui Ou , Hongyao Deng , Laurence Tianruo Yang

In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive…

Machine Learning · Computer Science 2023-06-28 Salih Atici , Hongyi Pan , Ahmet Enis Cetin

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…

Machine Learning · Statistics 2026-01-01 Xin Chen , Jason M. Klusowski

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami , Martin Jaggi , Sebastian U. Stich

This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is…

Machine Learning · Statistics 2024-02-20 Matteo Sordello , Niccolò Dalmasso , Hangfeng He , Weijie Su

Matrix completion, where we wish to recover a low rank matrix by observing a few entries from it, is a widely studied problem in both theory and practice with wide applications. Most of the provable algorithms so far on this problem have…

Machine Learning · Computer Science 2016-05-27 Chi Jin , Sham M. Kakade , Praneeth Netrapalli

We initiate the study of numerical linear algebra in the sliding window model, where only the most recent $W$ updates in a stream form the underlying data set. We first introduce a unified row-sampling based framework that gives randomized…

Data Structures and Algorithms · Computer Science 2023-04-12 Vladimir Braverman , Petros Drineas , Cameron Musco , Christopher Musco , Jalaj Upadhyay , David P. Woodruff , Samson Zhou

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA)…

Machine Learning · Statistics 2017-06-16 Simone Scardapane , Paolo Di Lorenzo

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang , Taro Toyoizumi

Modern applications in sensitive domains such as biometrics and medicine frequently require the use of non-decomposable loss functions such as precision@k, F-measure etc. Compared to point loss functions such as hinge-loss, these offer much…

Machine Learning · Computer Science 2014-10-27 Purushottam Kar , Harikrishna Narasimhan , Prateek Jain

In online learning from non-stationary data streams, it is necessary to learn robustly to outliers and to adapt quickly to changes in the underlying data generating mechanism. In this paper, we refer to the former attribute of online…

Machine Learning · Statistics 2021-09-29 Shintaro Fukushima , Atsushi Nitanda , Kenji Yamanishi
‹ Prev 1 2 3 10 Next ›