Related papers: SGB: Stochastic Gradient Bound Method for Optimizi…

200 papers

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent Gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood…

Machine Learning · Statistics 2014-02-19 Aleksandr Y. Aravkin, Anna Choromanska, Tony Jebara, Dimitri Kanevsky

Recently a majorization method for optimizing partition functions of log-linear models was proposed alongside a novel quadratic variational upper-bound. In the batch setting, it outperformed state-of-the-art first- and second-order…

Machine Learning · Computer Science 2013-09-24 Anna Choromanska, Tony Jebara

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations…

Machine Learning · Statistics 2026-05-20 Shubo Li, Yuefeng Han, Xiufan Yu

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show…

Machine Learning · Statistics 2017-09-12 Stephan Mandt, Matthew D. Hoffman, David M. Blei

We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods. We unify these…

Machine Learning · Computer Science 2014-12-02 Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention…

Numerical Analysis · Mathematics 2020-10-05 Arnulf Jentzen, Philippe von Wurstemberger

Many relevant problems in the area of systems and control, such as controller synthesis, observer design and model reduction, can be viewed as optimization problems involving dynamical systems: for instance, maximizing performance in the…

Optimization and Control · Mathematics 2023-11-15 Pascal Den Boef, Jos Maubach, Wil Schilders, Nathan van de Wouw

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains good generalization…

Machine Learning · Computer Science 2024-09-17 Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang

We study the class of subdifferentially polynomially bounded (SPB) functions, which is a rich class of locally Lipschitz functions that encompasses all Lipschitz functions, all gradient- or Hessian-Lipschitz functions, and even some…

Optimization and Control · Mathematics 2025-03-18 Ming Lei, Ting Kei Pong, Shuqin Sun, Man-Chung Yue

Stochastic gradient descent (SGD) holds as a classical method to build large scale machine learning models over big data. A stochastic gradient is typically calculated from a limited number of samples (known as mini-batch), so it…

Machine Learning · Computer Science 2016-01-14 Yadong Mu, Wei Liu, Wei Fan

Stochastic gradient algorithms are the main focus of large-scale optimization problems and have led to important successes in the recent advancement of deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of…

Machine Learning · Statistics 2025-05-20 Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an…

Machine Learning · Statistics 2020-02-04 Kenji Kawaguchi, Haihao Lu

The graduated optimization approach is a method for finding global optimal solutions for nonconvex functions by using a function smoothing operation with stochastic noise. This paper makes three contributions regarding graduated…

Machine Learning · Computer Science 2026-01-27 Naoki Sato, Hideaki Iiduka

We consider stochastic convex optimization problems where the objective is an expectation over smooth functions. For this setting we suggest a novel gradient estimate that combines two recent mechanisms that are related to the notion of…

Machine Learning · Computer Science 2025-03-06 Tehila Dahan, Kfir Y. Levy

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Despite an extensive body of literature on deep learning optimization, our current understanding of what makes an optimization algorithm effective is fragmented. In particular, we do not understand well whether enhanced optimization…

Machine Learning · Computer Science 2024-03-04 Toki Tahmid Inan, Mingrui Liu, Amarda Shehu

In this paper, we initiate a study of functional minimization in Federated Learning. First, in the semi-heterogeneous setting, when the marginal distributions of the feature vectors on client machines are identical, we develop the federated…

Machine Learning · Computer Science 2021-03-15 Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value…

Machine Learning · Statistics 2014-11-17 Mengdi Wang, Ethan X. Fang, Han Liu