Related papers: Linear Learning with Sparse Data

Stabilized Sparse Online Learning for Sparse Data

Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional…

Machine Learning · Statistics 2017-05-10 Yuting Ma , Tian Zheng

An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

High-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic…

Machine Learning · Computer Science 2022-11-29 Xue Yu , Yifan Sun , Haijun Zhou

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen , Nam H. Nguyen , Dzung T. Phan , Jayant R. Kalagnanam , Katya Scheinberg

Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis

The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters of these are found by minimizing a 'loss functional', typically by stochastic gradient…

Machine Learning · Computer Science 2021-09-16 Stephan Wojtowytsch

Statistical Inference for Model Parameters in Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function…

Machine Learning · Statistics 2023-11-02 Xi Chen , Jason D. Lee , Xin T. Tong , Yichen Zhang

Learning sparse gradients for variable selection and dimension reduction

Variable selection and dimension reduction are two commonly adopted approaches for high-dimensional data analysis, but have traditionally been treated separately. Here we propose an integrated approach, called sparse gradient learning…

Machine Learning · Statistics 2010-07-02 Gui-Bo Ye , Xiaohui Xie

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang , Suyog Gupta , Xiangru Lian , Ji Liu

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei , Ting Hu , Guiying Li , Ke Tang

A Simplified Analysis of SGD for Linear Regression with Weight Averaging

Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by~\citet{zou2021benign} provides…

Machine Learning · Computer Science 2025-06-19 Alexandru Meterez , Depen Morwani , Costin-Andrei Oncescu , Jingfeng Wu , Cengiz Pehlevan , Sham Kakade

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul , Yann LeCun

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

Always-Sparse Training by Growing Connections with Guided Stochastic Exploration

The excessive computational requirements of modern artificial neural networks (ANNs) are posing limitations on the machines that can run them. Sparsification of ANNs is often motivated by time, memory and energy savings only during model…

Machine Learning · Computer Science 2025-05-01 Mike Heddes , Narayan Srinivasa , Tony Givargis , Alexandru Nicolau

LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

This paper targets solving distributed machine learning problems such as federated learning in a communication-efficient fashion. A class of new stochastic gradient descent (SGD) approaches have been developed, which can be viewed as the…

Optimization and Control · Mathematics 2020-02-27 Tianyi Chen , Yuejiao Sun , Wotao Yin

Stochastic Gradient Descent for Linear Systems with Missing Data

Traditional methods for solving linear systems have quickly become impractical due to an increase in the size of available data. Utilizing massive amounts of data is further complicated when the data is incomplete or has missing entries. In…

Numerical Analysis · Mathematics 2019-01-09 Anna Ma , Deanna Needell

Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

In stochastic optimization, a common tool to deal sequentially with large sample is to consider the well-known stochastic gradient algorithm. Nevertheless, since the stepsequence is the same for each direction, this can lead to bad results…

Optimization and Control · Mathematics 2023-03-03 Antoine Godichon-Baggioni , Pierre Tarrago

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi , Martin Jaggi

Asynchronous Stochastic Gradient Descent with Delay Compensation

With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is,…

Machine Learning · Computer Science 2020-02-19 Shuxin Zheng , Qi Meng , Taifeng Wang , Wei Chen , Nenghai Yu , Zhi-Ming Ma , Tie-Yan Liu

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen , Ronny Luss

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam , Khaleelulla Khan Nazeer , Lukas König , David Kappel , Anand Subramoney

Differential Equations for Modeling Asynchronous Algorithms

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analysis on ASGD take a discrete view and prove upper bounds for their convergence rates. However, the…

Machine Learning · Statistics 2018-05-09 Li He , Qi Meng , Wei Chen , Zhi-Ming Ma , Tie-Yan Liu