Related papers: Stacking as Accelerated Gradient Descent

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking

Deep learning has brought great progress to sequential recommendation (SR) tasks. With advanced network architectures, sequential recommender models can be stacked with many hidden layers, e.g., up to 100 layers on real-world…

Information Retrieval · Computer Science 2021-05-13 Jiachun Wang, Fajie Yuan, Jian Chen, Qingyao Wu, Min Yang, Yang Sun, Guoxiao Zhang

Stacked networks improve physics-informed training: applications to neural networks and deep operator networks

Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations.…

Machine Learning · Computer Science 2023-11-22 Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis

Sequential Training of Neural Networks with Gradient Boosting

This paper presents a novel technique based on gradient boosting to train the final layers of a neural network (NN). Gradient boosting is an additive expansion algorithm in which a series of models are trained sequentially to approximate a…

Machine Learning · Computer Science 2023-05-05 Seyedsaman Emami, Gonzalo Martínez-Muñoz

SNN: Stacked Neural Networks

It has been proven that transfer learning provides an easy way to achieve state-of-the-art accuracies on several vision tasks by training a simple classifier on top of features obtained from pre-trained neural networks. The goal of this…

Machine Learning · Computer Science 2016-06-07 Milad Mohammadi, Subhasis Das

A Generalized Stacking for Implementing Ensembles of Gradient Boosting Machines

The gradient boosting machine is one of the powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the…

Machine Learning · Computer Science 2020-10-14 Andrei V. Konstantinov, Lev V. Utkin

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu

Training Stacked Denoising Autoencoders for Representation Learning

We implement stacked denoising autoencoders, a class of neural networks that are capable of learning powerful representations of high dimensional data. We describe stochastic gradient descent for unsupervised training of autoencoders, as…

Machine Learning · Computer Science 2021-02-17 Jason Liang, Keith Kelly

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

On the insufficiency of existing momentum schemes for Stochastic Optimization

Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide…

Machine Learning · Computer Science 2018-08-02 Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade

Bayesian hierarchical stacking: Some models are (somewhere) useful

Stacking is a widely used model averaging technique that asymptotically yields optimal predictions among linear averages. We show that stacking is most effective when model predictive performance is heterogeneous in inputs, and we can…

Methodology · Statistics 2021-10-29 Yuling Yao, Gregor Pirš, Aki Vehtari, Andrew Gelman

Reinforced stochastic gradient descent for deep neural network learning

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

On the Inductive Bias of Stacking Towards Improving Reasoning

Given the increasing scale of model sizes, novel training strategies like gradual stacking [Gong et al., 2019, Reddi et al., 2023] have garnered interest. Stacking enables efficient training by gradually growing the depth of a model in…

Computation and Language · Computer Science 2024-10-01 Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

StackNet: Stacking Parameters for Continual learning

Training a neural network for a classification task typically assumes that the data to train are given from the beginning. However, in the real world, additional data accumulate gradually and the model requires additional training without…

Machine Learning · Computer Science 2020-04-22 Jangho Kim, Jeesoo Kim, Nojun Kwak

Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation

In deep neural network modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in order to obtain high-quality continuous space representations which in turn improves the quality of the…

Computation and Language · Computer Science 2021-06-21 Raj Dabre, Atsushi Fujita

On the Acceleration of Deep Learning Model Parallelism with Staleness

Training the deep convolutional neural network for computer vision problems is slow and inefficient, especially when it is large and distributed across multiple devices. The inefficiency is caused by the backpropagation algorithm's forward…

Machine Learning · Computer Science 2022-01-20 An Xu, Zhouyuan Huo, Heng Huang

Robust Stochastically-Descending Unrolled Networks

Deep unrolling, or unfolding, is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network. However, the convergence guarantees and generalizability of the unrolled…

Machine Learning · Computer Science 2024-12-02 Samar Hadou, Navid NaderiAlizadeh, Alejandro Ribeiro

Deep Stacking Networks for Low-Resource Chinese Word Segmentation with Transfer Learning

In recent years, neural networks have proven to be effective in Chinese word segmentation. However, this promising performance relies on large-scale training data. Neural networks with conventional architectures cannot achieve the desired…

Computation and Language · Computer Science 2017-11-07 Jingjing Xu, Xu Sun, Sujian Li, Xiaoyan Cai, Bingzhen Wei

Accelerated Gradient Boosting

Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinite-dimensional optimization problem. We combine gradient boosting and Nesterov's…

Machine Learning · Statistics 2018-03-07 Gérard Biau, Benoît Cadre, Laurent Rouvière

Recursive Algorithmic Reasoning

Learning models that execute algorithms can enable us to address a key problem in deep learning: generalizing to out-of-distribution data. However, neural networks are currently unable to execute recursive algorithms because they do not…

Machine Learning · Computer Science 2023-11-22 Jonas Jürß, Dulhan Jayalath, Petar Veličković