Related papers: Parallelizing non-linear sequential models over th…

Parallelizing Linear Recurrent Neural Nets Over Sequence Length

Recurrent neural networks (RNNs) are widely used to model sequential data but their non-linear dependencies between sequence elements prevent parallelizing training over sequence length. We show the training of RNNs with only linear…

Neural and Evolutionary Computing · Computer Science 2018-02-23 Eric Martin , Chris Cundy

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou , Celestine Dünner , Kornilios Kourtis , Thomas Parnell

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training time, which demands parallel implementations of the training procedure. Parallelization of the training…

Neural and Evolutionary Computing · Computer Science 2015-11-25 Kyuyeon Hwang , Wonyong Sung

Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences

Parallelizing Gated Recurrent Unit (GRU) networks is a challenging task, as the training procedure of GRU is inherently sequential. Prior efforts to parallelize GRU have largely focused on conventional parallelization strategies such as…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Gordon Euhyun Moon , Eric C. Cyr

Parareal Neural Networks Emulating a Parallel-in-time Algorithm

As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics 2024-07-08 Chang-Ock Lee , Youngkyu Lee , Jongho Park

ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models

Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable…

Machine Learning · Computer Science 2025-11-04 Federico Danieli , Pau Rodriguez , Miguel Sarabia , Xavier Suau , Luca Zappella

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from…

Numerical Analysis · Mathematics 2026-03-18 Xavier Gonzalez

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x…

Machine Learning · Computer Science 2020-09-01 Andrew C. Kirby , Siddharth Samsi , Michael Jones , Albert Reuther , Jeremy Kepner , Vijay Gadepally

Blockwise Parallel Decoding for Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science 2018-11-09 Mitchell Stern , Noam Shazeer , Jakob Uszkoreit

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation

Reduction of training time is an important issue in many tasks like patent translation involving neural networks. Data parallelism and model parallelism are two common approaches for reducing training time using multiple graphics processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-10 Junya Ono , Masao Utiyama , Eiichiro Sumita

Neural GPUs Learn Algorithms

Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that…

Machine Learning · Computer Science 2016-03-16 Łukasz Kaiser , Ilya Sutskever

Sliced Recurrent Neural Networks

Recurrent neural networks have achieved great success in many NLP tasks. However, they have difficulty in parallelization because of the recurrent structure, so it takes much time to train RNNs. In this paper, we introduce sliced recurrent…

Computation and Language · Computer Science 2018-07-09 Zeping Yu , Gongshen Liu

System Identification with Time-Aware Neural Sequence Models

Established recurrent neural networks are well-suited to solve a wide variety of prediction tasks involving discrete sequences. However, they do not perform as well in the task of dynamical system identification, when dealing with…

Machine Learning · Computer Science 2019-11-22 Thomas Demeester

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating…

Computation and Language · Computer Science 2025-09-03 Rishiraj Acharya

GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational break throughs of two forms: model parallelism, e.g. GPU…

Computer Vision and Pattern Recognition · Computer Science 2013-12-24 Thomas Paine , Hailin Jin , Jianchao Yang , Zhe Lin , Thomas Huang

Parallel algorithms for problems of cluster analysis with very large amount of data

In this paper we solve on GPUs massive problems with large amount of data, which are not appropriate for solution with the SIMD technology. For the given problem we consider a three-level parallelization. The multithreading of CPU is used…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-18 Natalya Litvinenko

Experiments on Parallel Training of Deep Neural Network using Model Averaging

In this work we apply model averaging to parallel training of deep neural network (DNN). Parallelization is done in a model averaging manner. Data is partitioned and distributed to different nodes for local model updates, and model…

Machine Learning · Computer Science 2018-07-03 Hang Su , Haoyu Chen

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization

An efficient algorithm for recurrent neural network training is presented. The approach increases the training speed for tasks where a length of the input sequence may vary significantly. The proposed approach is based on the optimal batch…

Machine Learning · Computer Science 2017-08-21 Viacheslav Khomenko , Oleg Shyshkov , Olga Radyvonenko , Kostiantyn Bokhan

DeepPCR: Parallelizing Sequential Operations in Neural Networks

Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…

Machine Learning · Computer Science 2023-10-30 Federico Danieli , Miguel Sarabia , Xavier Suau , Pau Rodríguez , Luca Zappella