Related papers: Thick-Net: Parallel Network Structure for Sequenti…

Sliced Recurrent Neural Networks

Recurrent neural networks have achieved great success in many NLP tasks. However, they have difficulty in parallelization because of the recurrent structure, so it takes much time to train RNNs. In this paper, we introduce sliced recurrent…

Computation and Language · Computer Science 2018-07-09 Zeping Yu, Gongshen Liu

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

Multi-Residual Networks: Improving the Speed and Accuracy of Residual Networks

In this article, we take one step toward understanding the learning behavior of deep residual networks, and supporting the observation that deep residual networks behave like ensembles. We propose a new convolutional neural network…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Masoud Abdi, Saeid Nahavandi

Parallel Training of Deep Networks with Local Updates

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times…

Machine Learning · Computer Science 2021-06-16 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Study of Residual Networks for Image Recognition

Deep neural networks demonstrate high performance on image classification tasks while being more difficult to train. Due to the complexity and vanishing gradient problem, it normally takes a lot of time and more computational…

Computer Vision and Pattern Recognition · Computer Science 2018-05-02 Mohammad Sadegh Ebrahimi, Hossein Karkeh Abadi

Layer-Parallel Training of Deep Residual Neural Networks

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be…

Optimization and Control · Mathematics 2019-07-26 S. Günther, L. Ruthotto, J. B. Schroder, E. C. Cyr, N. R. Gauger

Quasi-Recurrent Neural Networks

Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce…

Neural and Evolutionary Computing · Computer Science 2016-11-22 James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

Going Wider: Recurrent Neural Network With Parallel Cells

Recurrent Neural Network (RNN) has been widely applied for sequence modeling. In RNN, the hidden states at current step are fully connected to those at previous step, thus the influence from less related features at previous step may…

Computation and Language · Computer Science 2017-05-04 Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently…

Machine Learning · Computer Science 2024-04-29 Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism

Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-18 Nikoli Dryden, Naoya Maruyama, Tom Benson, Tim Moon, Marc Snir, Brian Van Essen

CompNet: Neural networks growing via the compact network morphism

It is often the case that the performance of a neural network can be improved by adding layers. In real-world practices, we always train dozens of neural network architectures in parallel which is a wasteful process. We explored CompNet,…

Neural and Evolutionary Computing · Computer Science 2018-04-30 Jun Lu, Wei Ma, Boi Faltings

Make Deep Networks Shallow Again

Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has been, for a long time, the vanishing gradient which prevented the numerical…

Machine Learning · Computer Science 2024-05-02 Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

A longstanding challenge for the Machine Learning community is the one of developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformers-based networks (e.g., Large…

Machine Learning · Computer Science 2024-02-15 Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci

Linked Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been proven to be effective in modeling sequential data and they have been applied to boost a variety of tasks such as document classification, speech recognition and machine translation. Most of…

Computation and Language · Computer Science 2018-08-21 Zhiwei Wang, Yao Ma, Dawei Yin, Jiliang Tang

Rethinking Recurrent Neural Networks and Other Improvements for Image Classification

Over the long history of machine learning, which dates back several decades, recurrent neural networks (RNNs) have been used mainly for sequential data and time series and generally with 1D information. Even in some rare studies on 2D…

Computer Vision and Pattern Recognition · Computer Science 2021-03-05 Nguyen Huu Phong, Bernardete Ribeiro

Online Learning for the Random Feature Model in the Student-Teacher Framework

Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layered neural network whose first layer is frozen while the…

Machine Learning · Computer Science 2023-04-10 Roman Worschech, Bernd Rosenow

Neural Networks Regularization Through Representation Learning

Neural network models and deep models are one of the leading and state of the art models in machine learning. Most successful deep neural models are the ones with many layers which highly increases their number of parameters. Training such…

Machine Learning · Computer Science 2018-07-17 Soufiane Belharbi

Retentive Network: A Successor to Transformer for Large Language Models

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection…

Computation and Language · Computer Science 2023-08-10 Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

Reframing Neural Networks: Deep Structure in Overcomplete Representations

In comparison to classical shallow representation learning techniques, deep neural networks have achieved superior performance in nearly every application benchmark. But despite their clear empirical advantages, it is still not well…

Machine Learning · Computer Science 2022-01-11 Calvin Murdock, George Cazenavette, Simon Lucey

End-To-End Data-Dependent Routing in Multi-Path Neural Networks

Neural networks are known to give better performance with increased depth due to their ability to learn more abstract features. Although the deepening of networks has been well established, there is still room for efficient feature…

Computer Vision and Pattern Recognition · Computer Science 2023-03-01 Dumindu Tissera, Rukshan Wijessinghe, Kasun Vithanage, Alex Xavier, Subha Fernando, Ranga Rodrigo