Related papers: Minibatching Offers Improved Generalization Perfor…

200 papers

Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size. Previous approaches try to solve this problem by varying the learning rate and batch size…

Machine Learning · Computer Science 2019-04-02 Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide…

Machine Learning · Computer Science 2018-04-23 Dominic Masters, Carlo Luschi

This paper investigates how various randomization techniques impact Deep Neural Networks (DNNs). Randomization, like weight noise and dropout, aids in reducing overfitting and enhancing generalization, but their interactions are poorly…

The unprecedented growth of deep learning models has enabled remarkable advances but introduced substantial computational bottlenecks. A key factor contributing to training efficiency is batch-size and learning-rate scheduling in stochastic…

Machine Learning · Computer Science 2025-08-08 Hikaru Umeda, Hideaki Iiduka

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics 2017-12-01 Naman Agarwal, Brian Bullins, Elad Hazan

Second-order optimization has been developed to accelerate the training of deep neural networks and it is being applied to increasingly larger-scale models. In this study, towards training on further larger scales, we identify a specific…

Machine Learning · Computer Science 2024-06-11 Satoki Ishikawa, Ryo Karakida

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true…

Machine Learning · Computer Science 2024-11-25 Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh

Many machine learning models involve solving optimization problems. Thus, it is important to deal with a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention…

Numerical Analysis · Computer Science 2020-03-24 Haishan Ye, Luo Luo, Zhihua Zhang

Deep neural networks (DNNs) are typically optimized using various forms of mini-batch gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a suitably chosen batch size, available computing resources…

Machine Learning · Computer Science 2022-10-25 Oyebade K. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada

Practical results have shown that deep learning optimizers using small constant learning rates, hyperparameters close to one, and large batch sizes can find the model parameters of deep neural networks that minimize the loss functions. We…

Machine Learning · Computer Science 2022-08-23 Hideaki Iiduka

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…

Machine Learning · Computer Science 2020-12-21 Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science 2016-06-15 Maya Kabkab, Azadeh Alavi, Rama Chellappa

Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs), equipped with many more parameters than training data and trained by (stochastic) gradient-based methods, often achieve remarkably…

Machine Learning · Computer Science 2018-11-30 Zhiqin John Xu

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…

Machine Learning · Computer Science 2021-03-08 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, we generally spend longer training time on more computation and communication.…

Machine Learning · Computer Science 2021-11-03 Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You

Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient Descent (S-SGD) and the model averaging methods, are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to its easy…

Machine Learning · Computer Science 2022-11-04 Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv

In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning. In the first-order case, we propose a framework of transition from deterministic or…

Machine Learning · Computer Science 2021-11-30 Sanae Lotfi, Tiphaine Bonniot de Ruisselet, Dominique Orban, Andrea Lodi

Training time budget and size of the dataset are among the factors affecting the performance of a Deep Neural Network (DNN). This paper shows that Neural Architecture Search (NAS), Hyper Parameters Optimization (HPO), and Data Augmentation…

Machine Learning · Computer Science 2023-01-24 Mahdi Zolnouri, Dounia Lakhmiri, Christophe Tribes, Eyyüb Sari, Sébastien Le Digabel

Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency and scalability gains in recent years. However,…

Machine Learning · Computer Science 2020-02-18 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch…

Machine Learning · Computer Science 2019-07-22 Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl