Related papers: Image Classification at Supercomputer Scale

200 papers

There has been strong demand for algorithms that can execute machine learning as fast as possible, and the speed of deep learning has increased 30-fold in the past two years alone. Distributed deep learning using the large…

Scaling the distributed deep learning to a massive GPU cluster level is challenging due to the instability of the large mini-batch training and the overhead of the gradient synchronization. We address the instability of the large mini-batch…

Machine Learning · Computer Science 2019-03-06 Hiroaki Mikami , Hisahiro Suganuma , Pongsakorn U-chupala , Yoshiki Tanaka , Yuichi Kageyama

EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks. Currently, EfficientNets can take on the order of days to train; for example, training an EfficientNet-B0…

Machine Learning · Computer Science 2020-11-06 Arissa Wongpanich , Hieu Pham , James Demmel , Mingxing Tan , Quoc Le , Yang You , Sameer Kumar

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the…

For the past 5 years, the ILSVRC competition and the ImageNet dataset have attracted a lot of interest from the Computer Vision community, allowing for state-of-the-art accuracy to grow tremendously. This should be credited to the use of…

Machine Learning · Statistics 2017-11-17 Valeriu Codreanu , Damian Podareanu , Vikram Saletore

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential…

Computer Vision and Pattern Recognition · Computer Science 2018-05-02 Priya Goyal , Piotr Dollár , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , Kaiming He

Distributed training techniques have been widely deployed for large-scale deep neural network (DNN) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

Finishing 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 GPU takes 14 days. This training requires 10^18 single-precision operations in total. On the other hand, the world's current fastest supercomputer can finish 2 * 10^17…

Computer Vision and Pattern Recognition · Computer Science 2018-02-01 Yang You , Zhao Zhang , Cho-Jui Hsieh , James Demmel , Kurt Keutzer

This study aims to optimize the few-shot image classification task and improve the model's feature extraction and classification performance by combining self-supervised learning with the deep network model ResNet-101. During the training…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Yuyang Xiao

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution…

Machine Learning · Computer Science 2019-12-03 Alexey Svyatkovskiy , Julian Kates-Harbeck , William Tang

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-15 Takuya Akiba , Shuji Suzuki , Keisuke Fukuda

Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us to design deep neural networks (DNN) that generalize better. However, the richness of ImageNet imposes difficulties for researchers…

Computer Vision and Pattern Recognition · Computer Science 2022-01-21 Shixian Wen , Amanda Sofie Rios , Kiran Lekkala , Laurent Itti

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula , Fathi Amsaad

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in…

Computer Vision and Pattern Recognition · Computer Science 2021-02-12 Andrew Brock , Soham De , Samuel L. Smith , Karen Simonyan

Deep Neural Networks (DNNs) have achieved impressive accuracy in many application domains including image classification. Training of DNNs is an extremely compute-intensive process and is solved using variants of the stochastic…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-03 Sameer Kumar , Dheeraj Sreedhar , Vaibhav Saxena , Yogish Sabharwal , Ashish Verma

Deep learning-based methods have achieved remarkable progress on the image classification task across a wide range of commonly used datasets (ImageNet, CIFAR, SVHN, Caltech 101, SUN397, etc.). SOTA performance on each of the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Kirill Prokofiev , Vladislav Sovrasov

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

In general, sufficient data is essential for better performance and generalization of deep-learning models. However, many limitations on data collection (cost, resources, etc.) lead to a lack of sufficient data in most areas. In…

Computer Vision and Pattern Recognition · Computer Science 2020-07-16 Byeongjo Kim , Chanran Kim , Jaehoon Lee , Jein Song , Gyoungsoo Park

Deep learning methods for pansharpening have advanced rapidly, yet models pretrained on data from a specific sensor often generalize poorly to data from other sensors. Existing methods to tackle such cross-sensor degradation include…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 Tianyu Xin , Jin-Liang Xiao , Zeyu Xia , Shan Yin , Liang-Jian Deng

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce ImageNet/ResNet-50 training time from 29 hours to around 1 minute. In this paper, we focus on studying the…

Machine Learning · Computer Science 2020-06-16 Yang You , Yuhui Wang , Huan Zhang , Zhao Zhang , James Demmel , Cho-Jui Hsieh