Related papers: Image Classification at Supercomputer Scale

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds

There has been a strong demand for algorithms that can execute machine learning as fast as possible, and the speed of deep learning has accelerated by 30 times in the past two years alone. Distributed deep learning using the large…

Machine Learning · Computer Science 2019-04-01 Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash

Scaling the distributed deep learning to a massive GPU cluster level is challenging due to the instability of the large mini-batch training and the overhead of the gradient synchronization. We address the instability of the large mini-batch…

Machine Learning · Computer Science 2019-03-06 Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks. Currently, EfficientNets can take on the order of days to train; for example, training an EfficientNet-B0…

Machine Learning · Computer Science 2020-11-06 Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the…

Machine Learning · Computer Science 2018-07-31 Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

For the past 5 years, the ILSVRC competition and the ImageNet dataset have attracted a lot of interest from the Computer Vision community, allowing for state-of-the-art accuracy to grow tremendously. This should be credited to the use of…

Machine Learning · Statistics 2017-11-17 Valeriu Codreanu, Damian Podareanu, Vikram Saletore

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential…

Computer Vision and Pattern Recognition · Computer Science 2018-05-02 Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-21 Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu

ImageNet Training in Minutes

Finishing 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 GPU takes 14 days. This training requires 10^18 single precision operations in total. On the other hand, the world's current fastest supercomputer can finish 2 * 10^17…

Computer Vision and Pattern Recognition · Computer Science 2018-02-01 Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer

Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification

This study aims to optimize the few-shot image classification task and improve the model's feature extraction and classification performance by combining self-supervised learning with the deep network model ResNet-101. During the training…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Yuyang Xiao

Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution…

Machine Learning · Computer Science 2019-12-03 Alexey Svyatkovskiy, Julian Kates-Harbeck, William Tang

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-15 Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

What can we learn from misclassified ImageNet images?

Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us to design deep neural networks (DNN) that generalize better. However, the richness of ImageNet imposes difficulties for researchers…

Computer Vision and Pattern Recognition · Computer Science 2022-01-21 Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

High-Performance Large-Scale Image Recognition Without Normalization

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in…

Computer Vision and Pattern Recognition · Computer Science 2021-02-12 Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Efficient Training of Convolutional Neural Nets on Large Distributed Systems

Deep Neural Networks (DNNs) have achieved impressive accuracy in many application domains including image classification. Training of DNNs is an extremely compute-intensive process and is solved using variants of the stochastic…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-03 Sameer Kumar, Dheeraj Sreedhar, Vaibhav Saxena, Yogish Sabharwal, Ashish Verma

Towards Efficient and Data Agnostic Image Classification Training Pipeline for Embedded Systems

Nowadays, deep learning-based methods have achieved remarkable progress on the image classification task across a wide range of commonly used datasets (ImageNet, CIFAR, SVHN, Caltech 101, SUN397, etc.). SOTA performance on each of the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Kirill Prokofiev, Vladislav Sovrasov

Deep Learning Models on CPUs: A Methodology for Efficient Training

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Machine Learning · Computer Science 2023-06-21 Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, Douglas C. Schmidt

Data-Efficient Deep Learning Method for Image Classification Using Data Augmentation, Focal Cosine Loss, and Ensemble

In general, sufficient data is essential for better performance and generalization of deep-learning models. However, many limitations (cost, resources, etc.) of data collection lead to a lack of enough data in most areas. In…

Computer Vision and Pattern Recognition · Computer Science 2020-07-16 Byeongjo Kim, Chanran Kim, Jaehoon Lee, Jein Song, Gyoungsoo Park

Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring

Deep learning methods for pansharpening have advanced rapidly, yet models pretrained on data from a specific sensor often generalize poorly to data from other sensors. Existing methods to tackle such cross-sensor degradation include…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 Tianyu Xin, Jin-Liang Xiao, Zeyu Xia, Shan Yin, Liang-Jian Deng

The Limit of the Batch Size

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce the ImageNet/ResNet-50 training from 29 hours to around 1 minute. In this paper, we focus on studying the…

Machine Learning · Computer Science 2020-06-16 Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh