Related papers: Large-batch Optimization for Dense Visual Predicti…

AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity

Recently, large multimodal models (LMMs) have achieved significant advancements. When dealing with high-resolution images, dominant LMMs typically divide them into multiple local images and a global image, leading to a large number of…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su

BADM: Batch ADMM for Deep Learning

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Gradient Monitored Reinforcement Learning

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in…

Machine Learning · Computer Science 2020-05-26 Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, Steven X. Ding

Generative-Discriminative Variational Model for Visual Recognition

The paradigm shift from shallow classifiers with hand-crafted features to end-to-end trainable deep learning models has shown significant improvements on supervised learning tasks. Despite the promising power of deep neural networks (DNN),…

Machine Learning · Computer Science 2017-06-09 Chih-Kuan Yeh, Yao-Hung Hubert Tsai, Yu-Chiang Frank Wang

Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…

Machine Learning · Computer Science 2024-08-21 Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si, Sikun Yang

A Convergent ADMM Framework for Efficient Neural Network Training

As a well-known optimization framework, the Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications. Recently, it has attracted the attention of deep learning…

Machine Learning · Computer Science 2021-12-23 Junxiang Wang, Hongyi Li, Liang Zhao

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge in interest in using large batch stochastic optimization methods to tackle this issue. The most prominent algorithm in…

Machine Learning · Computer Science 2020-01-06 Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

Accelerating Augmentation Invariance Pretraining

Our work tackles the computational challenges of contrastive learning methods, particularly for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive learning, the substantial computational resources…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Jinhong Lin, Cheng-En Wu, Yibing Wei, Pedro Morgado

AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

To stabilize the training of Large Language Models (LLMs), gradient clipping is a nearly ubiquitous heuristic used to alleviate exploding gradients. However, traditional global norm clipping erroneously presupposes gradient homogeneity…

Machine Learning · Computer Science 2026-01-21 Zhiyuan Li, Yuan Wu, Yi Chang

AiluRus: A Scalable ViT Framework for Dense Prediction

Vision transformers (ViTs) have emerged as a prevalent architecture for vision tasks owing to their impressive performance. However, when it comes to handling long token sequences, especially in dense prediction tasks that require…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian

Large Batch Training Does Not Need Warmup

Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. However, the optimizer converges slowly at early epochs and there is a gap between large-batch deep learning…

Machine Learning · Computer Science 2020-02-06 Zhouyuan Huo, Bin Gu, Heng Huang

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Training deep neural networks--and more recently, large models--demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous…

Machine Learning · Computer Science 2025-09-05 Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference

While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective"…

Machine Learning · Computer Science 2024-12-19 Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

DenSe-AdViT: A novel Vision Transformer for Dense SAR Object Detection

Vision Transformer (ViT) has achieved remarkable results in object detection for synthetic aperture radar (SAR) images, owing to its exceptional ability to extract global features. However, it struggles with the extraction of multi-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Yang Zhang, Jingyi Cao, Yanan You, Yuanyuan Qiao

Rethinking Visual Intelligence: Insights from Video Pretraining

Large language models (LLMs) have demonstrated that large-scale pretraining enables systems to adapt rapidly to new problems with little supervision in the language domain. This success, however, has not translated as effectively to the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Pablo Acuaviva, Aram Davtyan, Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Alexandre Alahi, Paolo Favaro

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only the one modality from which it can learn fastest, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms

In recent years, large language models (LLMs) have made remarkable progress, with model optimization primarily relying on gradient-based optimizers such as Adam. However, these gradient-based methods impose stringent hardware requirements,…

Artificial Intelligence · Computer Science 2025-10-24 WenTao Liu, Siyu Song, Hao Hao, Aimin Zhou

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning the "soft prompts" to condition frozen…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang