Related papers: ZO-SAM: Zero-Order Sharpness-Aware Minimization fo…

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized…

Machine Learning · Computer Science 2022-10-25 Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of…

Artificial Intelligence · Computer Science 2023-07-03 Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, Dacheng Tao

Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning

Sharpness-aware minimization (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning. However, its extra parameter perturbation step doubles the…

Machine Learning · Computer Science 2026-02-11 Yifei Cheng, Xianglin Yang, Guoxia Wang, Chao Huang, Fei Ma, Dianhai Yu, Xiaochun Cao, Li Shen

Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization

Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks to deep neural networks. SZO methods only require the ability to evaluate the objective…

Machine Learning · Statistics 2020-11-11 Mayumi Ohta, Nathaniel Berger, Artem Sokolov, Stefan Riezler

Sharpness-Aware Minimization for Efficiently Improving Generalization

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model…

Machine Learning · Computer Science 2021-04-30 Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning

Sharpness-Aware Minimization (SAM) is an optimization method that improves generalization performance of machine learning models. Despite its superior generalization, SAM has not been actively used in real-world applications due to its…

Machine Learning · Computer Science 2025-03-17 Junhyuk Jo, Jihyun Lim, Sunwoo Lee

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open…

Machine Learning · Computer Science 2024-03-18 Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error…

Artificial Intelligence · Computer Science 2022-05-31 Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan

Zero-Order Sharpness-Aware Minimization

Prompt learning has become a key method for adapting large language models to specific tasks with limited data. However, traditional gradient-based optimization methods for tuning prompts are computationally intensive, posing challenges for…

Statistics Theory · Mathematics 2025-12-30 Yao Fu, Yihang Jin, Chunxia Zhang, Junmin Liu, Guang Dai, Haishan Ye

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space…

Machine Learning · Computer Science 2025-10-03 Marlon Becker, Frederick Altrock, Benjamin Risse

Powering Up Zeroth-Order Training via Subspace Gradient Orthogonalization

Zeroth-order (ZO) optimization provides a gradient-free alternative to first-order (FO) methods by estimating gradients via finite differences of function evaluations, and has recently emerged as a memory-efficient paradigm for fine-tuning…

Machine Learning · Computer Science 2026-02-24 Yicheng Lang, Changsheng Wang, Yihua Zhang, Mingyi Hong, Zheng Zhang, Wotao Yin, Sijia Liu

Zeroth-Order Sharpness-Aware Learning with Exponential Tilting

Classic zeroth-order optimization approaches typically optimize for a smoothed version of the original function, i.e., the expected objective under randomly perturbed model parameters. This can be interpreted as encouraging the loss values…

Machine Learning · Computer Science 2025-10-21 Xuchen Gong, Tian Li

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO)…

Machine Learning · Computer Science 2026-02-17 Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Sharpness-Aware Minimization (SAM) has recently emerged as a robust technique for improving the accuracy of deep neural networks. However, SAM incurs a high computational cost in practice, requiring up to twice as much computation as…

Machine Learning · Computer Science 2022-10-25 Renkun Ni, Ping-yeh Chiang, Jonas Geiping, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models

Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate the sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires…

Machine Learning · Computer Science 2024-06-21 Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang

Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling

We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or…

Optimization and Control · Mathematics 2023-03-20 HanQin Cai, Daniel Mckenzie, Wotao Yin, Zhenliang Zhang

Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm

Targeting solutions over 'flat' regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve generalizability of deep neural network based learning. While several SAM variants have been…

Machine Learning · Computer Science 2025-01-14 Yilang Zhang, Bingcong Li, Georgios B. Giannakis

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods…

Machine Learning · Computer Science 2025-07-25 Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang

Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function.…

Machine Learning · Computer Science 2022-12-09 Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models

Fine-tuning is powerful for adapting large language models to downstream tasks, but it often results in huge memory usages. A promising approach to mitigate this is using Zeroth-Order (ZO) optimization, which estimates gradients to replace…

Machine Learning · Computer Science 2024-10-15 Fei Wang, Li Shen, Liang Ding, Chao Xue, Ye Liu, Changxing Ding