Related papers: Zero-th Order Algorithm for Softmax Attention Opti…

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods…

Machine Learning · Computer Science 2025-07-25 Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang

Attention Scheme Inspired Softmax Regression

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng, Zhihang Li, Zhao Song

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory…

Machine Learning · Computer Science 2024-05-29 Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet, existing zeroth-order…

Machine Learning · Computer Science 2025-10-02 Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang

Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning

Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead during backpropagation remains a critical bottleneck, especially as model scales grow. Zeroth-order (ZO)…

Computation and Language · Computer Science 2026-01-09 Feihu Jin, Shipeng Cen, Ying Tan

OAT-Rephrase: Optimization-Aware Training Data Rephrasing for Zeroth-Order LLM Fine-Tuning

Fine-tuning large language models (LLMs) using zeroth-order optimization (ZO) offers a memory-efficient alternative to gradient-based methods but suffers from slower convergence and unstable optimization due to noisy gradient estimates.…

Machine Learning · Computer Science 2025-06-24 Jikai Long, Zijian Hu, Xiaodong Yu, Jianwen Xie, Zhaozhuo Xu

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do…

Machine Learning · Computer Science 2020-05-20 Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Safety alignment for large language models (LLMs) aims to reduce harmful or unsafe behavior while preserving general utility. However, recent findings reveal that alignment effects can be fragile: lightweight post-alignment manipulations,…

Artificial Intelligence · Computer Science 2026-05-29 Zhihao Liu, Yifan Wu, Jian Lou, Di Wang, Yuxi Zhou, Yuke Hu

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models

Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency in inference. However, quantization often degrades model performance, thus fine-tuning is required for various down-stream tasks.…

Machine Learning · Computer Science 2025-02-19 Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang

Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations

Fine-tuning large language models (LLMs) using zeroth-order (ZO) optimization has emerged as a promising alternative to traditional gradient-based methods due to its reduced memory footprint requirement. However, existing ZO methods suffer…

Machine Learning · Computer Science 2025-10-22 Zhendong Mi, Qitao Tan, Grace Li Zhang, Zhaozhuo Xu, Geng Yuan, Shaoyi Huang

Zeroth-Order Optimization Finds Flat Minima

Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization…

Machine Learning · Computer Science 2025-11-12 Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Michael Muehlebach, Niao He

ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

Zeroth-order or derivative-free optimization (MeZO) is an attractive strategy for finetuning large language models (LLMs) because it eliminates the memory overhead of backpropagation. However, it converges slowly due to the inherent curse…

Machine Learning · Computer Science 2026-04-21 Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

Efficient Large Language Models with Zero-Shot Adjustable Acceleration

Using Large Language Models (LLMs) in real-world applications presents significant challenges, particularly in balancing computational efficiency with model performance. Optimizing acceleration after fine-tuning and during inference is…

Computation and Language · Computer Science 2025-09-09 Sajjad Kachuee, Mohammad Sharifkhani

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning

Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning applications. It is used for solving optimization problems similarly to gradient-based methods. However, it…

Machine Learning · Computer Science 2020-06-23 Sijia Liu, Pin-Yu Chen, Bhavya Kailkhura, Gaoyuan Zhang, Alfred Hero, Pramod K. Varshney

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO)…

Machine Learning · Computer Science 2026-02-17 Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks. The attention mechanism in the Transformer…

Computation and Language · Computer Science 2023-04-27 Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou

Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead…

Machine Learning · Computer Science 2024-10-11 Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen

Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning

We introduce LOREN, a curvature-aware zeroth-order (ZO) optimization method for fine-tuning large language models (LLMs). Existing ZO methods, which estimate gradients via finite differences using random perturbations, often suffer from…

Machine Learning · Computer Science 2025-11-12 Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately…

Machine Learning · Statistics 2022-04-06 Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, Meisam Razaviyayn