Related papers: Zero-th Order Algorithm for Softmax Attention Opti…

200 papers

Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods…

Machine Learning · Computer Science 2025-07-25 Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng, Zhihang Li, Zhao Song

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory…

Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet, existing zeroth-order…

Machine Learning · Computer Science 2025-10-02 Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang

Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead during backpropagation remains a critical bottleneck, especially as model scales grow. Zeroth-order (ZO)…

Computation and Language · Computer Science 2026-01-09 Feihu Jin, Shipeng Cen, Ying Tan

Fine-tuning large language models (LLMs) using zeroth-order optimization (ZO) offers a memory-efficient alternative to gradient-based methods but suffers from slower convergence and unstable optimization due to noisy gradient estimates.…

Machine Learning · Computer Science 2025-06-24 Jikai Long, Zijian Hu, Xiaodong Yu, Jianwen Xie, Zhaozhuo Xu

Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do…

Machine Learning · Computer Science 2020-05-20 Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang

Safety alignment for large language models (LLMs) aims to reduce harmful or unsafe behavior while preserving general utility. However, recent findings reveal that alignment effects can be fragile: lightweight post-alignment manipulations,…

Artificial Intelligence · Computer Science 2026-05-29 Zhihao Liu, Yifan Wu, Jian Lou, Di Wang, Yuxi Zhou, Yuke Hu

Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency in inference. However, quantization often degrades model performance, thus fine-tuning is required for various down-stream tasks.…

Machine Learning · Computer Science 2025-02-19 Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang

Fine-tuning large language models (LLMs) using zeroth-order (ZO) optimization has emerged as a promising alternative to traditional gradient-based methods due to its reduced memory footprint requirement. However, existing ZO methods suffer…

Machine Learning · Computer Science 2025-10-22 Zhendong Mi, Qitao Tan, Grace Li Zhang, Zhaozhuo Xu, Geng Yuan, Shaoyi Huang

Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization…

Machine Learning · Computer Science 2025-11-12 Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Michael Muehlebach, Niao He

Zeroth-order or derivative-free optimization (MeZO) is an attractive strategy for finetuning large language models (LLMs) because it eliminates the memory overhead of backpropagation. However, it converges slowly due to the inherent curse…

Machine Learning · Computer Science 2026-04-21 Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

Using Large Language Models (LLMs) in real-world applications presents significant challenges, particularly in balancing computational efficiency with model performance. Optimizing acceleration after fine-tuning and during inference is…

Computation and Language · Computer Science 2025-09-09 Sajjad Kachuee, Mohammad Sharifkhani

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning applications. It is used for solving optimization problems similarly to gradient-based methods. However, it…

Machine Learning · Computer Science 2020-06-23 Sijia Liu, Pin-Yu Chen, Bhavya Kailkhura, Gaoyuan Zhang, Alfred Hero, Pramod K. Varshney

While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO)…

Machine Learning · Computer Science 2026-02-17 Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks. The attention mechanism in the Transformer…

Computation and Language · Computer Science 2023-04-27 Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou

Parameter-efficient fine-tuning (PEFT) significantly reduces memory costs when adapting large language models (LLMs) for downstream applications. However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead…

Machine Learning · Computer Science 2024-10-11 Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen

We introduce LOREN, a curvature-aware zeroth-order (ZO) optimization method for fine-tuning large language models (LLMs). Existing ZO methods, which estimate gradients via finite differences using random perturbations, often suffer from…

Machine Learning · Computer Science 2025-11-12 Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately…

Machine Learning · Statistics 2022-04-06 Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, Meisam Razaviyayn