Related papers: STEP: Staged Parameter-Efficient Pre-training for …

Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling

Large Language Models (LLMs) can enhance reasoning capabilities through test-time scaling by generating multiple traces. However, the combination of lengthy reasoning traces with multiple sampling introduces substantial computation and high…

Machine Learning · Computer Science 2026-04-29 Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science 2025-05-30 Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong

Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the…

Machine Learning · Computer Science 2025-08-20 Houyi Li, Wenzhen Zheng, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Zhenyu Ding, Haoying Wang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting while prompting requires no training and…

Computation and Language · Computer Science 2022-07-11 Zejiang Hou, Julian Salazar, George Polovets

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

As Large Language Models (LLMs) achieve remarkable empirical success through scaling model and data size, pretraining has become increasingly critical yet computationally prohibitive, hindering rapid development. Despite the availability of…

Computation and Language · Computer Science 2026-02-06 Ji Zhao, Yufei Gu, Shitong Shao, Xun Zhou, Liang Xiang, Zeke Xie

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient…

Computation and Language · Computer Science 2022-11-01 Yi-Lin Sung, Jaemin Cho, Mohit Bansal

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

Continual learning necessitates the continual adaptation of models to newly emerging tasks while minimizing the catastrophic forgetting of old ones. This is extremely challenging for large language models (LLMs) with vanilla full-parameter…

Computation and Language · Computer Science 2024-10-28 Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Tao Yang

Sparsity-Accelerated Training for Large Language Models

Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs…

Computation and Language · Computer Science 2024-06-07 Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and…

Performance · Computer Science 2023-12-04 Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention

Improving the effectiveness and efficiency of large language models (LLMs) simultaneously is a critical yet challenging research goal. In this paper, we find that low-rank pre-training, normally considered as efficient methods that will…

Computation and Language · Computer Science 2024-11-05 Xingtai Lv, Ning Ding, Kaiyan Zhang, Ermo Hua, Ganqu Cui, Bowen Zhou

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models. While most research focuses on how to accurately retain…

Artificial Intelligence · Computer Science 2022-05-24 Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models

We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in…

Computation and Language · Computer Science 2024-11-14 Felix Stahlberg, Jared Lichtarge, Shankar Kumar

Span Fine-tuning for Pre-trained Language Models

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over…

Computation and Language · Computer Science 2021-09-16 Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Staged Training for Transformer Language Models

The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally…

Computation and Language · Computer Science 2022-03-15 Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

Scaling Performance of Large Language Model Pretraining

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-10 Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science 2024-12-19 Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation

Large Language Models (LLMs) have revolutionized various domains but encounter substantial challenges in tackling optimization modeling tasks for Operations Research (OR), particularly when dealing with complex problems. In this work, we…

Computation and Language · Computer Science 2025-06-24 Yang Wu, Yifan Zhang, Yurong Wu, Yuran Wang, Junkai Zhang, Jian Cheng

Benchmarking down-scaled (not so large) pre-trained language models

Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or…

Computation and Language · Computer Science 2021-05-12 M. Aßenmacher, P. Schulze, C. Heumann

LIMA: Less Is More for Alignment

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user…

Computation and Language · Computer Science 2023-05-22 Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy