Related papers: STEP: Staged Parameter-Efficient Pre-training for …

200 papers

Large Language Models (LLMs) can enhance reasoning capabilities through test-time scaling by generating multiple traces. However, the combination of lengthy reasoning traces with multiple sampling introduces substantial computation and high…

Machine Learning · Computer Science 2026-04-29 Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science 2025-05-30 Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong

The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the…

Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting. Fine-tuning requires modifying all of the parameters and having enough data to avoid overfitting, while prompting requires no training and…

Computation and Language · Computer Science 2022-07-11 Zejiang Hou, Julian Salazar, George Polovets

Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

As Large Language Models (LLMs) achieve remarkable empirical success through scaling model and data size, pretraining has become increasingly critical yet computationally prohibitive, hindering rapid development. Despite the availability of…

Computation and Language · Computer Science 2026-02-06 Ji Zhao, Yufei Gu, Shitong Shao, Xun Zhou, Liang Xiang, Zeke Xie

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient…

Computation and Language · Computer Science 2022-11-01 Yi-Lin Sung, Jaemin Cho, Mohit Bansal

Continual learning necessitates the continual adaptation of models to newly emerging tasks while minimizing the catastrophic forgetting of old ones. This is extremely challenging for large language models (LLMs) with vanilla full-parameter…

Computation and Language · Computer Science 2024-10-28 Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Tao Yang

Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs…

Computation and Language · Computer Science 2024-06-07 Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and…

Performance · Computer Science 2023-12-04 Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Improving the effectiveness and efficiency of large language models (LLMs) simultaneously is a critical yet challenging research goal. In this paper, we find that low-rank pre-training, normally considered an efficient method that will…

Computation and Language · Computer Science 2024-11-05 Xingtai Lv, Ning Ding, Kaiyan Zhang, Ermo Hua, Ganqu Cui, Bowen Zhou

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models. While most research focuses on how to accurately retain…

Artificial Intelligence · Computer Science 2022-05-24 Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in…

Computation and Language · Computer Science 2024-11-14 Felix Stahlberg, Jared Lichtarge, Shankar Kumar

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over…

Computation and Language · Computer Science 2021-09-16 Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally…

Computation and Language · Computer Science 2022-03-15 Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-10 Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science 2024-12-19 Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

Large Language Models (LLMs) have revolutionized various domains but encounter substantial challenges in tackling optimization modeling tasks for Operations Research (OR), particularly when dealing with complex problems. In this work, we…

Computation and Language · Computer Science 2025-06-24 Yang Wu, Yifan Zhang, Yurong Wu, Yuran Wang, Junkai Zhang, Jian Cheng

Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or…

Computation and Language · Computer Science 2021-05-12 M. Aßenmacher, P. Schulze, C. Heumann

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user…

Computation and Language · Computer Science 2023-05-22 Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy