English
Related papers

Related papers: Scaling Efficient LLMs

200 papers

AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-12 Michael Benington , Leo Phan , Chris Pierre Paul , Evan Shoemaker , Priyanka Ranade , Torstein Collett , Grant Hodgson Perez , Christopher Krieger

Scaling laws for large language models (LLMs) have provided useful guidance in training ever larger models for predictable performance gains. Time series forecasting shares a similar sequential structure to language, and is amenable to…

Machine Learning · Computer Science 2025-01-09 Thomas D. P. Edwards , James Alvey , Justin Alsing , Nam H. Nguyen , Benjamin D. Wandelt

This paper argues that continued AI scaling requires repeated efficiency doublings. Classical AI scaling laws remain useful because they make progress predictable despite diminishing returns, but the compute variable in those laws is best…

Machine Learning · Computer Science 2026-04-10 Chien-Ping Lu

State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. This has sparked a research agenda to reduce these models' parameter counts and computational costs without significantly…

Computation and Language · Computer Science 2024-11-07 Xiuying Wei , Skander Moalla , Razvan Pascanu , Caglar Gulcehre

Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and number of model parameters. This has spurred an…

Machine Learning · Computer Science 2026-01-16 Maissam Barkeshli , Alberto Alfarano , Andrey Gromov

Recent studies have demonstrated that the performance of transformers on the task of language modeling obeys a power-law relationship with model size over six orders of magnitude. While transformers exhibit impressive scaling, their…

Machine Learning · Computer Science 2021-10-07 Narsimha Chilkuri , Eric Hunsberger , Aaron Voelker , Gurshaant Malik , Chris Eliasmith

It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more…

Machine Learning · Computer Science 2025-05-16 Mouxiang Chen , Binyuan Hui , Zeyu Cui , Jiaxi Yang , Dayiheng Liu , Jianling Sun , Junyang Lin , Zhongxin Liu

While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct…

Machine Learning · Computer Science 2025-12-10 Jakub Krajewski , Amitis Shidani , Dan Busbridge , Sam Wiseman , Jason Ramapuram

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant…

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple…

Artificial Intelligence · Computer Science 2025-11-11 Jianhao Chen , Zishuo Xun , Bocheng Zhou , Han Qi , Hangfan Zhang , Qiaosheng Zhang , Yang Chen , Wei Hu , Yuzhong Qu , Wanli Ouyang , Shuyue Hu

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This…

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are…

Large Language Models (LLMs) have delivered impressive results in language understanding, generation, reasoning, and pushes the ability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong…

Computation and Language · Computer Science 2025-08-14 Weigao Sun , Jiaxi Hu , Yucheng Zhou , Jusen Du , Disen Lan , Kexin Wang , Tong Zhu , Xiaoye Qu , Yu Zhang , Xiaoyu Mo , Daizong Liu , Yuxuan Liang , Wenliang Chen , Guoqi Li , Yu Cheng

Recently, Large Language Models (LLMs) have achieved remarkable success. A key factor behind this success is the scaling law observed by OpenAI. Specifically, for models with Transformer architecture, the test loss exhibits a power-law…

Machine Learning · Computer Science 2025-03-04 Yifang Chen , Xuyang Guo , Xiaoyu Li , Yingyu Liang , Zhenmei Shi , Zhao Song

In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains more elusive. We…

Machine Learning · Computer Science 2025-11-11 Sushant Mehta , Ishan Gupta

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, oftentimes, LLMs are fine-tuned to address a specific task, given the pretrained weights of a pre-trained LLM considered a foundation model. In…

Computation and Language · Computer Science 2025-12-05 Eshed Gal , Moshe Eliasof , Javier Turek , Uri Ascher , Eran Treister , Eldad Haber

Transformers have achieved great success in effectively processing sequential data such as text. Their architecture consisting of several attention and feedforward blocks can model relations between elements of a sequence in parallel…

Machine Learning · Computer Science 2025-02-20 Jaemu Heo , Eldor Fozilov , Hyunmin Song , Taehwan Kim

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of…

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can…

Computation and Language · Computer Science 2022-10-19 Maor Ivgi , Yair Carmon , Jonathan Berant

In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In…

Computation and Language · Computer Science 2024-10-03 Wenzhen Zheng , Wenbo Pan , Xu Xu , Libo Qin , Li Yue , Ming Zhou
‹ Prev 1 2 3 10 Next ›