Related papers: Scaling Laws for Code: Every Programming Language …

Scaling Laws for Multilingual Language Models

We propose a novel scaling law for general-purpose decoder-only language models (LMs) trained on multilingual data, tackling the problem of balancing languages during multilingual pretraining. A primary challenge in studying multilingual…

Computation and Language · Computer Science 2024-12-05 Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

Scaling Laws for Code: A More Data-Hungry Regime

Code Large Language Models (LLMs) are revolutionizing software engineering. However, scaling laws that guide the efficient training are predominantly analyzed on Natural Language (NL). Given the fundamental differences like strict syntax…

Computation and Language · Computer Science 2026-05-19 Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, YuanTao Fan, Wanxiang Che

Scaling Law for Language Models Training Considering Batch Size

Large language models (LLMs) have made remarkable advances in recent years, with scaling laws playing a critical role in this rapid progress. In this paper, we empirically investigate how a critical hyper-parameter, i.e., the global batch…

Computation and Language · Computer Science 2024-12-03 Xian Shuai, Yiding Wang, Yimeng Wu, Xin Jiang, Xiaozhe Ren

Language models scale reliably with over-training and on downstream tasks

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are…

Computation and Language · Computer Science 2024-06-18 Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant…

Machine Learning · Computer Science 2025-12-03 Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin

Performance Law of Large Language Models

Guided by the belief of the scaling law, large language models (LLMs) have achieved impressive performance in recent years. However, scaling law only gives a qualitative estimation of loss, which is influenced by various factors such as…

Computation and Language · Computer Science 2024-09-16 Chuhan Wu, Ruiming Tang

Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training

Large language models (LLMs) exhibit remarkable multilingual capabilities despite the extreme language imbalance in the pre-training data. In this paper, we closely examine the reasons behind this phenomenon, focusing on the pre-training…

Computation and Language · Computer Science 2025-04-23 Zhijun Wang, Jiahuan Li, Hao Zhou, Rongxiang Weng, Jingang Wang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

A Hitchhiker's Guide to Scaling Law Estimation

Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare…

Machine Learning · Computer Science 2025-06-04 Leshem Choshen, Yang Zhang, Jacob Andreas

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can…

Computation and Language · Computer Science 2022-10-19 Maor Ivgi, Yair Carmon, Jonathan Berant

Scaling Laws for Differentially Private Language Models

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs…

Machine Learning · Computer Science 2025-02-03 Ryan McKenna, Yangsibo Huang, Amer Sinha, Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Badih Ghazi, George Kaissis, Ravi Kumar, Ruibo Liu, Da Yu, Chiyuan Zhang

Scaling Laws for Multilingual Neural Machine Translation

In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the…

Computation and Language · Computer Science 2023-02-21 Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and…

Machine Learning · Computer Science 2026-05-21 Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In…

Computation and Language · Computer Science 2024-10-03 Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity

Scaling laws play a central role in the success of Large Language Models (LLMs), enabling the prediction of model performance relative to compute budgets prior to training. While Transformers have been the dominant architecture, recent…

Machine Learning · Computer Science 2026-02-23 Maximilian Beck, Kajetan Schweighofer, Sebastian Böck, Sebastian Lehner, Sepp Hochreiter

Scaling Laws Behind Code Understanding Model

The scaling law is becoming a fundamental law in many machine learning areas. That is, test error falls off with the power law when increasing training data, model size, and computing resource. However, whether this law is suitable for the…

Software Engineering · Computer Science 2024-02-21 Jiayi Lin, Hande Dong, Yutao Xie, Lei Zhang

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper presents a systematic empirical…

Machine Learning · Computer Science 2026-04-20 Zelin Tan, Hejia Geng, Xiaohang Yu, Mulei Zhang, Guancheng Wan, Yifan Zhou, Qiang He, Xiangyuan Xue, Heng Zhou, Yutao Fan, Zhongzhi Li, Zaibin Zhang, Guibin Zhang, Chen Zhang, Zhenfei Yin, Philip Torr, Lei Bai

Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation

This paper derives "scaling laws"--empirical relationships between the training compute of Large Language Models (LLMs) and their performance--for economic outcomes. In a preregistered online experiment, 300 professional translators…

General Economics · Economics 2024-12-10 Ali Merali

Towards Robust Scaling Laws for Optimizers

The quality of Large Language Model (LLM) pretraining depends on multiple factors, including the compute budget and the choice of optimization algorithm. Empirical scaling laws are widely used to predict loss as model size and training data…

Machine Learning · Computer Science 2026-02-25 Alexandra Volkova, Mher Safaryan, Christoph H. Lampert, Dan Alistarh

Scaling Performance of Large Language Model Pretraining

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-10 Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models

Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM…

Machine Learning · Computer Science 2026-01-29 Dakuan Lu, Jiaqi Zhang, Cheng Yuan, Jiawei Shao, Xuelong Li