Related papers: Scaling Laws for Code: Every Programming Language …

200 papers

We propose a novel scaling law for general-purpose decoder-only language models (LMs) trained on multilingual data, tackling the problem of balancing languages during multilingual pretraining. A primary challenge in studying multilingual…

Computation and Language · Computer Science · 2024-12-05 · Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song

Code Large Language Models (LLMs) are revolutionizing software engineering. However, scaling laws that guide the efficient training are predominantly analyzed on Natural Language (NL). Given the fundamental differences like strict syntax…

Computation and Language · Computer Science · 2026-05-19 · Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, YuanTao Fan, Wanxiang Che

Large language models (LLMs) have made remarkable advances in recent years, with scaling laws playing a critical role in this rapid progress. In this paper, we empirically investigate how a critical hyper-parameter, i.e., the global batch…

Computation and Language · Computer Science · 2024-12-03 · Xian Shuai, Yiding Wang, Yimeng Wu, Xin Jiang, Xiaozhe Ren

Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are…

Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant…

Machine Learning · Computer Science · 2025-12-03 · Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin

Guided by the belief of the scaling law, large language models (LLMs) have achieved impressive performance in recent years. However, scaling law only gives a qualitative estimation of loss, which is influenced by various factors such as…

Computation and Language · Computer Science · 2024-09-16 · Chuhan Wu, Ruiming Tang

Large language models (LLMs) exhibit remarkable multilingual capabilities despite the extreme language imbalance in the pre-training data. In this paper, we closely examine the reasons behind this phenomenon, focusing on the pre-training…

Computation and Language · Computer Science · 2025-04-23 · Zhijun Wang, Jiahuan Li, Hao Zhou, Rongxiang Weng, Jingang Wang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare…

Machine Learning · Computer Science · 2025-06-04 · Leshem Choshen, Yang Zhang, Jacob Andreas

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can…

Computation and Language · Computer Science · 2022-10-19 · Maor Ivgi, Yair Carmon, Jonathan Berant

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs…

In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the…

Computation and Language · Computer Science · 2023-02-21 · Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and…

Machine Learning · Computer Science · 2026-05-21 · Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In…

Computation and Language · Computer Science · 2024-10-03 · Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

Scaling laws play a central role in the success of Large Language Models (LLMs), enabling the prediction of model performance relative to compute budgets prior to training. While Transformers have been the dominant architecture, recent…

Machine Learning · Computer Science · 2026-02-23 · Maximilian Beck, Kajetan Schweighofer, Sebastian Böck, Sebastian Lehner, Sepp Hochreiter

The scaling law is becoming a fundamental law in many machine learning areas. That is, test error falls off with the power law when increasing training data, model size, and computing resource. However, whether this law is suitable for the…

Software Engineering · Computer Science · 2024-02-21 · Jiayi Lin, Hande Dong, Yutao Xie, Lei Zhang

While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper presents a systematic empirical…

This paper derives "scaling laws"--empirical relationships between the training compute of Large Language Models (LLMs) and their performance--for economic outcomes. In a preregistered online experiment, 300 professional translators…

General Economics · Economics · 2024-12-10 · Ali Merali

The quality of Large Language Model (LLM) pretraining depends on multiple factors, including the compute budget and the choice of optimization algorithm. Empirical scaling laws are widely used to predict loss as model size and training data…

Machine Learning · Computer Science · 2026-02-25 · Alexandra Volkova, Mher Safaryan, Christoph H. Lampert, Dan Alistarh

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-10 · Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM…

Machine Learning · Computer Science · 2026-01-29 · Dakuan Lu, Jiaqi Zhang, Cheng Yuan, Jiawei Shao, Xuelong Li