Related papers: Cliff-Learning

An Empirical Study of Scaling Laws for Transfer

We present a limited empirical study of scaling laws for transfer learning in transformer models. More specifically, we examine a scaling law that incorporates a "transfer gap" term, indicating the effectiveness of pre-training on one…

Machine Learning · Computer Science 2024-09-02 Matthew Barnett

Theoretical Understanding of the Information Flow on Continual Learning Performance

Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data sequentially. CL performance evaluates the model's ability to continually learn and solve new problems with incrementally available…

Machine Learning · Computer Science 2022-05-04 Josh Andle, Salimeh Yasaei Sekeh

A Dynamical Model of Neural Scaling Laws

On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is…

Machine Learning · Statistics 2024-06-25 Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models…

Machine Learning · Computer Science 2022-07-05 Oleksiy Ostapenko, Timothee Lesort, Pau Rodríguez, Md Rifat Arefin, Arthur Douillard, Irina Rish, Laurent Charlin

Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data

Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic…

Machine Learning · Statistics 2025-11-25 Guillaume Braun, Bruno Loureiro, Ha Quang Minh, Masaaki Imaizumi

Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

Scaling laws in deep learning -- empirical power-law relationships linking model performance to resource growth -- have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly…

Machine Learning · Computer Science 2026-05-01 Francesco D'Amico, Dario Bocchi, Matteo Negri

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

With the ever-increasing complexity of large-scale pre-trained models coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing,…

Machine Learning · Computer Science 2024-07-22 Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check

Downstream scaling laws aim to predict task performance at larger scales from the model's performance at smaller scales. Whether such prediction should be possible is unclear: some works discover clear linear scaling trends after simple…

Computation and Language · Computer Science 2025-10-10 Nicholas Lourie, Michael Y. Hu, Kyunghyun Cho

Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

Empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in the light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP…

Machine Learning · Computer Science 2021-10-20 Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar

Scaling Laws and In-Context Learning: A Unified Theoretical Framework

In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains more elusive. We…

Machine Learning · Computer Science 2025-11-11 Sushant Mehta, Ishan Gupta

Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra

Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time, often following power-law behaviors over multiple orders of magnitude. Despite…

Machine Learning · Statistics 2024-10-14 Roman Worschech, Bernd Rosenow

What we learn from the learning rate

The learning rate is an information-theoretical quantity for bipartite Markov chains describing two coupled subsystems. It is defined as the rate at which transitions in the downstream subsystem tend to increase the mutual information…

Statistical Mechanics · Physics 2017-07-04 Rory A. Brittain, Nick S. Jones, Thomas E. Ouldridge

Federated Learning Clients Clustering with Adaptation to Data Drifts

Federated Learning (FL) trains deep models across edge devices without centralizing raw data, preserving user privacy. However, client heterogeneity slows down convergence and limits global model accuracy. Clustered FL (CFL) mitigates this…

Machine Learning · Computer Science 2026-02-10 Minghao Li, Dmitrii Avdiukhin, Rana Shahout, Nikita Ivkin, Vladimir Braverman, Minlan Yu

On the Role of Neural Collapse in Transfer Learning

We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes…

Machine Learning · Computer Science 2022-01-05 Tomer Galanti, András György, Marcus Hutter

Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks

We provide a theoretical investigation of curriculum learning in the context of stochastic gradient descent when optimizing the convex linear regression loss. We prove that the rate of convergence of an ideal curriculum learning method is…

Machine Learning · Computer Science 2023-12-29 Daphna Weinshall, Gad Cohen, Dan Amir

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to…

Machine Learning · Computer Science 2024-05-24 Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

Deep Learning Scaling is Predictable, Empirically

Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve…

Machine Learning · Computer Science 2017-12-04 Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

Scaling Laws for Transfer

We study empirical scaling laws for transfer learning between distributions in an unsupervised, fine-tuning setting. When we train increasingly large neural networks from-scratch on a fixed-size dataset, they eventually become data-limited…

Machine Learning · Computer Science 2021-02-03 Danny Hernandez, Jared Kaplan, Tom Henighan, Sam McCandlish

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can…

Computation and Language · Computer Science 2022-10-19 Maor Ivgi, Yair Carmon, Jonathan Berant

Features are fate: a theory of transfer learning in high-dimensional regression

With the emergence of large-scale pre-trained neural networks, methods to adapt such "foundation" models to data-limited downstream tasks have become a necessity. Fine-tuning, preference optimization, and transfer learning have all been…

Machine Learning · Statistics 2025-07-09 Javan Tahir, Surya Ganguli, Grant M. Rotskoff