Related papers: Grokking Explained: A Statistical Phenomenon

Grokking in Linear Models for Logistic Regression

Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible settings: the learning of a linear model with…

Machine Learning · Computer Science 2026-02-10 Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan

When Data Falls Short: Grokking Below the Critical Threshold

In this paper, we investigate the phenomenon of grokking, where models exhibit delayed generalization following overfitting on training data. We focus on data-scarce regimes where the number of training samples falls below the critical…

Machine Learning · Computer Science 2025-11-10 Vaibhav Singh, Eugene Belilovsky, Rahaf Aljundi

Exploring Grokking: Experimental and Mechanistic Investigations

The phenomenon of grokking in over-parameterized neural networks has garnered significant interest. It involves the neural network initially memorizing the training set with zero training error and near-random test error. Subsequent…

Machine Learning · Computer Science 2024-12-17 Hu Qiye, Zhou Hao, Yu RuoXi

Grokking as the Transition from Lazy to Rich Training Dynamics

We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To…

Machine Learning · Statistics 2024-04-12 Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

Grokking at the Edge of Linear Separability

We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly…

Machine Learning · Statistics 2025-07-22 Alon Beck, Noam Levi, Yohai Bar-Sinai

A rationale from frequency perspective for grokking in training neural network

Grokking is the phenomenon where neural networks (NNs) initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this…

Machine Learning · Computer Science 2024-05-29 Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu

Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding

Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a…

Machine Learning · Statistics 2024-02-06 Noam Levi, Alon Beck, Yohai Bar-Sinai

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

"Grokking" is a phenomenon where a neural network first memorizes training data and generalizes poorly, but then suddenly transitions to near-perfect generalization after prolonged training. While intriguing, this delayed generalization…

Machine Learning · Computer Science 2025-04-21 Zhiwei Xu, Zhiyu Ni, Yixin Wang, Wei Hu

Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks

Grokking is an intriguing phenomenon of delayed generalization, where neural networks initially memorize training data with perfect accuracy but exhibit poor generalization, subsequently transitioning to a generalizing solution with…

Machine Learning · Computer Science 2025-05-12 Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo

Deep Networks Always Grok and Here is Why

Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error. Previous studies have reported the occurrence of grokking in specific controlled…

Machine Learning · Computer Science 2024-06-10 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

Grokking as an entanglement transition in tensor network machine learning

Grokking is an intriguing phenomenon in machine learning where a neural network, after many training iterations with negligible improvement in generalization, suddenly achieves high accuracy on unseen data. By working in the quantum-inspired…

Quantum Physics · Physics 2025-03-14 Domenico Pomarico, Alfonso Monaco, Giuseppe Magnifico, Antonio Lacalamita, Ester Pantaleo, Loredana Bellantuono, Sabina Tangaro, Tommaso Maggipinto, Marianna La Rocca, Ernesto Picardi, Nicola Amoroso, Graziano Pesole, Sebastiano Stramaglia, Roberto Bellotti

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking

Grokking, referring to the abrupt improvement in test accuracy after extended overfitting, offers valuable insights into the mechanisms of model generalization. Existing research based on progress measures implies that grokking relies on…

Machine Learning · Computer Science 2025-04-15 Zihan Gu, Ruoyu Chen, Hua Zhang, Yue Hu, Xiaochun Cao

Grokking in the Ising Model

Delayed generalization, termed grokking, in a machine learning calculation occurs when the increase in test accuracy is delayed relative to the training accuracy. This paper examines grokking in the context of a dense neural network trained…

Disordered Systems and Neural Networks · Physics 2026-02-06 Karolina Hutchison, David Yevick

Omnigrok: Grokking Beyond Algorithmic Data

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks,…

Machine Learning · Computer Science 2023-03-24 Ziming Liu, Eric J. Michaud, Max Tegmark

The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization

Grokking -- the sudden generalisation that appears long after a model has perfectly memorised its training data -- has been widely observed but lacks a quantitative theory explaining the length of the delay. We show that grokking is a…

Artificial Intelligence · Computer Science 2026-05-05 Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc

Explaining grokking through circuit efficiency

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking…

Machine Learning · Computer Science 2023-09-06 Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure…

Machine Learning · Computer Science 2023-03-22 William Merrill, Nikolaos Tsilivis, Aman Shukla

Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition

We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally-studied problem of modular addition. First, we show that early in gradient descent, when the "kernel…

Machine Learning · Computer Science 2024-07-18 Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland

Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation

Grokking refers to delayed generalization in which the increase in test accuracy of a neural network occurs appreciably after the improvement in training accuracy. This paper introduces several practical metrics including variance under…

Machine Learning · Computer Science 2025-07-17 Ahmed Salah, David Yevick

Mechanistic Insights into Grokking from the Embedding Layer

Grokking, a delayed generalization in neural networks after perfect training performance, has been observed in Transformers and MLPs, but the components driving it remain underexplored. We show that embeddings are central to grokking:…

Machine Learning · Computer Science 2025-05-22 H. V. AlquBoj, Hilal AlQuabeh, Velibor Bojkovic, Munachiso Nwadike, Kentaro Inui