Related papers: Grokking Explained: A Statistical Phenomenon

200 papers

Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible settings: the learning of a linear model with…

Machine Learning · Computer Science 2026-02-10 Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan

In this paper, we investigate the phenomenon of grokking, where models exhibit delayed generalization following overfitting on training data. We focus on data-scarce regimes where the number of training samples falls below the critical…

Machine Learning · Computer Science 2025-11-10 Vaibhav Singh, Eugene Belilovsky, Rahaf Aljundi

The phenomenon of grokking in over-parameterized neural networks has garnered significant interest. It involves the neural network initially memorizing the training set with zero training error and near-random test error. Subsequent…

Machine Learning · Computer Science 2024-12-17 Hu Qiye, Zhou Hao, Yu RuoXi

We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To…

Machine Learning · Statistics 2024-04-12 Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly…

Machine Learning · Statistics 2025-07-22 Alon Beck, Noam Levi, Yohai Bar-Sinai

Grokking is the phenomenon where neural networks (NNs) initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this…

Machine Learning · Computer Science 2024-05-29 Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu

Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a…

Machine Learning · Statistics 2024-02-06 Noam Levi, Alon Beck, Yohai Bar-Sinai

"Grokking" is a phenomenon where a neural network first memorizes training data and generalizes poorly, but then suddenly transitions to near-perfect generalization after prolonged training. While intriguing, this delayed generalization…

Machine Learning · Computer Science 2025-04-21 Zhiwei Xu, Zhiyu Ni, Yixin Wang, Wei Hu

Grokking is an intriguing phenomenon of delayed generalization, where neural networks initially memorize training data with perfect accuracy but exhibit poor generalization, subsequently transitioning to a generalizing solution with…

Machine Learning · Computer Science 2025-05-12 Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo

Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error. Previous studies have reported the occurrence of grokking in specific controlled…

Machine Learning · Computer Science 2024-06-10 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

Grokking is an intriguing phenomenon in machine learning where a neural network, after many training iterations with negligible improvement in generalization, suddenly achieves high accuracy on unseen data. By working in the quantum-inspired…

Grokking, referring to the abrupt improvement in test accuracy after extended overfitting, offers valuable insights into the mechanisms of model generalization. Existing research based on progress measures implies that grokking relies on…

Machine Learning · Computer Science 2025-04-15 Zihan Gu, Ruoyu Chen, Hua Zhang, Yue Hu, Xiaochun Cao

Delayed generalization, termed grokking, in a machine learning calculation occurs when the increase in test accuracy is delayed relative to the training accuracy. This paper examines grokking in the context of a dense neural network trained…

Disordered Systems and Neural Networks · Physics 2026-02-06 Karolina Hutchison, David Yevick

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks,…

Machine Learning · Computer Science 2023-03-24 Ziming Liu, Eric J. Michaud, Max Tegmark

Grokking -- the sudden generalisation that appears long after a model has perfectly memorised its training data -- has been widely observed but lacks a quantitative theory explaining the length of the delay. We show that grokking is a…

Artificial Intelligence · Computer Science 2026-05-05 Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking…

Machine Learning · Computer Science 2023-09-06 Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure…

Machine Learning · Computer Science 2023-03-22 William Merrill, Nikolaos Tsilivis, Aman Shukla

We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally-studied problem of modular addition. First, we show that early in gradient descent, when the "kernel…

Machine Learning · Computer Science 2024-07-18 Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland

Grokking refers to delayed generalization in which the increase in test accuracy of a neural network occurs appreciably after the improvement in training accuracy. This paper introduces several practical metrics including variance under…

Machine Learning · Computer Science 2025-07-17 Ahmed Salah, David Yevick

Grokking, a delayed generalization in neural networks after perfect training performance, has been observed in Transformers and MLPs, but the components driving it remain underexplored. We show that embeddings are central to grokking:…

Machine Learning · Computer Science 2025-05-22 H. V. AlquBoj, Hilal AlQuabeh, Velibor Bojkovic, Munachiso Nwadike, Kentaro Inui