Related papers: Grokking modular arithmetic

Grokking Modular Polynomials

Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical…

Machine Learning · Computer Science 2024-06-06 Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding

Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a…

Machine Learning · Statistics 2024-02-06 Noam Levi, Alon Beck, Yohai Bar-Sinai

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example…

Machine Learning · Statistics 2025-07-10 Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

Grokking is proposed and widely studied as an intricate phenomenon in which generalization is achieved after a long-lasting period of overfitting. In this work, we propose NeuralGrok, a novel gradient-based approach that learns an optimal…

Machine Learning · Computer Science 2025-04-28 Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu

Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition

We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally-studied problem of modular addition. First, we show that early in gradient descent, when the "kernel…

Machine Learning · Computer Science 2024-07-18 Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to…

Machine Learning · Computer Science 2026-01-12 Tiberiu Musat

Explaining grokking through circuit efficiency

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking…

Machine Learning · Computer Science 2023-09-06 Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

Grokking in the Ising Model

Delayed generalization, termed grokking, in a machine learning calculation occurs when the increase in test accuracy is delayed relative to the training accuracy. This paper examines grokking in the context of a dense neural network trained…

Disordered Systems and Neural Networks · Physics 2026-02-06 Karolina Hutchison, David Yevick

Grokking as Dimensional Phase Transition in Neural Networks

Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that…

Machine Learning · Computer Science 2026-04-07 Ping Wang

Grokking as the Transition from Lazy to Rich Training Dynamics

We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To…

Machine Learning · Statistics 2024-04-12 Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

Grokking Finite-Dimensional Algebra

This paper investigates the grokking phenomenon, which refers to the sudden transition from a long memorization to generalization observed during neural networks training, in the context of learning multiplication in finite-dimensional…

Machine Learning · Computer Science 2026-05-15 Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

Understanding Grokking Through A Robustness Viewpoint

Recently, an interesting phenomenon called grokking has gained much attention, where generalization occurs long after the models have initially overfitted the training data. We try to understand this seemingly strange phenomenon through the…

Machine Learning · Computer Science 2024-02-05 Zhiquan Tan, Weiran Huang

Grokking at the Edge of Linear Separability

We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly…

Machine Learning · Statistics 2025-07-22 Alon Beck, Noam Levi, Yohai Bar-Sinai

Grokking in Linear Models for Logistic Regression

Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible settings: the learning of a linear model with…

Machine Learning · Computer Science 2026-02-10 Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan

Progress measures for grokking via mechanistic interpretability

Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous…

Machine Learning · Computer Science 2023-10-23 Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

Grokking From Abstraction to Intelligence

Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mechanistic origins of model generalization. Despite its significance, existing research…

Artificial Intelligence · Computer Science 2026-04-01 Junjie Zhang, Zhen Shen, Gang Xiong, Xisong Dong

Controlling Grokking with Nonlinearity and Data Symmetry

This paper demonstrates that grokking behavior in modular arithmetic with a modulus P in a neural network can be controlled by modifying the profile of the activation function as well as the depth and width of the model. Plotting the even…

Machine Learning · Computer Science 2024-11-11 Ahmed Salah, David Yevick

Omnigrok: Grokking Beyond Algorithmic Data

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks,…

Machine Learning · Computer Science 2023-03-24 Ziming Liu, Eric J. Michaud, Max Tegmark

Grokking Beyond the Euclidean Norm of Model Parameters

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or…

Machine Learning · Computer Science 2025-07-14 Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this…

Machine Learning · Computer Science 2024-04-02 Jack Miller, Charles O'Neill, Thang Bui