Related papers: Grokking modular arithmetic

200 papers

Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical…

Machine Learning · Computer Science 2024-06-06 Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a…

Machine Learning · Statistics 2024-02-06 Noam Levi, Alon Beck, Yohai Bar-Sinai

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example…

Grokking is proposed and widely studied as an intricate phenomenon in which generalization is achieved after a long-lasting period of overfitting. In this work, we propose NeuralGrok, a novel gradient-based approach that learns an optimal…

Machine Learning · Computer Science 2025-04-28 Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu

We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally-studied problem of modular addition. First, we show that early in gradient descent, when the "kernel…

Machine Learning · Computer Science 2024-07-18 Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland

Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to…

Machine Learning · Computer Science 2026-01-12 Tiberiu Musat

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking…

Machine Learning · Computer Science 2023-09-06 Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

Delayed generalization, termed grokking, in a machine learning calculation occurs when the increase in test accuracy is delayed relative to the training accuracy. This paper examines grokking in the context of a dense neural network trained…

Disordered Systems and Neural Networks · Physics 2026-02-06 Karolina Hutchison, David Yevick

Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that…

Machine Learning · Computer Science 2026-04-07 Ping Wang

We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To…

Machine Learning · Statistics 2024-04-12 Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

This paper investigates the grokking phenomenon, which refers to the sudden transition from a long memorization to generalization observed during neural networks training, in the context of learning multiplication in finite-dimensional…

Machine Learning · Computer Science 2026-05-15 Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

Recently, an interesting phenomenon called grokking has gained much attention, where generalization occurs long after the models have initially overfitted the training data. We try to understand this seemingly strange phenomenon through the…

Machine Learning · Computer Science 2024-02-05 Zhiquan Tan, Weiran Huang

We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly…

Machine Learning · Statistics 2025-07-22 Alon Beck, Noam Levi, Yohai Bar-Sinai

Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible settings: the learning of a linear model with…

Machine Learning · Computer Science 2026-02-10 Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan

Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous…

Machine Learning · Computer Science 2023-10-23 Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mechanistic origins of model generalization. Despite its significance, existing research…

Artificial Intelligence · Computer Science 2026-04-01 Junjie Zhang, Zhen Shen, Gang Xiong, Xisong Dong

This paper demonstrates that grokking behavior in modular arithmetic with a modulus P in a neural network can be controlled by modifying the profile of the activation function as well as the depth and width of the model. Plotting the even…

Machine Learning · Computer Science 2024-11-11 Ahmed Salah, David Yevick

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks,…

Machine Learning · Computer Science 2023-03-24 Ziming Liu, Eric J. Michaud, Max Tegmark

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or…

Machine Learning · Computer Science 2025-07-14 Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this…

Machine Learning · Computer Science 2024-04-02 Jack Miller, Charles O'Neill, Thang Bui