Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Zhongwang Zhang; Pengxiao Lin; Zhiwei Wang; Yaoyu Zhang; Zhi-Qin John Xu

Subjects: Computation and Language; Machine Learning. Date: 2025-01-16 (v1)

Abstract

Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies significantly influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions). By applying masking strategies to the model's information circuits and employing multiple complexity metrics, we reveal distinct internal working mechanisms associated with different solution types. Further analysis reveals that reasoning-based solutions exhibit a lower complexity bias, which aligns with the well-studied neuron condensation phenomenon. This lower complexity bias is hypothesized to be the key factor enabling these solutions to learn reasoning rules. We validate these conclusions across multiple real-world datasets, including image generation and natural language processing tasks, confirming the broad applicability of our findings.

Keywords

transformer; logical reasoning; generalization in machine learning

Cite

@article{arxiv.2501.08537,
  title  = {Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers},
  author = {Zhongwang Zhang and Pengxiao Lin and Zhiwei Wang and Yaoyu Zhang and Zhi-Qin John Xu},
  journal= {arXiv preprint arXiv:2501.08537},
  year   = {2025}
}

Comments

Mistakenly submitted as a replacement to 2405.05409v4
