Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Computation and Language 2026-03-05 v2

Abstract

As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic coding and reasoning, suggesting that code execution can serve as a scalable environment for mathematical experimentation. In this paper, we investigate the potential of code agents to autonomously evolve existing math problems into more complex variations. We introduce a multi-agent framework designed to perform problem evolution while validating the solvability and increased difficulty of the generated problems. Our experiments demonstrate that, given sufficient test-time exploration, code agents can synthesize new, solvable problems that are structurally distinct from and more challenging than the originals. This work provides empirical evidence that code-driven agents can serve as a viable mechanism for synthesizing high-difficulty mathematical reasoning problems within scalable computational environments. Our data is available at https://github.com/TarferSoul/Code2Math.

Cite

@article{arxiv.2603.03202,
  title  = {Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?},
  author = {Dadi Guo and Yuejin Xie and Qingyu Liu and Jiayu Liu and Zhiyuan Fan and Qihan Ren and Shuai Shao and Tianyi Zhou and Dongrui Liu and Yi R. Fung},
  journal= {arXiv preprint arXiv:2603.03202},
  year   = {2026}
}

Comments

32 pages, 4 figures