
AutoCode: LLMs as Problem Setters for Competitive Programming

Subjects: Software Engineering; Artificial Intelligence; Computation and Language; Programming Languages
Version: v1, 2025-10-16

Abstract

Writing competitive programming problems is exacting. Authors must set constraints, input distributions, and edge cases that rule out shortcuts; target specific algorithms (e.g., max-flow, dynamic programming, data structures); and calibrate complexity beyond the reach of most competitors. We argue that this makes problem setting an ideal test of general large language model (LLM) capabilities, and we study whether LLMs can do it reliably. We introduce AutoCode, which uses multiple rounds of validation to yield competition-grade problem statements and test cases. On held-out problems, AutoCode test suites approach 99% consistency with official judgments, a significant improvement over current state-of-the-art methods such as HardTests, which achieve less than 81%. Furthermore, starting from a random seed problem, AutoCode can create novel variants with reference and brute-force solutions. By cross-verifying these generated solutions against test cases, we can further filter out malformed problems. Our system ensures high correctness, as verified by human experts. AutoCode successfully produces novel problems judged by Grandmaster-level (top 0.3%) competitive programmers to be of contest quality.
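The abstract mentions two checks: cross-verifying a generated reference solution against a brute-force solution on test inputs, and measuring how often a generated test suite's verdicts agree with official judgments. The sketch below illustrates both ideas as plain differential testing. It is not AutoCode's implementation; the function names (`cross_verify`, `consistency`), the whitespace-insensitive output comparison, the accept/reject verdict encoding, and the maximum-subarray example problem are all hypothetical assumptions chosen for illustration.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical interface: a solution maps the raw input text to its output text.
Solution = Callable[[str], str]


def cross_verify(reference: Solution, brute_force: Solution, test_inputs: Iterable[str]) -> bool:
    """Return True only if both solutions print the same tokens on every input.

    A disagreement means the reference, the brute force, or the problem itself
    is suspect, so (under this sketch's assumption) the problem is discarded.
    """
    for test_input in test_inputs:
        if reference(test_input).split() != brute_force(test_input).split():
            return False
    return True


def consistency(generated: Sequence[bool], official: Sequence[bool]) -> float:
    """Fraction of submissions whose accept (True) / reject (False) verdict under
    the generated tests matches the official verdict (assumed metric definition)."""
    if len(generated) != len(official) or not official:
        raise ValueError("need two non-empty verdict lists of equal length")
    return sum(g == o for g, o in zip(generated, official)) / len(official)


# Hypothetical example problem: input "n" then n integers; print the maximum subarray sum.
def _parse(text: str) -> List[int]:
    tokens = text.split()
    n = int(tokens[0])
    return [int(t) for t in tokens[1 : 1 + n]]


def kadane(text: str) -> str:
    """O(n) reference solution (Kadane's algorithm)."""
    a = _parse(text)
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)  # extend the current subarray or start a new one at x
        best = max(best, cur)
    return str(best)


def buggy_reference(text: str) -> str:
    """Flawed variant that clamps at 0, so it is wrong on all-negative arrays."""
    best = cur = 0
    for x in _parse(text):
        cur = max(0, cur + x)
        best = max(best, cur)
    return str(best)


def brute_force(text: str) -> str:
    """O(n^3) brute force: enumerate every non-empty subarray."""
    a = _parse(text)
    n = len(a)
    return str(max(sum(a[i : j + 1]) for i in range(n) for j in range(i, n)))


tests = ["3\n1 -2 3", "4\n-1 -2 -3 -4", "5\n2 -1 2 -1 2"]
print(cross_verify(kadane, brute_force, tests))           # True: solutions agree
print(cross_verify(buggy_reference, brute_force, tests))  # False: all-negative case exposes the bug
print(consistency([True, False, True, True], [True, False, False, True]))  # 0.75
```

In this sketch the brute force is trusted only because it is simple enough to be obviously correct on small inputs; the all-negative test is the kind of edge case that lets a disagreement surface.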


Cite

@article{arxiv.2510.12803,
  title  = {AutoCode: LLMs as Problem Setters for Competitive Programming},
  author = {Shang Zhou and Zihan Zheng and Kaiyuan Liu and Zeyu Shen and Zerui Cheng and Zexing Chen and Hansen He and Jianzhu Yao and Huanzhi Mao and Qiuyang Mang and Tianfu Fu and Beichen Li and Dongruixuan Li and Wenhao Chai and Zhuang Liu and Aleksandra Korolova and Peter Henderson and Natasha Jaques and Pramod Viswanath and Saining Xie and Jingbo Shang},
  journal = {arXiv preprint arXiv:2510.12803},
  year   = {2025}
}

Comments

Project page: https://livecodebenchpro.com/projects/autocode/overview