PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Authors: Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao, Chunping Wang, Yang Yang

Artificial Intelligence · 2026-05 · v1

Abstract

Given a strategically complex board game, human players can quickly learn to devise strategies after playing a few rounds. Autonomous agents require similar capabilities in realistic interactive environments, yet existing agent benchmarks often fail to fully capture such strategic and evolving decision-making scenarios. We present PTCG-Bench, a benchmark built on the Pokémon Trading Card Game (PTCG) that evaluates LLM agents at two complementary levels: (1) their decision-making performance within a single complex environment, and (2) their ability to self-evolve through accumulated experience. We further include a modular harness ablation so that agent performance can be interpreted without conflating harness effects with model capability. Our experiments show that, although LLM agents can achieve non-trivial gameplay performance, sustained and stable self-evolution remains challenging, and performance is sensitive to harness design. We hope that PTCG-Bench will facilitate future research on harness-aware and self-evolving agents in realistic interactive environments.

Cite

@article{arxiv.2605.29653,
  title  = {PTCG-Bench: Can LLM Agents Master Pok\'emon Trading Card Game?},
  author = {Dongdong Hua and Yifei Sun and Renhong Huang and Feng Gao and Chunping Wang and Yang Yang},
  journal= {arXiv preprint arXiv:2605.29653},
  year   = {2026}
}
