Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Subjects: Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning; Multiagent Systems
Version: v2 (2026-05-27)

Abstract

Executable visual workflows have become a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. In current practice, however, such workflows are built almost entirely by hand: developers must carefully design each workflow, write prompts for every step, and repeatedly revise the logic as requirements evolve, which makes development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic baseline to improve performance. The benchmark is built from a large collection of real-world business workflows, and each instance is designed so that the generated workflow can be converted and deployed directly on practical workflow platforms such as Dify and Coze. Experimental results show that although state-of-the-art language models often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex and evolving requirements. Although our agentic baseline improves the resolve rate by up to 6.05%, a gap to real-world use remains, positioning Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.

Cite

@article{arxiv.2604.19667,
  title  = {Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language},
  author = {Yi Zhong and Buqiang Xu and Yijun Wang and Zifei Shan and Shuofei Qiao and Guozhou Zheng and Ningyu Zhang},
  journal= {arXiv preprint arXiv:2604.19667},
  year   = {2026}
}

Comments

Work in progress