MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Mingcheng Li; Xiaolu Hou; Ziyang Liu; Dingkang Yang; Ziyun Qian; Jiawei Chen; Jinjie Wei; Yue Jiang; Qingyao Xu; Lihua Zhang

Computer Vision and Pattern Recognition 2025-05-07 v2

Abstract

Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often hit performance bottlenecks on complex prompts that involve multiple objects, characteristics, and relations. We therefore propose Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation of complex scenes. Specifically, we design a multi-agent collaboration-based scene parsing module that generates an agent system comprising multiple agents with distinct tasks, using multimodal large language models (MLLMs) to extract the various scene elements effectively. In addition, hierarchical compositional diffusion uses a Gaussian mask and filtering to refine bounding box regions and strengthens objects through region enhancement, resulting in accurate, high-fidelity generation of complex scenes. Comprehensive experiments demonstrate that MCCD significantly improves the performance of the baseline models in a training-free manner, providing a substantial advantage in complex scene generation.
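The abstract does not say how the Gaussian mask is built, so the following is only a minimal sketch of one plausible reading: a soft weight that peaks at the centre of a bounding box and decays outward, with small values zeroed out as a simple stand-in for the filtering step. The function name `gaussian_box_mask` and the parameters `sigma_scale` and `cutoff` are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_box_mask(height, width, box, sigma_scale=0.5, cutoff=0.1):
    """Return a height x width soft mask (list of rows) for one bounding box.

    box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    sigma_scale and cutoff are illustrative knobs, not values from the paper.
    """
    x0, y0, x1, y1 = box
    # Centre of the box in pixel-index coordinates.
    cx = (x0 + x1 - 1) / 2.0
    cy = (y0 + y1 - 1) / 2.0
    # Standard deviations proportional to the box size (hypothetical choice).
    sx = max((x1 - x0) * sigma_scale, 1e-6)
    sy = max((y1 - y0) * sigma_scale, 1e-6)

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            w = math.exp(-0.5 * d2)
            # Crude "filtering": drop weights below the cutoff.
            row.append(w if w >= cutoff else 0.0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    m = gaussian_box_mask(8, 8, (2, 2, 7, 7))
    for row in m:
        print(" ".join(f"{v:.2f}" for v in row))
```

In a region-wise pipeline, a mask like this could weight a per-object latent when it is blended into the full canvas, so that the object dominates near the box centre and fades toward the edges. Whether MCCD blends regions this way is not stated in the abstract.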

Keywords

diffusion model, text-to-3d generation, image generation

Cite

@article{arxiv.2505.02648,
  title  = {MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation},
  author = {Mingcheng Li and Xiaolu Hou and Ziyang Liu and Dingkang Yang and Ziyun Qian and Jiawei Chen and Jinjie Wei and Yue Jiang and Qingyao Xu and Lihua Zhang},
  journal= {arXiv preprint arXiv:2505.02648},
  year   = {2025}
}
