Related papers: SIDiffAgent: Self-Improving Diffusion Agent

DiffusionAgent: Navigating Expert Models for Agentic Image Generation

In the accelerating era of human-instructed visual content creation, diffusion models have demonstrated remarkable generative potential. Yet their deployment is constrained by a dual bottleneck: semantic ambiguity in diverse prompts and the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Jie Qin, Jie Wu, Weifeng Chen, Yueming Lyu

Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning

In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and…

Multiagent Systems · Computer Science · 2024-08-20 · Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin

Frontend Diffusion: Empowering Self-Representation of Junior Researchers and Designers Through Multi-agent System

With the continuous development of generative AI's logical reasoning abilities, AI's growing code-generation potential poses challenges for both technical and creative professionals. But how can these advances be directed toward empowering…

Human-Computer Interaction · Computer Science · 2025-08-12 · Zijian Ding, Qinshi Zhang, Mohan Chi, Ziyi Wang

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research. For example, the Civitai community, a platform for T2I innovation, currently hosts an…

Computation and Language · Computer Science · 2024-04-03 · Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

PIDiff: Image Customization for Personalized Identities with Diffusion Models

Text-to-image generation for personalized identities aims at incorporating the specific identity into images using a text prompt and an identity image. Based on the powerful generative capabilities of DDPMs, many previous works adopt…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-13 · Jinyu Gu, Haipeng Liu, Meng Wang, Yang Wang

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-15 · Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models

Diffusion-based models have recently revolutionized image generation, achieving unprecedented levels of fidelity. However, consistent generation of high-quality images remains challenging partly due to the lack of conditioning mechanisms…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-18 · Khaled Abud, Sergey Lavrushkin, Alexey Kirillov, Dmitriy Vatolin

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training costs and understanding-generation trade-offs, GenAgent decouples these capabilities…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Kaixun Jiang, Yuzheng Wang, Junjie Zhou, Pandeng Li, Zhihang Liu, Chen-Wei Xie, Zhaoyu Chen, Yun Zheng, Wenqiang Zhang

DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation

The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet, existing model merging…

Artificial Intelligence · Computer Science · 2026-03-24 · Zhuoling Li, Hossein Rahmani, Jiarui Zhang, Yu Xue, Majid Mirmehdi, Jason Kuen, Jiuxiang Gu, Jun Liu

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. Although owning diverse and high-quality generation capabilities, translating these abilities to fine-grained image editing…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-06 · Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation

Diffusion models have achieved remarkable success in image and video generation. However, their inherently multi-step inference process imposes substantial computational overhead, hindering real-world deployment. Accelerating diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-07 · Jiajun Jiao, Haowei Zhu, Puyuan Yang, Jianghui Wang, Ji Liu, Ziqiong Liu, Dong Li, Yuejian Fang, Junhai Yong, Bin Wang, Emad Barsoum

Diffusion-Based Data Augmentation for Medical Image Segmentation

Medical image segmentation models struggle with rare abnormalities due to scarce annotated pathological data. We propose DiffAug, a novel framework that combines text-guided diffusion-based generation with automatic segmentation validation to…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Maham Nazir, Muhammad Aqeel, Francesco Setti

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-27 · Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models, especially Stable Diffusion. Despite the success of diffusion models in…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-05 · Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang

AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-24 · Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

ArtiFree: Detecting and Reducing Generative Artifacts in Diffusion-based Speech Enhancement

Diffusion-based speech enhancement (SE) achieves natural-sounding speech and strong generalization, yet suffers from key limitations like generative artifacts and high inference latency. In this work, we systematically study artifact…

Sound · Computer Science · 2025-09-25 · Bhawana Chhaglani, Yang Gao, Julius Richter, Xilin Li, Syavosh Zadissa, Tarun Pruthi, Andrew Lovitt

Agentic Retoucher for Text-To-Image Generation

Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in limbs, faces, text, and similar fine details. Existing refinement approaches either perform costly iterative…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Shaocheng Shen, Jianfeng Liang, Chunlei Cai, Cong Geng, Huiyu Duan, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai

Improving Diffusion-Based Image Synthesis with Context Prediction

Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct input image from a corrupted one with a…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

Large-scale text-to-image models have demonstrated an amazing ability to synthesize diverse and high-fidelity images. However, these models are often hampered by several limitations. Firstly, they require the user to provide precise and…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-09 · Yupei Lin, Sen Zhang, Xiaojun Yang, Xiao Wang, Yukai Shi

When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation

Text-to-image diffusion models have remarkably excelled in producing diverse, high-quality, and photo-realistic images. This advancement has spurred a growing interest in incorporating specific identities into generated content. Most…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-30 · Xiaoming Li, Xinyu Hou, Chen Change Loy