Related papers: DiffusionAgent: Navigating Expert Models for Agent…

LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning

Designing realistic multi-object scenes requires not only generating images, but also planning spatial layouts that respect semantic relations and physical plausibility. On one hand, while recent advances in diffusion models have enabled…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Zezhong Fan, Xiaohan Li, Luyi Ma, Kai Zhao, Liang Peng, Topojoy Biswas, Evren Korpeoglu, Kaushiki Nag, Kannan Achan

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training costs and understanding-generation trade-offs, GenAgent decouples these capabilities…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Kaixun Jiang, Yuzheng Wang, Junjie Zhou, Pandeng Li, Zhihang Liu, Chen-Wei Xie, Zhaoyu Chen, Yun Zheng, Wenqiang Zhang

SIDiffAgent: Self-Improving Diffusion Agent

Text-to-image diffusion models have revolutionized generative AI, enabling high-quality and photorealistic image synthesis. However, their practical deployment remains hindered by several limitations: sensitivity to prompt phrasing,…

Artificial Intelligence · Computer Science · 2026-02-03 · Shivank Garg, Ayush Singh, Gaurav Kumar Nayak

DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation

Diffusion models have achieved remarkable success in image and video generation. However, their inherently multi-step inference process imposes substantial computational overhead, hindering real-world deployment. Accelerating diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-07 · Jiajun Jiao, Haowei Zhu, Puyuan Yang, Jianghui Wang, Ji Liu, Ziqiong Liu, Dong Li, Yuejian Fang, Junhai Yong, Bin Wang, Emad Barsoum

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Recent advancements in visual generative models have enabled high-quality image and video generation, opening diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, making…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-22 · Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The emergence of generative AI has catalyzed generative…

Signal Processing · Electrical Eng. & Systems · 2026-05-08 · Hai-Long Qin, Jincheng Dai, Guo Lu, Shuo Shao, Sixian Wang, Tongda Xu, Wenjun Zhang, Ping Zhang, Khaled B. Letaief

GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models

The rapid advancement in image generation models has predominantly been driven by diffusion models, which have demonstrated unparalleled success in generating high-fidelity, diverse images from textual prompts. Despite their success,…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-29 · Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag

Learning Graph Representation of Agent Diffusers

Diffusion-based generative models have significantly advanced text-to-image synthesis, demonstrating impressive text comprehension and zero-shot generalization. These models refine images from random noise based on textual prompts, with…

Machine Learning · Computer Science · 2025-05-16 · Youcef Djenouri, Nassim Belmecheri, Tomasz Michalak, Jan Dubiński, Ahmed Nabil Belbachir, Anis Yazidi

Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation

Image generation based on diffusion models has demonstrated impressive capability, motivating exploration into diverse and specialized applications. Owing to the importance of emotion in advertising, emotion-oriented image generation has…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-23 · Guoli Jia, Junyao Hu, Xinwei Long, Kai Tian, Kaiyan Zhang, KaiKai Zhao, Ning Ding, Bowen Zhou

DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation

The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet, existing model merging…

Artificial Intelligence · Computer Science · 2026-03-24 · Zhuoling Li, Hossein Rahmani, Jiarui Zhang, Yu Xue, Majid Mirmehdi, Jason Kuen, Jiuxiang Gu, Jun Liu

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of…

Computation and Language · Computer Science · 2023-12-08 · Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu

In-Context Learning Unlocked for Diffusion Models

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-20 · Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models

Emotional talking head generation has attracted growing attention. Previous methods, which are mainly GAN-based, still struggle to consistently produce satisfactory results across diverse emotions and cannot conveniently specify…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-13 · Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-27 · Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

While originally designed for image generation, diffusion models have recently been shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-06 · Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Luc Van Gool

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition

Model fusion is a key strategy for robust recognition in unconstrained scenarios, as different models provide complementary strengths. This is especially important for whole-body human recognition, where biometric cues such as face, gait,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Jie Zhu, Xiao Guo, Yiyang Su, Anil Jain, Xiaoming Liu

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

Diffusion models have become the state of the art for text-to-image generation, and increasing research effort has been dedicated to adapting the inference process of pretrained diffusion models to achieve zero-shot capabilities. An example…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-29 · Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision

Prompt learning has demonstrated promising results in fine-tuning pre-trained multimodal models. However, the performance improvement is limited when applied to more complex and fine-grained tasks. The reason is that most existing methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-01 · Weicai Yan, Wang Lin, Zirun Guo, Ye Wang, Fangming Feng, Xiaoda Yang, Zehan Wang, Tao Jin

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-23 · Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis