Related papers: SIDiffAgent: Self-Improving Diffusion Agent

200 papers

In the accelerating era of human-instructed visual content creation, diffusion models have demonstrated remarkable generative potential. Yet their deployment is constrained by a dual bottleneck: semantic ambiguity in diverse prompts and the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Jie Qin, Jie Wu, Weifeng Chen, Yueming Lyu

In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and…

Multiagent Systems · Computer Science 2024-08-20 Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin

With the continuous development of generative AI's logical reasoning abilities, AI's growing code-generation potential poses challenges for both technical and creative professionals. But how can these advances be directed toward empowering…

Human-Computer Interaction · Computer Science 2025-08-12 Zijian Ding, Qinshi Zhang, Mohan Chi, Ziyi Wang

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research. For example, the Civitai community, a platform for T2I innovation, currently hosts an…

Computation and Language · Computer Science 2024-04-03 Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

Text-to-image generation for personalized identities aims at incorporating the specific identity into images using a text prompt and an identity image. Based on the powerful generative capabilities of DDPMs, many previous works adopt…

Computer Vision and Pattern Recognition · Computer Science 2025-05-13 Jinyu Gu, Haipeng Liu, Meng Wang, Yang Wang

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion…

Computer Vision and Pattern Recognition · Computer Science 2023-03-15 Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

Diffusion-based models have recently revolutionized image generation, achieving unprecedented levels of fidelity. However, consistent generation of high-quality images remains challenging partly due to the lack of conditioning mechanisms…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Khaled Abud, Sergey Lavrushkin, Alexey Kirillov, Dmitriy Vatolin

We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training costs and understanding-generation trade-offs, GenAgent decouples these capabilities…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Kaixun Jiang, Yuzheng Wang, Junjie Zhou, Pandeng Li, Zhihang Liu, Chen-Wei Xie, Zhaoyu Chen, Yun Zheng, Wenqiang Zhang

The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet, existing model merging…

Artificial Intelligence · Computer Science 2026-03-24 Zhuoling Li, Hossein Rahmani, Jiarui Zhang, Yu Xue, Majid Mirmehdi, Jason Kuen, Jiuxiang Gu, Jun Liu

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. Although owning diverse and high-quality generation capabilities, translating these abilities to fine-grained image editing…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

Diffusion models have achieved remarkable success in image and video generation. However, their inherently multiple step inference process imposes substantial computational overhead, hindering real-world deployment. Accelerating diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Jiajun Jiao, Haowei Zhu, Puyuan Yang, Jianghui Wang, Ji Liu, Ziqiong Liu, Dong Li, Yuejian Fang, Junhai Yong, Bin Wang, Emad Barsoum

Medical image segmentation models struggle with rare abnormalities due to scarce annotated pathological data. We propose DiffAug, a novel framework that combines text-guided diffusion-based generation with automatic segmentation validation to…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Maham Nazir, Muhammad Aqeel, Francesco Setti

Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park

A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models, especially Stable Diffusion. Despite the success of diffusion models in…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang

Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

Diffusion-based speech enhancement (SE) achieves natural-sounding speech and strong generalization, yet suffers from key limitations like generative artifacts and high inference latency. In this work, we systematically study artifact…

Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in limbs, face, text and so on. Existing refinement approaches either perform costly iterative…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Shaocheng Shen, Jianfeng Liang, Chunlei Cai, Cong Geng, Huiyu Duan, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai

Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct input image from a corrupted one with a…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

Large-scale text-to-image models have demonstrated an amazing ability to synthesize diverse and high-fidelity images. However, these models are often constrained by several limitations. Firstly, they require the user to provide precise and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Yupei Lin, Sen Zhang, Xiaojun Yang, Xiao Wang, Yukai Shi

Text-to-image diffusion models have remarkably excelled in producing diverse, high-quality, and photo-realistic images. This advancement has spurred a growing interest in incorporating specific identities into generated content. Most…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Xiaoming Li, Xinyu Hou, Chen Change Loy