Related papers: Refine-by-Align: Reference-Guided Artifacts Refine…

Refining Visual Artifacts in Diffusion Models via Explainable AI-based Flaw Activation Maps

Diffusion models have achieved remarkable success in image synthesis. However, addressing artifacts and unrealistic regions remains a critical challenge. We propose self-refining diffusion, a novel framework that enhances image generation…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Seoyeon Lee, Gwangyeol Yu, Chaewon Kim, Jonghyuk Park

RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation

Personalized image generation is crucial for improving the user experience, as it renders reference images into preferred ones according to user visual preferences. Although effective, existing methods face two main issues. First, existing…

Information Retrieval · Computer Science 2025-08-14 Run Ling, Wenji Wang, Yuting Liu, Guibing Guo, Haowei Liu, Jian Lu, Quanwei Zhang, Yexing Xu, Shuo Lu, Yun Wang, Yihua Shao, Zhanjie Zhang, Ao Ma, Linying Jiang, Xingwei Wang

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Yaoli Liu, Ziheng Ouyang, Shengtao Lou, Yiren Song

AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment

Personalized image generation aims to integrate user-provided concepts into text-to-image models, enabling the generation of customized content based on a given prompt. Recent zero-shot approaches, particularly those leveraging diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Yiheng Lin, Shifang Zhao, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Yunchao Wei

Facial Reenactment Through a Personalized Generator

In recent years, the role of image generative models in facial reenactment has been steadily increasing. Such models are usually subject-agnostic and trained on domain-wide datasets. The appearance of the reenacted individual is learned…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Ariel Elazary, Yotam Nitzan, Daniel Cohen-Or

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

Previous works have explored various customized generation tasks given a reference image, but they still face limitations in generating consistent fine-grained details. In this paper, our aim is to solve the inconsistency problem of…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, Mike Zheng Shou

Beyond Fine-Tuning: A Systematic Study of Sampling Techniques in Personalized Image Generation

Personalized text-to-image generation aims to create images tailored to user-defined concepts and textual descriptions. Balancing the fidelity of the learned concept with its ability for generation in various contexts presents a significant…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Vera Soboleva, Maksim Nakhodnov, Aibek Alanov

ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Visual document retrieval aims to retrieve a set of document pages relevant to a query from visually rich collections. Existing methods often employ Vision-Language Models (VLMs) to encode queries and visual pages into a shared embedding…

Information Retrieval · Computer Science 2026-04-10 Hao Yang, Yifan Ji, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zulong Chen, Shuo Wang, Yu Gu, Ge Yu

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual…

Computer Vision and Pattern Recognition · Computer Science 2025-05-06 Junhao Zhuang, Xuan Ju, Zhaoyang Zhang, Yong Liu, Shiyi Zhang, Chun Yuan, Ying Shan

MyStyle: A Personalized Generative Prior

We introduce MyStyle, a personalized deep generative prior trained with a few shots of an individual. MyStyle allows to reconstruct, enhance and edit images of a specific person, such that the output is faithful to the person's key facial…

Computer Vision and Pattern Recognition · Computer Science 2022-10-07 Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or

RefAlign: Representation Alignment for Reference-to-Video Generation

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on.…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang

StyleGallery: Training-free and Semantic-aware Personalized Style Transfer from Arbitrary Image References

Despite the advancements in diffusion-based image style transfer, existing methods are commonly limited by 1) semantic gap: the style reference could miss proper content semantics, causing uncontrollable stylization; 2) reliance on extra…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Boyu He, Yunfan Ye, Chang Liu, Weishang Wu, Fang Liu, Zhiping Cai

Fine-grained Semantic Constraint in Image Synthesis

In this paper, we propose a multi-stage and high-resolution model for image synthesis that uses fine-grained attributes and masks as input. With a fine-grained attribute, the proposed model can detailedly constrain the features of the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-13 Pengyang Li, Donghui Wang

AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models

We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, the existing methods can capture the key concept from the reference images but fail to align the generated image…

Computer Vision and Pattern Recognition · Computer Science 2024-07-01 Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

We introduce region-specific image refinement as a dedicated problem setting: given an input image and a user-specified region (e.g., a scribble mask or a bounding box), the goal is to restore fine-grained details while keeping all…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Dewei Zhou, You Li, Zongxin Yang, Yi Yang

Auto-regressive transformation for image alignment

Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations, often resulting in suboptimal accuracy. Robustness to these challenges can be…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee

Personalized Restoration via Dual-Pivot Tuning

Generative diffusion models can serve as a prior which ensures that solutions of image restoration systems adhere to the manifold of natural images. However, for restoring facial images, a personalized prior is necessary to accurately…

Computer Vision and Pattern Recognition · Computer Science 2023-12-29 Pradyumna Chari, Sizhuo Ma, Daniil Ostashev, Achuta Kadambi, Gurunandan Krishnan, Jian Wang, Kfir Aberman

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

Discrete diffusion models have emerged as a promising direction for vision-language tasks, offering bidirectional context modeling and theoretical parallelization. However, their practical application is severely hindered by a…

Computation and Language · Computer Science 2025-10-24 Yatai Ji, Teng Wang, Yuying Ge, Zhiheng Liu, Sidi Yang, Ying Shan, Ping Luo

Text-to-Image Alignment in Denoising-Based Models through Step Selection

Visual generative AI models often encounter challenges related to text-image alignment and reasoning limitations. This paper presents a novel method for selectively enhancing the signal at critical denoising steps, optimizing image…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Paul Grimal, Hervé Le Borgne, Olivier Ferret

An efficient semi-supervised quality control system trained using physics-based MRI-artefact generators and adversarial training

Large medical imaging data sets are becoming increasingly available, but ensuring sample quality without significant artefacts is challenging. Existing methods for identifying imperfections in medical imaging rely on data-intensive…

Image and Video Processing · Electrical Eng. & Systems 2023-11-15 Daniele Ravi, Frederik Barkhof, Daniel C. Alexander, Lemuel Puglisi, Geoffrey JM Parker, Arman Eshaghi