Related papers: Exposing Text-Image Inconsistency Using Diffusion …

DiffusionPID: Interpreting Diffusion via Partial Information Decomposition

Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2024-11-18 Rushikesh Zawar, Shaurya Dewan, Prakanshul Saxena, Yingshan Chang, Andrew Luo, Yonatan Bisk

Finetuning Text-to-Image Diffusion Models for Fairness

The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In…

Machine Learning · Computer Science 2024-03-18 Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli

MIST: Mitigating Intersectional Bias with Disentangled Cross-Attention Editing in Text-to-Image Diffusion Models

Diffusion-based text-to-image models have rapidly gained popularity for their ability to generate detailed and realistic images from textual descriptions. However, these models often reflect the biases present in their training data,…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Hidir Yesiltepe, Kiymet Akdemir, Pinar Yanardag

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. Membership inference arises in these contexts as a…

Cryptography and Security · Computer Science 2024-10-29 Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu

Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution

Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve…

Cryptography and Security · Computer Science 2024-09-02 Yixin Wu, Yun Shen, Michael Backes, Yang Zhang

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks. However, one critical limitation of these models is the low fidelity of generated images with respect to the text description, such as…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang

Debiasing Text-to-Image Diffusion Models

Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that nonnegligible social bias exists in current state-of-the-art…

Computer Vision and Pattern Recognition · Computer Science 2024-02-23 Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval

Diffusion-Augmented Interactive Text-to-Image Retrieval (DAI-TIR) is a promising paradigm that improves retrieval performance by generating query images via diffusion models and using them as additional "views" of the user's intent.…

Information Retrieval · Computer Science 2026-01-29 Zhuocheng Zhang, Kangheng Liang, Guanxuan Li, Paul Henderson, Richard Mccreadie, Zijun Long

DiffUTE: Universal Text Editing Diffusion Model

Diffusion model based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle with rendering correct text and text style during generation. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-10-19 Haoxing Chen, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-05 Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi

Information Theoretic Text-to-Image Alignment

Diffusion models for Text-to-Image (T2I) conditional generation have recently achieved tremendous success. Yet, aligning these models with user's intentions still involves a laborious trial-and-error process, and this challenging alignment…

Machine Learning · Computer Science 2025-02-12 Chao Wang, Giulio Franzese, Alessandro Finamore, Massimo Gallo, Pietro Michiardi

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models

We consider the problem of conditional text-to-image synthesis with diffusion models. Most recent works need to either finetune specific parts of the base diffusion model or introduce new trainable parameters, leading to deployment…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Tripti Shukla, Srikrishna Karanam, Balaji Vasan Srinivasan

TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution

Real-world text image super-resolution aims to restore overall visual quality and text legibility in images suffering from diverse degradations and text distortions. However, the scarcity of text image data in existing datasets results in…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Haodong He, Xin Zhan, Yancheng Bai, Rui Lan, Lei Sun, Xiangxiang Chu

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

Origin Identification for Text-Guided Image-to-Image Diffusion Models

Text-guided image-to-image diffusion models excel in translating images based on textual prompts, allowing for precise and creative visual modifications. However, such a powerful technique can be misused for spreading misinformation,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang

Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

Amid a tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few…

Machine Learning · Computer Science 2025-07-04 Gautam Kishore Shahi

Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation

Latent diffusion models excel at producing high-quality images from text. Yet, concerns appear about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Mariia Zameshina, Olivier Teytaud, Laurent Najman

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, the vanilla DDIM inversion…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Qi Qian, Haiyang Xu, Ming Yan, Juhua Hu

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers

We introduce Diff-Tracker, a novel approach for the challenging unsupervised visual tracking task leveraging the pre-trained text-to-image diffusion model. Our main idea is to leverage the rich knowledge encapsulated within the pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Zhengbo Zhang, Li Xu, Duo Peng, Hossein Rahmani, Jun Liu