Related papers: Learning Visual Generative Priors without Text

TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models

Text-and-Image-To-Image (TI2I), an extension of Text-To-Image (T2I), integrates image inputs with textual instructions to enhance image generation. Existing methods often partially utilize image inputs, focusing on specific elements like…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-20 · Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu, Tzu-Ling Lin, Hong-Han Shuai

Synthetic Perception: Can Generated Images Unlock Latent Visual Prior for Text-Centric Reasoning?

A significant "modality gap" exists between the abundance of text-only data and the increasing power of multimodal models. This work systematically investigates whether images generated on-the-fly by Text-to-Image (T2I) models can serve as…

Multimedia · Computer Science · 2026-03-04 · Yuesheng Huang, Peng Zhang, Xiaoxin Wu, Riliang Liu, Jiaqi Liang

Unified Text-Image Generation with Weakness-Targeted Post-Training

Unified multimodal generation architectures that jointly produce text and images have recently emerged as a promising direction for text-to-image (T2I) synthesis. However, many existing systems rely on explicit modality switching,…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-22 · Jiahui Chen, Philippe Hansen-Estruch, Xiaochuang Han, Yushi Hu, Emily Dinan, Amita Kamath, Michal Drozdzal, Reyhane Askari-Hemmat, Luke Zettlemoyer, Marjan Ghazvininejad

Text to Image Generation and Editing: A Survey

Text-to-image generation (T2I) refers to the text-guided generation of high-quality images. In the past few years, T2I has attracted widespread attention and numerous works have emerged. In this survey, we comprehensively review 141 works…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-06 · Pengfei Yang, Ngai-Man Cheung, Xinda Ma

Visual Programming for Text-to-Image Generation and Evaluation

As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-30 · Jaemin Cho, Abhay Zala, Mohit Bansal

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first,…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-29 · Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion

Diffusion models have emerged as a dominant paradigm for generative modeling across a wide range of domains, including prompt-conditional generation. The vast majority of samplers, however, rely on forward discretization of the reverse…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Zhenghan Fang, Jian Zheng, Qiaozi Gao, Xiaofeng Gao, Jeremias Sulam

VersaT2I: Improving Text-to-Image Models with Versatile Reward

Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-28 · Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

Discriminative Probing and Tuning for Text-to-Image Generation

Despite advancements in text-to-image generation (T2I), prior methods often face text-image misalignment problems such as relation confusion in generated images. Existing solutions involve cross-attention manipulation for better…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-15 · Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua

Text-to-Image Synthesis: A Decade Survey

When humans read a specific text, they often visualize the corresponding images, and we hope that computers can do the same. Text-to-image synthesis (T2I), which focuses on generating high-quality images from textual descriptions, has…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-26 · Nonghai Zhang, Hao Tang

Make-A-Video: Text-to-Video Generation without Text-Video Data

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-30 · Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-15 · Minglei Shi, Haolin Wang, Borui Zhang, Wenzhao Zheng, Bohan Zeng, Ziyang Yuan, Xiaoshi Wu, Yuanxing Zhang, Huan Yang, Xintao Wang, Pengfei Wan, Kun Gai, Jie Zhou, Jiwen Lu

FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing

Text-to-image (T2I) models have advanced creative content generation, yet their reliance on large uncurated datasets often reproduces societal biases. We present FairT2I, a training-free and interactive framework grounded in a…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-08 · Jinya Sakurai, Yuki Koyama, Issei Sato

I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-06 · Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo

Still-Moving: Customized Video Generation without Customized Video Data

Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-12 · Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri

A Framework For Image Synthesis Using Supervised Contrastive Learning

Text-to-image (T2I) generation aims at producing realistic images corresponding to text descriptions. Generative Adversarial Network (GAN) has proven to be successful in this task. Typical T2I GANs are two-phase methods that first pretrain an…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Yibin Liu, Jianyu Zhang, Li Zhang, Shijian Li, Gang Pan

Evaluating the Generation of Spatial Relations in Text and Image Generative Models

Understanding spatial relations is a crucial cognitive ability for both humans and AI. While current research has predominantly focused on the benchmarking of text-to-image (T2I) models, we propose a more comprehensive evaluation that…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-13 · Shang Hong Sim, Clarence Lee, Alvin Tan, Cheston Tan

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-14 · Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Recent video generation models have revealed the emergence of Chain-of-Frame (CoF) reasoning, enabling frame-by-frame visual inference. With this capability, video models have been successfully applied to various visual tasks (e.g., maze…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-16 · Chengzhuo Tong, Mingkun Chang, Shenglong Zhang, Yuran Wang, Cheng Liang, Zhizheng Zhao, Ruichuan An, Bohan Zeng, Yang Shi, Yifan Dai, Ziming Zhao, Guanbin Li, Pengfei Wan, Yuanxing Zhang, Wentao Zhang

Instant Preference Alignment for Text-to-Image Diffusion Models

Text-to-image (T2I) generation has greatly enhanced creative expression, yet achieving preference-aligned generation in a real-time and training-free manner remains challenging. Previous methods often rely on static, pre-collected…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Yang Li, Songlin Yang, Xiaoxuan Han, Wei Wang, Jing Dong, Yueming Lyu, Ziyu Xue