Related papers: Learning Visual Generative Priors without Text

200 papers

Text-and-Image-To-Image (TI2I), an extension of Text-To-Image (T2I), integrates image inputs with textual instructions to enhance image generation. Existing methods often partially utilize image inputs, focusing on specific elements like…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu, Tzu-Ling Lin, Hong-Han Shuai

A significant "modality gap" exists between the abundance of text-only data and the increasing power of multimodal models. This work systematically investigates whether images generated on-the-fly by Text-to-Image (T2I) models can serve as…

Multimedia · Computer Science 2026-03-04 Yuesheng Huang, Peng Zhang, Xiaoxin Wu, Riliang Liu, Jiaqi Liang

Unified multimodal generation architectures that jointly produce text and images have recently emerged as a promising direction for text-to-image (T2I) synthesis. However, many existing systems rely on explicit modality switching,…

Text-to-image generation (T2I) refers to the text-guided generation of high-quality images. In the past few years, T2I has attracted widespread attention and numerous works have emerged. In this survey, we comprehensively review 141 works…

Computer Vision and Pattern Recognition · Computer Science 2025-05-06 Pengfei Yang, Ngai-Man Cheung, Xinda Ma

As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs…

Computer Vision and Pattern Recognition · Computer Science 2023-10-30 Jaemin Cho, Abhay Zala, Mohit Bansal

In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

Diffusion models have emerged as a dominant paradigm for generative modeling across a wide range of domains, including prompt-conditional generation. The vast majority of samplers, however, rely on forward discretization of the reverse…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Zhenghan Fang, Jian Zheng, Qiaozi Gao, Xiaofeng Gao, Jeremias Sulam

Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically…

Computer Vision and Pattern Recognition · Computer Science 2024-03-28 Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

Despite advancements in text-to-image generation (T2I), prior methods often face text-image misalignment problems such as relation confusion in generated images. Existing solutions involve cross-attention manipulation for better…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua

When humans read a specific text, they often visualize the corresponding images, and we hope that computers can do the same. Text-to-image synthesis (T2I), which focuses on generating high-quality images from textual descriptions, has…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Nonghai Zhang, Hao Tang

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from…

Computer Vision and Pattern Recognition · Computer Science 2022-09-30 Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Minglei Shi, Haolin Wang, Borui Zhang, Wenzhao Zheng, Bohan Zeng, Ziyang Yuan, Xiaoshi Wu, Yuanxing Zhang, Huan Yang, Xintao Wang, Pengfei Wan, Kun Gai, Jie Zhou, Jiwen Lu

Text-to-image (T2I) models have advanced creative content generation, yet their reliance on large uncurated datasets often reproduces societal biases. We present FairT2I, a training-free and interactive framework grounded in a…

Computer Vision and Pattern Recognition · Computer Science 2026-01-08 Jinya Sakurai, Yuki Koyama, Issei Sato

Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen…

Computer Vision and Pattern Recognition · Computer Science 2017-06-06 Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo

Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri

Text-to-image (T2I) generation aims at producing realistic images corresponding to text descriptions. Generative Adversarial Networks (GANs) have proven to be successful in this task. Typical T2I GANs are two-phase methods that first pretrain an…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Yibin Liu, Jianyu Zhang, Li Zhang, Shijian Li, Gang Pan

Understanding spatial relations is a crucial cognitive ability for both humans and AI. While current research has predominantly focused on the benchmarking of text-to-image (T2I) models, we propose a more comprehensive evaluation that…

Computer Vision and Pattern Recognition · Computer Science 2024-11-13 Shang Hong Sim, Clarence Lee, Alvin Tan, Cheston Tan

Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

Recent video generation models have revealed the emergence of Chain-of-Frame (CoF) reasoning, enabling frame-by-frame visual inference. With this capability, video models have been successfully applied to various visual tasks (e.g., maze…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Chengzhuo Tong, Mingkun Chang, Shenglong Zhang, Yuran Wang, Cheng Liang, Zhizheng Zhao, Ruichuan An, Bohan Zeng, Yang Shi, Yifan Dai, Ziming Zhao, Guanbin Li, Pengfei Wan, Yuanxing Zhang, Wentao Zhang

Text-to-image (T2I) generation has greatly enhanced creative expression, yet achieving preference-aligned generation in a real-time and training-free manner remains challenging. Previous methods often rely on static, pre-collected…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Yang Li, Songlin Yang, Xiaoxuan Han, Wei Wang, Jing Dong, Yueming Lyu, Ziyu Xue