Related papers: Attribute-Centric Compositional Text-to-Image Gene…

Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation

As a challenging task, text-to-image generation aims to generate photo-realistic and semantically consistent images according to the given text descriptions. Existing methods mainly extract the text information from only one sentence to…

Computer Vision and Pattern Recognition · Computer Science 2022-09-29 Xintian Wu, Hanbin Zhao, Liangli Zheng, Shouhong Ding, Xi Li

Improving Compositional Attribute Binding in Text-to-Image Generative Models via Enhanced Text Embeddings

Text-to-image diffusion-based generative models have the stunning ability to generate photo-realistic images and achieve state-of-the-art low FID scores on challenging image generation benchmarks. However, one of the primary failure modes…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Arman Zarei, Keivan Rezaei, Samyadeep Basu, Mehrdad Saberi, Mazda Moayeri, Priyatham Kattakinda, Soheil Feizi

Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

Compositional generalization, representing the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text…

Computation and Language · Computer Science 2024-06-04 Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying Wei, Defu Lian, Zhendong Mao

AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models

Text-to-image generative models have achieved remarkable visual quality but still struggle with compositionality: accurately capturing object relationships, attribute bindings, and fine-grained details in prompts. A key limitation is that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Arman Zarei, Jiacheng Pan, Matthew Gwilliam, Soheil Feizi, Zhenheng Yang

Generating Intermediate Representations for Compositional Text-To-Image Generation

Text-to-image diffusion models have demonstrated an impressive ability to produce high-quality outputs. However, they often struggle to accurately follow fine-grained spatial information in an input text. To this end, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Ran Galun, Sagie Benaim

AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

Despite the high-quality results of text-to-image generation, stereotypical biases have been spotted in their generated contents, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Xinyu Hou, Xiaoming Li, Chen Change Loy

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Do Huu Dat, Nam Hyeonu, Po-Yuan Mao, Tae-Hyun Oh

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

Although progress has been made for text-to-image synthesis, previous methods fall short of generalizing to unseen or underrepresented attribute compositions in the input text. Lacking compositionality could have severe implications for…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Zhiheng Li, Martin Renqiang Min, Kai Li, Chenliang Xu

ITI-GEN: Inclusive Text-to-Image Generation

Text-to-image generative models often reflect the biases of the training data, leading to unequal representations of underrepresented groups. This study investigates inclusive text-to-image generative models that generate images based on…

Computer Vision and Pattern Recognition · Computer Science 2023-09-12 Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, Fernando De la Torre

SegAttnGAN: Text to Image Generation with Segmentation Attention

In this paper, we propose a novel generative network (SegAttnGAN) that utilizes additional segmentation information for the text-to-image synthesis task. As the segmentation data introduced to the model provides useful guidance on the…

Computer Vision and Pattern Recognition · Computer Science 2020-05-27 Yuchuan Gou, Qiancheng Wu, Minghao Li, Bo Gong, Mei Han

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of…

Computer Vision and Pattern Recognition · Computer Science 2023-08-04 Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation

Text-to-image retrieval is a fundamental task in multimedia processing, aiming to retrieve semantically relevant cross-modal content. Traditional studies have typically approached this task as a discriminative problem, matching the text and…

Multimedia · Computer Science 2024-07-25 Yongqi Li, Hongru Cai, Wenjie Wang, Leigang Qu, Yinwei Wei, Wenjie Li, Liqiang Nie, Tat-Seng Chua

Enhancing Image-Text Matching with Adaptive Feature Aggregation

Image-text matching aims to find matched cross-modal pairs accurately. While current methods often rely on projecting cross-modal features into a common embedding space, they frequently suffer from imbalanced feature representations across…

Information Retrieval · Computer Science 2024-01-19 Zuhui Wang, Yunting Yin, I. V. Ramakrishnan

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation

Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Guocheng Gordon Qian, Daniil Ostashev, Egor Nemchinov, Avihay Assouline, Sergey Tulyakov, Kuan-Chieh Jackson Wang, Kfir Aberman

Image Search with Text Feedback by Additive Attention Compositional Learning

Effective image retrieval with text feedback stands to impact a range of real-world applications, such as e-commerce. Given a source image and text feedback that describes the desired modifications to that image, the goal is to retrieve the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-09 Yuxin Tian, Shawn Newsam, Kofi Boakye

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Controllable text generation (CTG) aims to generate text with desired attributes, and decoding-time-based methods have shown promising performance on this task. However, in this paper, we identify the phenomenon of Attribute Collapse for…

Computation and Language · Computer Science 2023-11-03 Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

Limitations of Face Image Generation

Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz

Semi-supervised Image Attribute Editing using Generative Adversarial Networks

Image attribute editing is a challenging problem that has been recently studied by many researchers using generative networks. The challenge is in the manipulation of selected attributes of images while preserving the other details. The…

Computer Vision and Pattern Recognition · Computer Science 2020-04-14 Yahya Dogan, Hacer Yalim Keles

Conditional Text Image Generation with Diffusion Models

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao