Related papers: Iterative Refinement Improves Compositional Image …

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models…

Computer Vision and Pattern Recognition · Computer Science 2024-08-15 Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

Despite recent progress in text-to-image (T2I) generation, existing models often struggle to faithfully capture user intentions from short and under-specified prompts. While prior work has attempted to enhance prompts using large language…

Computer Vision and Pattern Recognition · Computer Science 2025-05-26 Mingrui Wu, Lu Wang, Pu Zhao, Fangkai Yang, Jianjin Zhang, Jianfeng Liu, Yuefeng Zhan, Weihao Han, Hao Sun, Jiayi Ji, Xiaoshuai Sun, Qingwei Lin, Weiwei Deng, Dongmei Zhang, Feng Sun, Qi Zhang, Rongrong Ji

Improving Text-to-Image Generation with Input-Side Inference-Time Scaling

Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models often struggle with simple or underspecified prompts, leading to suboptimal image-text alignment, aesthetics, and quality. We propose a…

Computation and Language · Computer Science 2025-10-16 Ruibo Chen, Jiacheng Pan, Heng Huang, Zhenheng Yang

No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation

Despite recent advances in text-to-image (T2I) models, they often fail to faithfully render all elements of complex prompts, frequently omitting or misrepresenting specific objects and attributes. Test-time optimization has emerged as a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Mohammad Hossein Sameti, Amir M. Mansourian, Arash Marioriyad, Soheil Fadaee Oshyani, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah

Iterative Prompt Refinement for Safer Text-to-Image Generation

Text-to-Image (T2I) models have made remarkable progress in generating images from text prompts, but their output quality and safety still depend heavily on how prompts are phrased. Existing safety methods typically refine prompts using…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Jinwoo Jeon, JunHyeok Oh, Hayeong Lee, Byung-Jun Lee

Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

Text-to-image (T2I) generation aims to synthesize images from textual prompts, which jointly specify what must be shown and imply what can be inferred, which thus correspond to two core capabilities: composition and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Xiaojuan Qi, Fuli Feng

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis

Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Ashish Goswami, Satyam Kumar Modi, Santhosh Rishi Deshineni, Harman Singh, Prathosh A. P, Parag Singla

CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback

State-of-the-art T2I models are capable of generating high-resolution images given textual prompts. However, they still struggle with accurately depicting compositional scenes that specify multiple objects, attributes, and spatial…

Computer Vision and Pattern Recognition · Computer Science 2025-05-19 Yixin Wan, Kai-Wei Chang

Test-time Prompt Refinement for Text-to-Image Models

Text-to-image (T2I) generation models have made significant strides but still struggle with prompt sensitivity: even minor changes in prompt wording can yield inconsistent or inaccurate outputs. To address this challenge, we introduce a…

Machine Learning · Computer Science 2025-07-31 Mohammad Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya, Vibhav Vineet

EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation

Recent text-to-image (T2I) generators can synthesize realistic images, but still struggle with compositional prompts involving multiple objects, counts, attributes, and relations. We introduce EPIC (Efficient Predicate-Guided Inference-Time…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Sunung Mun, Sunghyun Cho, Jungseul Ok

T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation

Although recent text-to-image generative models have achieved impressive performance, they still often struggle with capturing the compositional complexities of prompts including attribute binding, and spatial relationships between…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Seyed Mohammad Hadi Hosseini, Amir Mohammad Izadi, Ali Abdollahi, Armin Saghafian, Mahdieh Soleymani Baghshah

ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization

Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Haosheng Gan, Berk Tinaz, Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement

Text-to-Image models, including Stable Diffusion, have significantly improved in generating images that are highly semantically aligned with the given prompts. However, existing models may fail to produce appropriate images for the cultural…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Suchae Jeong, Inseong Choi, Youngsik Yun, Jihie Kim

TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization

Text-to-image generative models excel in creating images from text but struggle with ensuring alignment and consistency between outputs and prompts. This paper introduces TextMatch, a novel framework that leverages multimodal optimization…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Yucong Luo, Mingyue Cheng, Jie Ouyang, Xiaoyu Tao, Qi Liu

IA-T2I: Internet-Augmented Text-to-Image Generation

Current text-to-image (T2I) generation models achieve promising results, but they fail on the scenarios where the knowledge implied in the text prompt is uncertain. For example, a T2I model released in February would struggle to generate a…

Computer Vision and Pattern Recognition · Computer Science 2025-05-22 Chuanhao Li, Jianwen Sun, Yukang Feng, Mingliang Zhai, Yifan Chang, Kaipeng Zhang

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to…

Computer Vision and Pattern Recognition · Computer Science 2024-03-27 Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

Synthetic Curriculum Reinforces Compositional Text-to-Image Generation

Text-to-Image (T2I) generation has long been an open problem, with compositional synthesis remaining particularly challenging. This task requires accurate rendering of complex scenes containing multiple objects that exhibit diverse…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Shijian Wang, Runhao Fu, Siyi Zhao, Qingqin Zhan, Xingjian Wang, Jiarui Jin, Yuan Lu, Hanqian Wu, Cunjian Chen

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

The field of text-to-image (T2I) generation has garnered significant attention both within the research community and among everyday users. Despite the advancements of T2I models, a common issue encountered by users is the need for…

Computation and Language · Computer Science 2023-10-31 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

TIPO: Text to Image with Text Presampling for Prompt Optimization

TIPO (Text-to-Image Prompt Optimization) introduces an efficient approach for automatic prompt refinement in text-to-image (T2I) generation. Starting from simple user prompts, TIPO leverages a lightweight pre-trained model to expand these…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Shih-Ying Yeh, Yi Li, Sang-Hyun Park, Giyeong Oh, Xuehai Wang, Min Song, Youngjae Yu, Shang-Hong Lai

Transferring Visual Attributes from Natural Language to Verified Image Generation

Text to image generation methods (T2I) are widely popular in generating art and other creative artifacts. While visual hallucinations can be a positive factor in scenarios where creativity is appreciated, such artifacts are poorly suited…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes