
Related papers: Generating Illustrated Instructions

200 papers

Multistep instructions, such as recipes and how-to guides, greatly benefit from visual aids, such as a series of images that accompany the instruction steps. While Large Language Models (LLMs) have become adept at generating coherent…

Computer Vision and Pattern Recognition · Computer Science 2024-05-17 João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate rich kinds of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-08-15 Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua

Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Quynh Phung, Songwei Ge, Jia-Bin Huang

Recently, researchers have proposed powerful systems for generating and manipulating images using natural language instructions. However, it is difficult to precisely specify many common classes of image transformations with text alone. For…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Alec Helbling, Seongmin Lee, Polo Chau

Graphic design visually conveys information and data by creating and combining text, images and graphics. Two-stage methods that rely primarily on layout generation lack creativity and intelligence, making graphic design still…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Yadong Qu, Shancheng Fang, Yuxin Wang, Xiaorui Wang, Zhineng Chen, Hongtao Xie, Yongdong Zhang

Despite rapid advancements in the capabilities of generative models, pretrained text-to-image models still struggle in capturing the semantics conveyed by complex prompts that compound multiple objects and instance-level attributes.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Etai Sella, Yanir Kleiman, Hadar Averbuch-Elor

This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of…

Computer Vision and Pattern Recognition · Computer Science 2024-01-05 Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

The effective communication of procedural knowledge remains a significant challenge in natural language processing (NLP), as purely textual instructions often fail to convey complex physical actions and spatial relationships. We address…

Computation and Language · Computer Science 2025-05-23 Jing Bi, Pinxin Liu, Ali Vosoughi, Jiarui Wu, Jinxi He, Chenliang Xu

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Recent advancements in text-to-image diffusion models have yielded impressive results in generating realistic and diverse images. However, these models still struggle with complex prompts, such as those that involve numeracy and spatial…

Computer Vision and Pattern Recognition · Computer Science 2024-03-05 Long Lian, Boyi Li, Adam Yala, Trevor Darrell

Preference-conditioned image generation seeks to adapt generative models to individual users, producing outputs that reflect personal aesthetic choices beyond the given textual prompt. Despite recent progress, existing approaches either…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Wenyi Mo, Tianyu Zhang, Yalong Bai, Ligong Han, Ying Ba, Dimitris N. Metaxas

Virtual Reality (VR) has emerged as a powerful tool for workforce training, offering immersive, interactive, and risk-free environments that enhance skill acquisition, decision-making, and confidence. Despite its advantages, developing VR…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Subin Raj Peter

Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks rich visual augmentations to…

Human-Computer Interaction · Computer Science 2025-09-25 Ada Yi Zhao, Aditya Gunturu, Ellen Yi-Luen Do, Ryo Suzuki

To address the challenge of information overload from massive web contents, recommender systems are widely applied to retrieve and present personalized results for users. However, recommendation tasks are inherently constrained to filtering…

Artificial Intelligence · Computer Science 2025-06-04 Jiongnan Liu, Zhicheng Dou, Ning Hu, Chenyan Xiong

Automatically generating training supervision for embodied tasks is crucial, as manual designing is tedious and not scalable. While prior works use large language models (LLMs) or vision-language models (VLMs) to generate rewards, these…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Xiaowen Qiu, Yian Wang, Jiting Cai, Zhehuan Chen, Chunru Lin, Tsun-Hsuan Wang, Chuang Gan

With the capabilities of understanding and executing natural language instructions, large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science 2024-04-30 Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that…

Computer Vision and Pattern Recognition · Computer Science 2023-06-13 Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski

Recent advances in generative diffusion models have enabled text-controlled synthesis of realistic and diverse images with impressive quality. Despite these remarkable advances, the application of text-to-image generative models in computer…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Yulu Gan, Sungwoo Park, Alexander Schubert, Anthony Philippakis, Ahmed M. Alaa

Editing images via instruction provides a natural way to generate interactive content, but it is a big challenge due to the higher requirement of scene understanding and generation. Prior work utilizes a chain of large language models,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-27 Liya Ji, Chenyang Qi, Qifeng Chen

While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved instructions. This limitation stems…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Yabo Zhang, Kunchang Li, Dewei Zhou, Xinyu Huang, Xun Wang