Related papers: CLIP-Diffusion-LM: Apply Diffusion Model on Image …

200 papers

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science 2022-12-12 Zixin Zhu, Yixuan Wei, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

There has been significant progress in text-conditional image generation models. Recent advancements in this field depend not only on improvements in model structures, but also on vast quantities of text-image paired datasets. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Seungdae Han, Joohee Kim

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, few works focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Yufeng He, Zefan Cai, Xu Gan, Baobao Chang

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a…

Computer Vision and Pattern Recognition · Computer Science 2022-04-14 Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu

Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not…

Computer Vision and Pattern Recognition · Computer Science 2023-08-01 Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen

Instead of performing text-conditioned denoising in the image domain, latent diffusion models (LDMs) operate in the latent space of a variational autoencoder (VAE), enabling more efficient processing at reduced computational costs. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Jason Becker, Chris Wendler, Peter Baylies, Robert West, Christian Wressnegger

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a…

Computation and Language · Computer Science 2015-04-10 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

The excellent generative capabilities of text-to-image diffusion models suggest they learn informative representations of image-text data. However, what knowledge their representations capture is not fully understood, and they have not been…

Computer Vision and Pattern Recognition · Computer Science 2023-09-07 Kevin Clark, Priyank Jaini

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in…

Computer Vision and Pattern Recognition · Computer Science 2022-12-09 Ruihan Yang, Prakhar Srivastava, Stephan Mandt

Text-to-image diffusion models have achieved a remarkable leap in capabilities over the last few years, enabling high-quality and diverse synthesis of images from a textual prompt. However, even the most advanced models often struggle to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 Eyal Segalis, Dani Valevski, Danny Lumen, Yossi Matias, Yaniv Leviathan

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

Image captioning creates informative text from an input image by creating a relationship between the words and the actual content of an image. Recently, deep learning models that utilize transformers have been the most successful in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Israa Al Badarneh, Bassam Hammo, Omar Al-Kadi

We propose a text-to-image generation algorithm based on deep neural networks when text captions for images are unavailable during training. In this work, instead of simply generating pseudo-ground-truth sentences of training images using…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Minsoo Kang, Doyup Lee, Jiseob Kim, Saehoon Kim, Bohyung Han

Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another…

Computer Vision and Pattern Recognition · Computer Science 2024-10-02 Changming Xiao, Qi Yang, Feng Zhou, Changshui Zhang

Incorporating diffusion models in the image compression domain has the potential to produce realistic and detailed reconstructions, especially at extremely low bitrates. Previous methods focus on using diffusion models as expressive…

Image and Video Processing · Electrical Eng. & Systems 2024-10-10 Lucas Relic, Roberto Azevedo, Markus Gross, Christopher Schroers

Neural networks struggle with image classification when they learn biases from misleading correlations, which harms their generalization and performance. Previous methods require attribute labels (e.g. background, color) or utilize Generative…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Donggeun Ko, Dongjun Lee, Namjun Park, Wonkyeong Shim, Jaekwang Kim

Remote sensing image change captioning (RSICC) aims at generating human-like language to describe the semantic changes between bi-temporal remote sensing image pairs. It provides valuable insights into environmental dynamics and land…

Computer Vision and Pattern Recognition · Computer Science 2024-05-22 Xiaofei Yu, Yitong Li, Jie Ma