English
Related papers

Related papers: DiffCap: Exploring Continuous Diffusion on Image C…

200 papers

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed the name DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science 2022-12-12 Zixin Zhu , Yixuan Wei , Jianfeng Wang , Zhe Gan , Zheng Zhang , Le Wang , Gang Hua , Lijuan Wang , Zicheng Liu , Han Hu

The task of 3D shape captioning occupies a significant place within the domain of computer graphics and has garnered considerable interest in recent years. Traditional approaches to this challenge frequently depend on the utilization of…

Graphics · Computer Science 2025-09-30 Zhenyu Shu , Jiawei Wen , Shiyang Li , Shiqing Xin , Ligang Liu

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-word application of these systems. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Guisheng Liu , Yi Li , Zhengcong Fei , Haiyan Fu , Xiangyang Luo , Yanqing Guo

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Junbo Wang , Liangyu Fu , Yuke Li , Yining Zhu , Ya Jing , Xuecheng Wu , Jiangbin Zheng

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Yunji Kim , Jiyoung Lee , Jin-Hwa Kim , Jung-Woo Ha , Jun-Yan Zhu

Image captioning task has been extensively researched by previous work. However, limited experiments focus on generating captions based on non-autoregressive text decoder. Inspired by the recent success of the denoising diffusion model on…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Shitong Xu

Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we proposed a multimodal data…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Changrong Xiao , Sean Xin Xu , Kunpeng Zhang

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

We introduce Diffusion-based Audio Captioning (DAC), a non-autoregressive diffusion model tailored for diverse and efficient audio captioning. Although existing captioning models relying on language backbones have achieved remarkable…

Computation and Language · Computer Science 2025-06-03 Manjie Xu , Chenxing Li , Xinyi Tu , Yong Ren , Ruibo Fu , Wei Liang , Dong Yu

Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. While these models have numerous benefits across various sectors, they have also raised concerns about the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-22 Roberto Amoroso , Davide Morelli , Marcella Cornia , Lorenzo Baraldi , Alberto Del Bimbo , Rita Cucchiara

Explicit Caption Editing (ECE) -- refining reference image captions through a sequence of explicit edit operations (e.g., KEEP, DETELE) -- has raised significant attention due to its explainable and human-like nature. After training with…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Zhen Wang , Xinyun Jiang , Jun Xiao , Tao Chen , Long Chen

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science 2025-05-29 Bocheng Li , Zhujin Gao , Linli Xu

Visual recognition in a low-data regime is challenging and often prone to overfitting. To mitigate this issue, several data augmentation strategies have been proposed. However, standard transformations, e.g., rotation, cropping, and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Aniket Roy , Anshul Shah , Ketul Shah , Anirban Roy , Rama Chellappa

The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate the language structure…

Computer Vision and Pattern Recognition · Computer Science 2018-07-24 Xihui Liu , Hongsheng Li , Jing Shao , Dapeng Chen , Xiaogang Wang

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon , Jakob Verbeek , Holger Schwenk , Matthieu Cord

Latent diffusion models excel at producing high-quality images from text. Yet, concerns appear about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Mariia Zameshina , Olivier Teytaud , Laurent Najman

Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image…

Computation and Language · Computer Science 2023-05-09 Junyi Li , Wayne Xin Zhao , Jian-Yun Nie , Ji-Rong Wen

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Sara Sarto , Marcella Cornia , Lorenzo Baraldi , Alessandro Nicolosi , Rita Cucchiara

We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Qinsheng Zhang , Jiaming Song , Xun Huang , Yongxin Chen , Ming-Yu Liu

There has been a significant progress in text conditional image generation models. Recent advancements in this field depend not only on improvements in model structures, but also vast quantities of text-image paired datasets. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Seungdae Han , Joohee Kim
‹ Prev 1 2 3 10 Next ›