English
Related papers

Related papers: Multimodal Data Augmentation for Image Captioning …

200 papers

Visual recognition in a low-data regime is challenging and often prone to overfitting. To mitigate this issue, several data augmentation strategies have been proposed. However, standard transformations, e.g., rotation, cropping, and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Aniket Roy , Anshul Shah , Ketul Shah , Anirban Roy , Rama Chellappa

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, there are limited works that focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Yufeng He , Zefan Cai , Xu Gan , Baobao Chang

Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This…

Computer Vision and Pattern Recognition · Computer Science 2021-03-01 Marimuthu Kalimuthu , Aditya Mogadala , Marius Mosbach , Dietrich Klakow

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi , Xu Zhou , Xipeng Qiu , Xiaodan Zhu

Text-to-image diffusion models achieved a remarkable leap in capabilities over the last few years, enabling high-quality and diverse synthesis of images from a textual prompt. However, even the most advanced models often struggle to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 Eyal Segalis , Dani Valevski , Danny Lumen , Yossi Matias , Yaniv Leviathan

Recent advances on text-to-image generation have witnessed the rise of diffusion models which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianjie Luo , Yehao Li , Yingwei Pan , Ting Yao , Jianlin Feng , Hongyang Chao , Tao Mei

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed the name DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science 2022-12-12 Zixin Zhu , Yixuan Wei , Jianfeng Wang , Zhe Gan , Zheng Zhang , Le Wang , Gang Hua , Lijuan Wang , Zicheng Liu , Han Hu

This paper proposes a dataset augmentation method by fine-tuning pre-trained diffusion models. Generating images using a pre-trained diffusion model with textual conditioning often results in domain discrepancy between real data and…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Abdullah Al Rahat , Hemanth Venkateswara

In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as…

Computer Vision and Pattern Recognition · Computer Science 2024-08-02 Yuhang Li , Xin Dong , Chen Chen , Weiming Zhuang , Lingjuan Lyu

We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is an expensive task in terms of labor, time, and cost. In…

Computer Vision and Pattern Recognition · Computer Science 2023-01-27 Dong-Jin Kim , Tae-Hyun Oh , Jinsoo Choi , In So Kweon

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we…

Machine Learning · Computer Science 2023-06-01 Gowthami Somepalli , Vasu Singla , Micah Goldblum , Jonas Geiping , Tom Goldstein

Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer…

Computer Vision and Pattern Recognition · Computer Science 2016-12-13 Jonghwan Mun , Minsu Cho , Bohyung Han

Image data augmentation constitutes a critical methodology in modern computer vision tasks, since it can facilitate towards enhancing the diversity and quality of training datasets; thereby, improving the performance and robustness of…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Panagiotis Alimisis , Ioannis Mademlis , Panagiotis Radoglou-Grammatikis , Panagiotis Sarigiannidis , Georgios Th. Papadopoulos

Generative diffusion models offer a natural choice for data augmentation when training complex vision models. However, ensuring reliability of their generative content as augmentation samples remains an open challenge. Despite a number of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Khawar Islam , Naveed Akhtar

Constructing an organized dataset comprised of a large number of images and several captions for each image is a laborious task, which requires vast human effort. On the other hand, collecting a large number of images and sentences…

Computer Vision and Pattern Recognition · Computer Science 2019-11-22 Dong-Jin Kim , Jinsoo Choi , Tae-Hyun Oh , In So Kweon

Large multimodal models demonstrate remarkable generalist ability to perform diverse multimodal tasks in a zero-shot manner. Large-scale web-based image-text pairs contribute fundamentally to this success, but suffer from excessive noise.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Qiying Yu , Quan Sun , Xiaosong Zhang , Yufeng Cui , Fan Zhang , Yue Cao , Xinlong Wang , Jingjing Liu

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-word application of these systems. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Guisheng Liu , Yi Li , Zhengcong Fei , Haiyan Fu , Xiangyang Luo , Yanqing Guo

Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Sahiti Yerramilli , Jayant Sravan Tamarapalli , Tanmay Girish Kulkarni , Jonathan Francis , Eric Nyberg

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson , Stephen Gould , Mark Johnson

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Yunji Kim , Jiyoung Lee , Jin-Hwa Kim , Jung-Woo Ha , Jun-Yan Zhu
‹ Prev 1 2 3 10 Next ›