Related papers: Dependent Multi-Task Learning with Causal Interven…

200 papers

Recently it has been shown that policy-gradient methods for reinforcement learning can be used to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-14 · Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

We learn visual features by captioning images with an image-conditioned masked diffusion language model, a formulation we call masked diffusion captioning (MDC). During training, text tokens in each image-caption pair are masked at a…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-31 · Chao Feng, Zihao Wei, Andrew Owens

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science · 2017-04-14 · Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-23 · Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences that require…

Multimedia · Computer Science · 2022-02-10 · Linli Yao, Weiying Wang, Qin Jin

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-22 · Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-27 · Dandan Guo, Ruiying Lu, Bo Chen, Zequn Zeng, Mingyuan Zhou

Over the years, state-of-the-art (SoTA) image captioning methods have achieved promising results on some evaluation metrics (e.g., CIDEr). However, recent findings show that the captions generated by these methods tend to be biased toward…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-16 · Qi Chen, Chaorui Deng, Qi Wu

Existing dense or paragraph video captioning approaches rely on holistic representations of videos, possibly coupled with learned object/action representations, to condition hierarchical language decoders. However, they fundamentally lack…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-10 · Shih-Han Chou, James J. Little, Leonid Sigal

Training Large Multimodality Models (LMMs) relies on descriptive image captions that connect image and language. Existing methods for generating such captions often rely on distilling the captions from pretrained LMMs, constructing them…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-28 · Yanpeng Sun, Jing Hao, Ke Zhu, Jiang-Jiang Liu, Yuxiang Zhao, Xiaofan Li, Na Zhao, Zechao Li, Jingdong Wang

Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now surprisingly little attention has been focused on continual learning of recurrent…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-30 · Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-grained semantics of…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-07 · Philipp Harzig, Dan Zecha, Rainer Lienhart, Carolin Kaiser, René Schallner

Multilingual image captioning has recently been tackled by training with large-scale machine translated data, which is an expensive, noisy, and time-consuming process. Without requiring any multilingual caption data, we propose LMCap, an…

Computation and Language · Computer Science · 2023-06-01 · Rita Ramos, Bruno Martins, Desmond Elliott

While deep-learning models have been shown to perform well on image-to-text datasets, it is difficult to use them in practice for captioning images. This is because captions traditionally tend to be context-dependent and offer complementary…

Machine Learning · Computer Science · 2023-06-07 · Shinjini Ghosh, Sagnik Anupam

Existing image captioning approaches typically train a one-stage sentence decoder, which struggles to generate rich fine-grained descriptions. On the other hand, a multi-stage image captioning model is hard to train due to the vanishing…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-15 · Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

High-quality image captions play a crucial role in improving the performance of cross-modal applications such as text-to-image generation, text-to-video generation, and text-image retrieval. To generate long-form, high-quality captions,…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-10 · Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu

Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes, and their relationships in an image. It also needs to generate syntactically and semantically…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-16 · Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-29 · Peter Anderson, Stephen Gould, Mark Johnson

Existing image captioning systems are dedicated to generating narrative captions for images, which are spatially detached from the image in presentation. However, texts can also be used as decorations on the image to highlight the key…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-28 · Yiqi Gao, Xinglin Hou, Yuanmeng Zhang, Tiezheng Ge, Yuning Jiang, Peng Wang

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The…

Computation and Language · Computer Science · 2020-08-07 · Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen