Related papers: Better Captioning with Sequence-Level Exploration

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling

Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations. First, semantic information has been widely…

Computer Vision and Pattern Recognition · Computer Science 2021-02-15 Haoran Chen, Ke Lin, Alexander Maye, Jianming Li, Xiaolin Hu

Discriminability objective for training descriptive captions

One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation.…

Computer Vision and Pattern Recognition · Computer Science 2018-06-12 Ruotian Luo, Brian Price, Scott Cohen, Gregory Shakhnarovich

End-to-end Dense Video Captioning as Sequence Generation

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event. Previous approaches usually follow a two-stage generative process, which first proposes a segment for each…

Computer Vision and Pattern Recognition · Computer Science 2022-09-19 Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

Attention Correctness in Neural Image Captioning

Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly-learned attention maps has only been…

Computer Vision and Pattern Recognition · Computer Science 2016-11-24 Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille

Reinforced Video Captioning with Entailment Rewards

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement…

Computation and Language · Computer Science 2017-08-09 Ramakanth Pasunuru, Mohit Bansal

Deep Learning Approaches on Image Captioning: A Review

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Using Image Captions and Multitask Learning for Recommending Query Reformulations

Interactive search sessions often contain multiple queries, where the user submits a reformulated version of the previous query in response to the original results. We aim to enhance the query recommendation experience for a commercial…

Information Retrieval · Computer Science 2020-03-03 Gaurav Verma, Vishwa Vinay, Sahil Bansal, Shashank Oberoi, Makkunda Sharma, Prakhar Gupta

What's the Point: Semantic Segmentation with Point Supervision

The semantic image segmentation task presents a trade-off between test time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain, image-level class…

Computer Vision and Pattern Recognition · Computer Science 2016-07-26 Amy Bearman, Olga Russakovsky, Vittorio Ferrari, Li Fei-Fei

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning

Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references. Because human annotations reflect subjective preferences and expertise,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Zhijiang Tang, Linhua Wang, Jiaxin Qi, Weihao Jiang, Peng Hou, Anxiang Zeng, Jianqiang Huang

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of…

Computer Vision and Pattern Recognition · Computer Science 2017-10-10 Bo Dai, Dahua Lin

Hierarchical Modular Network for Video Captioning

Video captioning aims to generate natural language descriptions according to the content, where representation learning plays a crucial role. Existing methods are mainly developed within the supervised learning framework via word-by-word…

Computer Vision and Pattern Recognition · Computer Science 2022-03-11 Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

Towards Retrieval-Augmented Architectures for Image Captioning

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Guiding Attention using Partial-Order Relationships for Image Captioning

The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions for images. Over the years, many novel approaches have been proposed to enhance the attention process using…

Computer Vision and Pattern Recognition · Computer Science 2022-04-18 Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the…

Machine Learning · Computer Science 2015-09-24 Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

An Optimization Framework for Task Sequencing in Curriculum Learning

Curriculum learning in reinforcement learning is used to shape exploration by presenting the agent with increasingly complex tasks. The idea of curriculum learning has been largely applied in both animal training and pedagogy. In…

Machine Learning · Computer Science 2019-06-14 Francesco Foglino, Christiano Coletto Christakou, Matteo Leonetti

Sequence-level Large Language Model Training with Contrastive Preference Optimization

The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find…

Computation and Language · Computer Science 2025-02-25 Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, Sheng Zha

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a promising research topic in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

Boosted Attention: Leveraging Human Attention for Image Captioning

Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Shi Chen, Qi Zhao

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Image Captioning is a task that combines computer vision and natural language processing, where it aims to generate descriptive legends for images. It is a two-fold process relying on accurate image understanding and correct language…

Computer Vision and Pattern Recognition · Computer Science 2021-07-29 Ahmed Elhagry, Karima Kadaoui