Related papers: Robust Change Captioning

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Change Captioning is a task that aims to describe the difference between images with natural language. Most existing methods treat this problem as a difference judgment without the existence of distractors, such as viewpoint changes.…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning

Change captioning aims to succinctly describe the semantic change between a pair of similar images, while being immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often appear pseudo…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Yunbin Tu, Liang Li, Li Su, Chenggang Yan, Qingming Huang

Visual-aware Attention Dual-stream Decoder for Video Captioning

Video captioning is a challenging task that captures different visual parts and describes them in sentences, for it requires visual and linguistic coherence. The attention mechanism in the current video captioning method learns to assign…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zhixin Sun, Xian Zhong, Shuqin Chen, Lin Li, Luo Zhong

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Image captioning involves generating textual descriptions from input images, bridging the gap between computer vision and natural language processing. Recent advancements in transformer-based models have significantly improved caption…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Israa A. Albadarneh, Bassam H. Hammo, Omar S. Al-Kadi

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts…

Computer Vision and Pattern Recognition · Computer Science 2020-07-20 Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang

Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning

Automatically describing video content with natural language has been attracting much attention in CV and NLP communities. Most existing methods predict one word at a time, and by feeding the last generated word back as input at the next…

Computer Vision and Pattern Recognition · Computer Science 2019-11-06 Huanhou Xiao, Jinglun Shi

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information…

Computer Vision and Pattern Recognition · Computer Science 2017-06-07 Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher

DADA: Driver Attention Prediction in Driving Accident Scenarios

Driver attention prediction is becoming an essential research problem in human-like driving systems. This work makes an attempt to predict the driver attention in driving accident scenarios (DADA). However, challenges tread on the heels of…

Computer Vision and Pattern Recognition · Computer Science 2023-01-06 Jianwu Fang, Dingxin Yan, Jiahuan Qiao, Jianru Xue, Hongkai Yu

RDD: Robust Feature Detector and Descriptor using Deformable Transformer

As a core step in structure-from-motion and SLAM, robust feature detection and description under challenging scenarios such as significant viewpoint changes remain unresolved despite their ubiquity. While recent works have identified the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao

Dynamic Adaptive Attention and Supervised Contrastive Learning: A Novel Hybrid Framework for Text Sentiment Classification

The exponential growth of user-generated movie reviews on digital platforms has made accurate text sentiment classification a cornerstone task in natural language processing. Traditional models, including standard BERT and recurrent…

Computation and Language · Computer Science 2026-04-14 Qingyang Li

Spatial Attention as an Interface for Image Captioning Models

The internal workings of modern deep learning models stay often unclear to an external observer, although spatial attention mechanisms are involved. The idea of this work is to translate these spatial attentions into natural language to…

Computer Vision and Pattern Recognition · Computer Science 2020-10-23 Philipp Sadler

Describing and Localizing Multiple Changes with Transformers

Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single…

Computer Vision and Pattern Recognition · Computer Science 2021-09-16 Yue Qiu, Shintaro Yamamoto, Kodai Nakashima, Ryota Suzuki, Kenji Iwata, Hirokatsu Kataoka, Yutaka Satoh

Delta Descriptors: Change-Based Place Representation for Robust Visual Localization

Visual place recognition is challenging because there are so many factors that can cause the appearance of a place to change, from day-night cycles to seasonal change to atmospheric conditions. In recent years a large range of approaches…

Computer Vision and Pattern Recognition · Computer Science 2020-07-31 Sourav Garg, Ben Harwood, Gaurangi Anand, Michael Milford

TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation

Recent unsupervised domain adaptation (UDA) methods have shown great success in addressing classical domain shifts (e.g., synthetic-to-real), but they still suffer under complex shifts (e.g. geographical shift), where both the background…

Computer Vision and Pattern Recognition · Computer Science 2025-08-11 Mattia Litrico, Mario Valerio Giuffrida, Sebastiano Battiato, Devis Tuia

DABERT: Dual Attention Enhanced BERT for Semantic Matching

Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still suffer from insufficient ability to capture subtle differences. Minor noise like word…

Computation and Language · Computer Science 2023-04-17 Sirui Wang, Di Liang, Jian Song, Yuntao Li, Wei Wu

Text-guided Attention Model for Image Captioning

Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer…

Computer Vision and Pattern Recognition · Computer Science 2016-12-13 Jonghwan Mun, Minsu Cho, Bohyung Han

Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications

Transformer-based object detectors often struggle with occlusions, fine-grained localization, and computational inefficiency caused by fixed queries and dense attention. We propose DAMM, Dual-stream Attention with Multi-Modal queries, a…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Noreen Anwar, Guillaume-Alexandre Bilodeau, Wassim Bouachir

Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2, and integrates full-image cross-attention to address key challenges such as varying…

Computer Vision and Pattern Recognition · Computer Science 2025-03-05 Chun-Jung Lin, Sourav Garg, Tat-Jun Chin, Feras Dayoub

Context-Aware Doubly-Robust Semi-Supervised Learning

The widespread adoption of artificial intelligence (AI) in next-generation communication systems is challenged by the heterogeneity of traffic and network conditions, which call for the use of highly contextual, site-specific, data. A…

Signal Processing · Electrical Eng. & Systems 2025-06-27 Clement Ruah, Houssem Sifaou, Osvaldo Simeone, Bashir Al-Hashimi

Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used for image captioning. Many of these frameworks deploy their full focus on generating the caption from scratch by relying solely on the image features or the object…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Fawaz Sammani, Mahmoud Elsayed