Related papers: MsEdF: A Multi-stream Encoder-decoder Framework fo…

JSSFF: A Joint Structural-Semantic Fusion Framework for Remote Sensing Image Captioning

The encoder-decoder framework has become widely popular nowadays. In this model, the encoder extracts informative visual features from an input image, and the decoder employs a sequence-to-sequence formulation to generate the corresponding…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Swadhin Das, Vivek Yadav

Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Swadhin Das, Saarthak Gupta, Kamal Kumar, Raksha Sharma

SEMT: Static-Expansion-Mesh Transformer Network Architecture for Remote Sensing Image Captioning

Image captioning has emerged as a crucial task in the intersection of computer vision and natural language processing, enabling automated generation of descriptive text from visual content. In the context of remote sensing, image captioning…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Khang Truong, Lam Pham, Hieu Tang, Jasmin Lampert, Martin Boyer, Son Phan, Truong Nguyen

A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning

Remote sensing images are highly valued for their ability to address complex real-world issues such as risk management, security, and meteorology. However, manually captioning these images is challenging and requires specialized knowledge…

Machine Learning · Computer Science 2025-02-07 Swadhin Das, Raksha Sharma

MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption

Remote sensing image change caption (RSICC) aims to provide natural language descriptions for bi-temporal remote sensing images. Since the Change Caption (CC) task requires both spatial and temporal features, previous works follow an…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Ruixun Liu, Kaiyu Li, Jiayi Song, Dongwei Sun, Xiangyong Cao

Robust Change Captioning in Remote Sensing: SECOND-CC Dataset and MModalCC Framework

Remote sensing change captioning (RSICC) aims to describe changes between bitemporal images in natural language. Existing methods often fail under challenges like illumination differences, viewpoint changes, and blur effects, leading to…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Ali Can Karaca, M. Enes Ozelbas, Saadettin Berber, Orkhan Karimli, Turabi Yildirim, M. Fatih Amasyali

Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

Remote sensing image change captioning (RSICC) aims to articulate the changes in objects of interest within bi-temporal remote sensing images using natural language. Given the limitations of current RSICC methods in expressing general…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Yongshuo Zhu, Lu Li, Keyan Chen, Chenyang Liu, Fugen Zhou, Zhenwei Shi

Multi-Receptive Field Ensemble with Cross-Entropy Masking for Class Imbalance in Remote Sensing Change Detection

Remote sensing change detection (RSCD) is a complex task, where changes often appear at different scales and orientations. Convolutional neural networks (CNNs) are good at capturing local spatial patterns but cannot model global semantics…

Computer Vision and Pattern Recognition · Computer Science 2026-01-19 Humza Naveed, Xina Zeng, Mitch Bryson, Nagita Mehrseresht

Enhancing image captioning with depth information using a Transformer-based framework

Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. While image captioning models have been successful in producing excellent descriptions, the field has primarily…

Computer Vision and Pattern Recognition · Computer Science 2023-08-09 Aya Mahmoud Ahmed, Mohamed Yousef, Khaled F. Hussain, Yousef Bassyouni Mahdy

MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning

Remote sensing image interpretation plays a critical role in environmental monitoring, urban planning, and disaster assessment. However, acquiring high-quality labeled data is often costly and time-consuming. To address this challenge, we…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Jiaqi Wang, Xiaoliang Tan, Wenchao Guo, Qingyuan Yang, Kaiqi Zhang

A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning

Transformer-based models have achieved strong performance in remote sensing image captioning by capturing long-range dependencies and contextual information. However, their practical deployment is hindered by high computational costs,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-12 Swadhin Das, Divyansh Mundra, Priyanshu Dayal, Raksha Sharma

Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning

Recently, while significant progress has been made in remote sensing image change captioning, existing methods fail to filter out areas unrelated to actual changes, making models susceptible to irrelevant features. In this article, we…

Computer Vision and Pattern Recognition · Computer Science 2024-09-20 Cong Yang, Zuchao Li, Hongzan Jiao, Zhi Gao, Lefei Zhang

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data,…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Ruchika Chavhan, Biplab Banerjee, Xiao Xiang Zhu, Subhasis Chaudhuri

Representation and Correlation Enhanced Encoder-Decoder Framework for Scene Text Recognition

The attention-based encoder-decoder framework is widely used in the scene text recognition task. However, for the current state-of-the-art (SOTA) methods, there is room for improvement in terms of the efficient usage of local visual and global…

Computer Vision and Pattern Recognition · Computer Science 2021-11-16 Mengmeng Cui, Wei Wang, Jinjin Zhang, Liang Wang

Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning

Remote sensing image captioning aims to generate semantically accurate descriptions that are closely linked to the visual features of remote sensing images. Existing approaches typically emphasize fine-grained extraction of visual features…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Maofu Liu, Jiahui Liu, Xiaokang Zhang

Compressed Image Captioning using CNN-based Encoder-Decoder Framework

In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. But one particularly exciting application is image captioning. The potential impact of effective image…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Md Alif Rahman Ridoy, M Mahmud Hasan, Shovon Bhowmick

RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models

Referring Remote Sensing Image Segmentation provides a flexible and fine-grained framework for remote sensing scene analysis via vision-language collaborative interpretation. Current approaches predominantly utilize a three-stage pipeline…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Keyan Chen, Chenyang Liu, Bowen Chen, Jiafan Zhang, Zhengxia Zou, Zhenwei Shi

CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset

Remote Sensing Image Change Captioning (RSICC) aims to generate natural language descriptions of surface changes between multi-temporal remote sensing images, detailing the categories, locations, and dynamics of changed objects (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Zhiming Wang, Mingze Wang, Sheng Xu, Yanjing Li, Baochang Zhang

RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models

Referring remote sensing image segmentation is crucial for achieving fine-grained visual understanding through free-format textual input, enabling enhanced scene and object extraction in remote sensing applications. Current research…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Keyan Chen, Jiafan Zhang, Chenyang Liu, Zhengxia Zou, Zhenwei Shi

SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented…

Machine Learning · Computer Science 2025-03-13 Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Dapeng Oliver Wu