Feature refinement and rethinking attention for remote sensing image captioning.

Yunpeng Li, Chengjin Tao, Meng Liu, Xiangrong Zhang, Guanchun Wang, Tianyang Zhang, Dong Zhao, Dabao Wang

Sci Rep

Remote Sensing Satellite Department, China Academy of Space Technology, Beijing, 100094, China.

Published: March 2025

Effectively recognizing different regions of interest with attention mechanisms plays an important role in the remote sensing image captioning task. However, these attention-driven models implicitly assume that the attended region information is correct, which is too restrictive an assumption. Furthermore, visual feature extractors fail when the correlation between objects is weak. To address these issues, we propose a feature refinement and rethinking attention framework. Specifically, we first construct a feature refinement module in which grid-level features interact through a refinement gate, so that irrelevant visual features from remote sensing images are weakened. Moreover, unlike approaches that infer each word from a single attentive vector, we develop a rethinking attention mechanism with a rethinking LSTM layer that spontaneously focuses on different regions when the rethinking confidence is desirable. Thus, more than one region can contribute to predicting a single word. In addition, a confidence rectification strategy is adopted to model the rethinking attention so that it learns strongly discriminative contextual representations. We validate the designed framework on four datasets (i.e., NWPU-Captions, RSICD, UCM-Captions and Sydney-Captions). Extensive experiments show that our approach achieves superior performance, with significant improvements on the NWPU-Captions dataset.
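The abstract only names its components, so the following is a minimal, hypothetical Python sketch of the two general ideas it describes: gating grid-level features so that weakly related ones are attenuated, and letting a second attention pass pick up an additional region for the same word. Every function name, the mean-context sigmoid gate, the use of the peak attention weight as "confidence", the 0.5 threshold and the blending rule are illustrative assumptions, not the authors' refinement gate, rethinking LSTM layer or confidence rectification strategy.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability; -inf scores map to weight 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_grid(features: List[Vector]) -> List[Vector]:
    """Hypothetical refinement gate (assumption, not the paper's formula):
    each grid feature is scaled by sigmoid(<feature, global mean> / sqrt(d)),
    so features that agree less with the global context are weakened more."""
    d = len(features[0])
    context = [sum(f[k] for f in features) / len(features) for k in range(d)]
    refined = []
    for f in features:
        gate = sigmoid(dot(f, context) / math.sqrt(d))
        refined.append([gate * x for x in f])
    return refined


def attend(query: Vector, features: List[Vector],
           allowed: List[bool]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention over the allowed regions only."""
    d = len(query)
    scores = [dot(query, f) / math.sqrt(d) if ok else -math.inf
              for f, ok in zip(features, allowed)]
    weights = softmax(scores)
    context = [sum(w * f[k] for w, f in zip(weights, features))
               for k in range(d)]
    return weights, context


def rethinking_context(query: Vector, features: List[Vector],
                       threshold: float = 0.5) -> Tuple[Vector, List[int]]:
    """Hypothetical 'second look' (assumption): if the first attention pass
    has a low peak weight, attend again with its top region masked out and
    blend both glimpses, so more than one region informs the same word."""
    n = len(features)
    weights, first = attend(query, features, [True] * n)
    confidence = max(weights)
    top = weights.index(confidence)
    if confidence >= threshold or n == 1:
        return first, [top]
    # Mask the region already attended so the second pass looks elsewhere.
    weights2, second = attend(query, features, [i != top for i in range(n)])
    other = weights2.index(max(weights2))
    # Assumed blending rule: weight the first glimpse by its confidence.
    blended = [confidence * a + (1.0 - confidence) * b
               for a, b in zip(first, second)]
    return blended, [top, other]


if __name__ == "__main__":
    grid = refine_grid([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
    context, regions = rethinking_context([0.1, 0.1], grid)
    print(regions, [round(c, 3) for c in context])
```

In this sketch a flat first-pass attention distribution (low peak weight) triggers the second look, which excludes the first pass's top region; a sharply peaked distribution returns a single region. The paper's actual trigger condition and rectification may differ.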

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11906612
DOI: http://dx.doi.org/10.1038/s41598-025-93125-y

