Pointly supervised instance segmentation (PSIS) remains a challenging task when appearance variances across object parts cause semantic inconsistency. In this article, we propose a hierarchical AttentionShift approach to solve the semantic inconsistency issue by exploiting the hierarchical nature of semantics and the flexibility of key-point representation. The estimation of hierarchical attention is defined on key-point sets. The representative key points are iteratively estimated both spatially and in the feature space to capture fine-grained semantics and cover the full object extent. Hierarchical AttentionShift is performed at the instance, part, and fine-grained levels, optimizing object semantics while promoting the conventional self-attention activation to hierarchical activation with local refinement. Experiments on the PASCAL VOC 2012 Aug and MS-COCO 2017 benchmarks show that hierarchical AttentionShift improves the state-of-the-art (SOTA) method by 10.4% and 7.0% in mean average precision (mAP50), respectively. When applying hierarchical AttentionShift to the segment anything model (SAM), a 9.4% AP improvement on COCO test-dev is achieved. Hierarchical AttentionShift provides fresh insight into regularizing the self-attention mechanism for fine-grained vision tasks. The code is available at github.com/MingXiangL/AttentionShift.
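The iterative key-point estimation described above can be pictured as a mean-shift-style update in feature space: each key point is repeatedly pulled toward the weighted mean of nearby pixel embeddings. The following is a minimal conceptual sketch of that idea only, not the authors' implementation; the function name, kernel choice, and parameters are assumptions for illustration.

```python
import numpy as np

def attention_shift(features, keypoints, iters=10, bandwidth=1.0):
    """Conceptual mean-shift-style key-point update (illustrative, not the paper's code).

    features:  (N, D) array of pixel embeddings.
    keypoints: (K, D) array of initial key-point features.
    Each iteration shifts every key point toward the Gaussian-weighted
    mean of the pixel embeddings around it.
    """
    for _ in range(iters):
        # squared feature-space distance from each key point to every pixel
        d2 = ((keypoints[:, None, :] - features[None, :, :]) ** 2).sum(-1)
        # Gaussian kernel weights; the assumed bandwidth controls locality
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # move each key point to the weighted mean of nearby features
        keypoints = (w @ features) / w.sum(axis=1, keepdims=True)
    return keypoints
```

Under this reading, key points initialized near an object part drift toward the densest region of that part's embeddings, which is one way semantic consistency across appearance variations could be encouraged.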
DOI: http://dx.doi.org/10.1109/TNNLS.2025.3526961
IEEE Trans Neural Netw Learn Syst
August 2025