Gloss Prior Guided Visual Feature Learning for Continuous Sign Language Recognition.

Leming Guo, Wanli Xue, Bo Liu, Kaihua Zhang, Tiantian Yuan, Dimitris Metaxas

IEEE Trans Image Process

Published: June 2024

Continuous sign language recognition (CSLR) aims to recognize the sequence of glosses in a sign language video. Improving the generalization ability of the visual feature extractor used in CSLR is therefore worth investigating. In this paper, we model glosses as priors that help to learn more generalizable visual features. Specifically, a signer-invariant gloss feature is extracted by a pre-trained gloss BERT model. We then design a gloss prior guidance network (GPGN), which contains a novel parallel densely-connected temporal feature extraction (PDC-TFE) module for multi-resolution visual feature extraction. The PDC-TFE module captures the complex temporal patterns of the glosses. The pre-trained gloss feature guides visual feature learning through a cross-modality matching loss. We formulate cross-modality feature matching as a regularized optimal transport problem, which can be solved efficiently by a variant of the Sinkhorn algorithm. The GPGN parameters are learned by optimizing a weighted sum of the cross-modality matching loss and the CTC loss. Experimental results on German and Chinese sign language benchmarks demonstrate that the proposed GPGN achieves competitive performance, and an ablation study verifies the effectiveness of several critical components of the GPGN. Furthermore, the proposed pre-trained gloss BERT model and cross-modality matching can be integrated into other RGB-cue-based CSLR methods as plug-and-play components to enhance the generalization ability of their visual feature extractors.
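
To make the matching step more concrete, the following is a minimal NumPy sketch of entropy-regularized optimal transport solved with the standard Sinkhorn iteration, followed by a weighted sum with a CTC loss value. It is not the authors' implementation: the abstract only states that "a variant of the Sinkhorn algorithm" is used, so the cosine cost, the uniform marginals, the regularization strength `eps`, the iteration count, and all function and parameter names here are illustrative assumptions.

```python
import numpy as np


def cosine_cost(visual, gloss):
    """Pairwise cost 1 - cosine similarity (an assumed cost, not from the paper)."""
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    g = gloss / np.linalg.norm(gloss, axis=1, keepdims=True)
    return 1.0 - v @ g.T


def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=500):
    """Standard Sinkhorn iteration for entropy-regularized optimal transport.

    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginals summing to 1.
    Returns an (n, m) transport plan whose row/column sums approach a and b.
    """
    kernel = np.exp(-cost / eps)      # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)          # rescale to match row marginals
        v = b / (kernel.T @ u)        # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]


def matching_loss(visual, gloss, eps=0.5):
    """Transport cost between T visual features and G gloss features.

    Uniform marginals are an assumption made for this sketch.
    """
    cost = cosine_cost(visual, gloss)
    a = np.full(visual.shape[0], 1.0 / visual.shape[0])
    b = np.full(gloss.shape[0], 1.0 / gloss.shape[0])
    plan = sinkhorn_plan(cost, a, b, eps=eps)
    return float(np.sum(plan * cost)), plan


def total_loss(ctc_loss, match_loss, weight=1.0):
    """Weighted sum of the two terms; `weight` is a hypothetical hyperparameter."""
    return ctc_loss + weight * match_loss


# Toy example: 12 frame-level visual features and 5 gloss features, both 16-d.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(12, 16))
gloss_feats = rng.normal(size=(5, 16))
loss_value, plan = matching_loss(visual_feats, gloss_feats)
```

In a real training loop the same iteration would be written in an automatic-differentiation framework so that gradients of the matching loss reach the visual feature extractor; the NumPy version above only illustrates the computation.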

DOI: http://dx.doi.org/10.1109/TIP.2024.3404869
