Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Gaze estimation is an important indicator of human behavior that can be used for human assistance. Recent gaze estimation methods are primarily based on convolutional neural networks (CNNs) or attention-based Transformers. However, CNNs extract only limited local context and lose important global information, whereas attention mechanisms make poor use of multiscale hybrid features. To address these issues, we propose a novel nonlinear multi-head cross-attention network with programmable gradient information (MCA-PGI), which combines the advantages of CNNs and Transformers. The programmable gradient information enables reliable gradient propagation, and an auxiliary branch integrates this gradient information, retaining more of the original information than CNNs alone. In addition, nonlinear multi-head cross-attention fuses the global visual and multiscale hybrid features for more accurate gaze estimation. Experimental results on three publicly available datasets demonstrate that MCA-PGI is highly competitive and outperforms most state-of-the-art methods, achieving 2.5% and 10.2% performance improvements on the MPIIFaceGaze and Eyediap datasets, respectively. The implementation code is available at https://github.com/Yuhang-Hong/MCA-PGI.
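To make the cross-attention component named in the abstract concrete, here is a minimal NumPy sketch of multi-head cross-attention in which one feature set (e.g. global visual tokens) attends to another (e.g. multiscale hybrid tokens). The random projection weights, shapes, and function name are illustrative placeholders, not the authors' MCA-PGI implementation.

```python
import numpy as np

def multi_head_cross_attention(query, context, num_heads=4, seed=0):
    """Minimal multi-head cross-attention: `query` attends to `context`.

    query:   (n_q, d) array, e.g. global visual features
    context: (n_c, d) array, e.g. multiscale hybrid features
    Returns: (n_q, d) fused features.
    """
    n_q, d = query.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned weight matrices.
    W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    # Project and split into heads: (num_heads, tokens, d_h)
    Q = (query @ W_q).reshape(n_q, num_heads, d_h).transpose(1, 0, 2)
    K = (context @ W_k).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    V = (context @ W_v).reshape(-1, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product attention over the context tokens.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)      # (heads, n_q, n_c)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # softmax over context
    fused = (weights @ V).transpose(1, 0, 2).reshape(n_q, d)
    return fused @ W_o

# Toy shapes: 49 "global" tokens attend to 196 "multiscale" tokens.
out = multi_head_cross_attention(np.ones((49, 64)), np.ones((196, 64)))
print(out.shape)  # (49, 64)
```

The output keeps the query's token count and feature dimension, so the fused features can be passed directly to a downstream regression head.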

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12297614
DOI Listing: http://dx.doi.org/10.1038/s41598-025-12466-w

Publication Analysis

Top Keywords

gaze estimation: 16
nonlinear multi-head: 12
multi-head cross-attention: 12
programmable gradient: 12
cross-attention network: 8
network programmable: 8
multiscale hybrid: 8
hybrid features: 8
gradient: 5
gaze: 4

Similar Publications

Understanding speech in noise depends on several interacting factors, including the signal-to-noise ratio (SNR), speech intelligibility (SI), and attentional engagement. However, how these factors relate to selective neural speech tracking remains unclear. In this study, we recorded EEG and eye-tracking data while participants performed a selective listening task involving a target talker in the presence of a competing masker talker and background noise across a wide range of SNRs.

Much research has focused on how perceptual, cognitive, and attentional processes modulate microsaccades, the small rapid gaze shifts that humans perform when attempting to maintain steady gaze on a point. Yet the reasons why these fixational saccades occur in the first place have remained unclear. Long-standing theories have argued for either spatial (i.

It is prevalent to leverage unlabeled data to train deep learning models when it is difficult to collect large-scale annotated datasets. However, for 3D gaze estimation, most existing unsupervised learning methods face challenges in distinguishing subtle gaze-relevant information from dominant gaze-irrelevant information. To address this issue, we propose an unsupervised learning framework to disentangle the gaze-relevant and the gaze-irrelevant information, by seeking the shared information of a pair of input images with the same gaze and with the same eye respectively.

Purpose: This study presents and evaluates a hands-free eye-tracking interaction system aimed at empowering individuals with physical disabilities by facilitating inclusive digital access, in alignment with the United Nations Sustainable Development Goals (SDGs) 3 (Good Health and Well-being) and 10 (Reduced Inequalities).

Methods: The system's performance was assessed through accuracy testing, data transmission speed measurement, and frame rate stability analysis. Eye gestures were repeatedly recorded from a single user to evaluate system accuracy and responsiveness.

This paper introduces a novel framework for biometric person identification based on distinctive eye movement patterns. Grounded in foraging theory, the approach leverages the Ornstein-Uhlenbeck (O-U) process to model the dynamics of visual exploration and exploitation during gaze behavior. Eye movement data, including fixations and saccades, is analyzed using Bayesian estimation of a stochastic differential equation to extract individual-specific features.
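The Ornstein-Uhlenbeck dynamics mentioned in this snippet can be illustrated with a minimal Euler-Maruyama simulation of the process dX = θ(μ − X)dt + σdW. The parameter values and function name below are arbitrary placeholders for illustration, not fitted values or code from the cited paper.

```python
import numpy as np

def simulate_ou(theta=2.0, mu=0.0, sigma=0.5, x0=1.0, dt=0.01, n_steps=500, seed=0):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck process:
    dX = theta * (mu - X) dt + sigma dW.
    theta controls the strength of mean reversion toward mu; sigma scales the noise.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        dw = rng.standard_normal() * np.sqrt(dt)  # Brownian increment
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x

traj = simulate_ou()
print(len(traj))  # 501
```

Mean reversion (the pull toward μ) is what makes the O-U process a natural model for gaze returning to a region of interest after exploratory excursions.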