Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Gaze estimation is an important indicator of human behavior that can be used for human assistance. Recent gaze estimation methods are primarily based on convolutional neural networks (CNNs) or attention-based Transformers. However, CNNs extract only limited local context and lose important global information, whereas attention mechanisms make poor use of multiscale hybrid features. To address these issues, we propose a novel nonlinear multi-head cross-attention network with programmable gradient information (MCA-PGI), which combines the advantages of CNNs and Transformers. The programmable gradient information enables reliable gradient propagation, and an auxiliary branch integrates this gradient information, retaining more of the original information than CNNs alone. In addition, nonlinear multi-head cross-attention fuses the global visual and multiscale hybrid features for more accurate gaze estimation. Experimental results on three publicly available datasets demonstrate that MCA-PGI is highly competitive and outperforms most state-of-the-art methods, achieving 2.5% and 10.2% performance improvements on the MPIIFaceGaze and Eyediap datasets, respectively. The implementation code is available at https://github.com/Yuhang-Hong/MCA-PGI.
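To make the cross-attention component named in the abstract concrete, here is a minimal NumPy sketch of multi-head cross-attention in which one feature set (e.g. global visual tokens) attends to another (e.g. multiscale hybrid tokens). The random projection weights, shapes, and function name are illustrative placeholders, not the authors' MCA-PGI implementation.

```python
import numpy as np

def multi_head_cross_attention(query, context, num_heads=4, seed=0):
    """Minimal multi-head cross-attention: `query` attends to `context`.

    query:   (n_q, d) array, e.g. global visual features
    context: (n_c, d) array, e.g. multiscale hybrid features
    Returns: (n_q, d) fused features.
    """
    n_q, d = query.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned weight matrices.
    W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    # Project and split into heads: (num_heads, tokens, d_h)
    Q = (query @ W_q).reshape(n_q, num_heads, d_h).transpose(1, 0, 2)
    K = (context @ W_k).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    V = (context @ W_v).reshape(-1, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product attention over the context tokens.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)      # (heads, n_q, n_c)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # softmax over context
    fused = (weights @ V).transpose(1, 0, 2).reshape(n_q, d)
    return fused @ W_o

# Toy shapes: 49 "global" tokens attend to 196 "multiscale" tokens.
out = multi_head_cross_attention(np.ones((49, 64)), np.ones((196, 64)))
print(out.shape)  # (49, 64)
```

The output keeps the query's token count and feature dimension, so the fused features can be passed directly to a downstream regression head.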

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12297614
DOI Listing: http://dx.doi.org/10.1038/s41598-025-12466-w

Publication Analysis

Top Keywords

gaze estimation: 16
nonlinear multi-head: 12
multi-head cross-attention: 12
programmable gradient: 12
cross-attention network: 8
network programmable: 8
multiscale hybrid: 8
hybrid features: 8
gradient: 5
gaze: 4

Similar Publications

Understanding speech in noise depends on several interacting factors, including the signal-to-noise ratio (SNR), speech intelligibility (SI), and attentional engagement. However, how these factors relate to selective neural speech tracking remains unclear. In this study, we recorded EEG and eye-tracking data while participants performed a selective listening task involving a target talker in the presence of a competing masker talker and background noise across a wide range of SNRs.

Much research has focused on how perceptual, cognitive, and attentional processes modulate microsaccades, the small rapid gaze shifts that humans perform when attempting to maintain steady gaze on a point. Yet the reasons why these fixational saccades occur in the first place have remained unclear. Long-standing theories have argued for either spatial (i.

It is prevalent to leverage unlabeled data to train deep learning models when it is difficult to collect large-scale annotated datasets. However, for 3D gaze estimation, most existing unsupervised learning methods face challenges in distinguishing subtle gaze-relevant information from dominant gaze-irrelevant information. To address this issue, we propose an unsupervised learning framework to disentangle the gaze-relevant and the gaze-irrelevant information, by seeking the shared information of a pair of input images with the same gaze and with the same eye respectively.

Purpose: This study presents and evaluates a hands-free eye-tracking interaction system aimed at empowering individuals with physical disabilities by facilitating inclusive digital access, in alignment with the United Nations Sustainable Development Goals (SDGs) 3 (Good Health and Well-being) and 10 (Reduced Inequalities).

Methods: The system's performance was assessed through accuracy testing, data transmission speed measurement, and frame rate stability analysis. Eye gestures were repeatedly recorded from a single user to evaluate system accuracy and responsiveness.

This paper introduces a novel framework for biometric person identification based on distinctive eye movement patterns. Grounded in foraging theory, the approach leverages the Ornstein-Uhlenbeck (O-U) process to model the dynamics of visual exploration and exploitation during gaze behavior. Eye movement data, including fixations and saccades, is analyzed using Bayesian estimation of a stochastic differential equation to extract individual-specific features.
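The Ornstein-Uhlenbeck dynamics mentioned in this snippet can be illustrated with a minimal Euler-Maruyama simulation of the process dX = θ(μ − X)dt + σdW. The parameter values and function name below are arbitrary placeholders for illustration, not fitted values or code from the cited paper.

```python
import numpy as np

def simulate_ou(theta=2.0, mu=0.0, sigma=0.5, x0=1.0, dt=0.01, n_steps=500, seed=0):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck process:
    dX = theta * (mu - X) dt + sigma dW.
    theta controls the strength of mean reversion toward mu; sigma scales the noise.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        dw = rng.standard_normal() * np.sqrt(dt)  # Brownian increment
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x

traj = simulate_ou()
print(len(traj))  # 501
```

Mean reversion (the pull toward μ) is what makes the O-U process a natural model for gaze returning to a region of interest after exploratory excursions.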