Existing audio-driven methods for dyadic interaction struggle to capture the uncertain relationship between a speaker's audio and an interlocutor's facial movements. To address this issue, we propose a video generation pipeline based on a cross-modal Transformer. First, a Transformer decoder partitions facial features into upper and lower regions, capturing lower-face features that are closely linked to the audio and upper-face features that remain independent of visual cues. Second, we design a cross-modal attention module that combines an alignment bias with causal attention to effectively manage subtle motion variations between adjacent frames in facial sequences. To mitigate uncertainty in long-term contexts, we expand the self-attention range of the Transformer encoder and integrate self-supervised pretrained speech representations to alleviate data scarcity. Finally, by optimizing the audio-to-motion mapping and incorporating an enhanced neural renderer, we achieve fine-grained control over facial movements while generating high-quality portrait images. Extensive experiments validate the effectiveness and superiority of our approach in interactive video generation.
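The abstract's cross-modal attention module combines an alignment bias with causal masking. The paper does not give the exact formulation, but the general idea can be sketched as follows: a Gaussian alignment bias encourages each motion frame to attend to temporally nearby audio frames, and a causal mask prevents attention to future audio. All function names, the choice of a Gaussian bias, and the shared frame rate between audio and video are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def causal_aligned_attention(q, k, v, sigma=2.0):
    """Hypothetical sketch of cross-modal attention combining an
    alignment bias with a causal mask.
    q: (T, d) motion queries; k, v: (T, d) audio keys/values,
    assuming audio and video are sampled at the same frame rate."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (T, T) scaled dot-product similarity
    idx = np.arange(T)
    # Gaussian alignment bias: penalize large |i - j| frame offsets
    bias = -((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2)
    scores = scores + bias
    # Causal mask: frame i may attend only to audio frames j <= i
    scores = np.where(idx[None, :] <= idx[:, None], scores, -np.inf)
    # Softmax over the audio (key) axis
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w
```

In a full model this would be one head inside a Transformer decoder layer; the bias width `sigma` controls how tightly lip motion is tied to the concurrent audio frames.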
DOI: http://dx.doi.org/10.1016/j.neunet.2025.107714