Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts.

Xiang Deng, Youxin Pang, Xiaochen Zhao, Chao Xu, Lizhen Wang, Hongjiang Xiao, Shi Yan, Hongwen Zhang, Yebin Liu

IEEE Trans Pattern Anal Mach Intell

Published: August 2025

This paper introduces Stereo-Talker, a novel one-shot audio-driven human video synthesis system that generates 3D talking videos with precise lip synchronization, expressive body gestures, temporally consistent photo-realistic quality, and continuous viewpoint control. The process follows a two-stage approach. In the first stage, the system maps audio input to high-fidelity motion sequences, encompassing upper-body gestures and facial expressions. To enrich motion diversity and authenticity, large language model (LLM) priors are integrated with text-aligned semantic audio features, leveraging the cross-modal generalization power of LLMs to enhance motion quality. In the second stage, we improve diffusion-based video generation models by incorporating a prior-guided Mixture-of-Experts (MoE) mechanism: a view-guided MoE focuses on view-specific attributes, while a mask-guided MoE enhances region-based rendering stability. Additionally, a mask prediction module is devised to derive human masks from motion data, improving the stability and accuracy of the masks and enabling mask guidance during inference. We also introduce a comprehensive human video dataset with 2,203 identities, covering diverse body gestures and detailed annotations, facilitating broad generalization. The code, data, and pre-trained models will be released for research purposes on our project page: https://xiang-deng00.github.io/stereo-talker.github.io/.
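
The abstract does not give the gating details of the mask-guided MoE, so the following is only a minimal sketch of the general idea of prior-guided routing, not the authors' implementation. It assumes that each spatial position carries one non-negative weight per region (for example, a soft human mask and its complement) and that these weights, rather than a learned router, decide how expert outputs are blended. The names `make_linear_expert` and `mask_guided_moe` are hypothetical, and the affine "experts" are stand-ins for learned network blocks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_linear_expert(scale: float, bias: float) -> Expert:
    """Hypothetical stand-in for a learned expert: an elementwise affine map."""
    def expert(x: Vector) -> Vector:
        return [scale * v + bias for v in x]
    return expert


def mask_guided_moe(
    features: Sequence[Vector],
    region_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
) -> List[Vector]:
    """Blend expert outputs per position, using mask-derived weights as the gate.

    features:       one feature vector per spatial position
    region_weights: per position, one non-negative weight per expert
                    (assumed here to come from a region mask, i.e. the prior)
    experts:        one callable per region
    """
    if len(features) != len(region_weights):
        raise ValueError("features and region_weights must have the same length")

    outputs: List[Vector] = []
    for x, weights in zip(features, region_weights):
        if len(weights) != len(experts):
            raise ValueError("need exactly one weight per expert")
        total = sum(weights)
        if total <= 0:
            raise ValueError("each position needs a positive total weight")

        mixed = [0.0] * len(x)
        for w, expert in zip(weights, experts):
            if w == 0:
                continue  # skip experts the mask rules out at this position
            y = expert(x)
            for i, v in enumerate(y):
                mixed[i] += (w / total) * v  # normalized weighted sum
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    experts = [make_linear_expert(1.0, 0.0), make_linear_expert(2.0, 0.0)]
    features = [[1.0, 2.0], [3.0, 4.0]]
    hard_mask = [[1.0, 0.0], [0.0, 1.0]]  # position 0 -> expert 0, position 1 -> expert 1
    print(mask_guided_moe(features, hard_mask, experts))
```

With a hard mask, each position is handled by exactly one expert; with soft weights, the outputs are interpolated. A view-guided variant could follow the same pattern with weights derived from the camera viewpoint instead of a region mask, though that too is an assumption rather than something the abstract specifies.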

DOI: http://dx.doi.org/10.1109/TPAMI.2025.3596160