Severity: Warning
Message: file_get_contents(https://...@gmail.com&api_key=61f08fa0b96a73de8c900d749fcb997acc09&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests
Filename: helpers/my_audit_helper.php
Line Number: 197
Backtrace:
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 197
Function: file_get_contents
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 271
Function: simplexml_load_file_from_url
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3165
Function: getPubMedXML
File: /var/www/html/application/controllers/Detail.php
Line: 597
Function: pubMedSearch_Global
File: /var/www/html/application/controllers/Detail.php
Line: 511
Function: pubMedGetRelatedKeyword
File: /var/www/html/index.php
Line: 317
Function: require_once
98%
921
2 minutes
20
Although Singing Voice Synthesis (SVS) has revolutionized audio content creation, global linguistic diversity remains challenging. Current SVS research shows scant exploration of cross-lingual generalization, as fragmented, language-specific phoneme encodings (e.g., Pinyin, ARPA) hinder unified phonetic modeling. To address this challenge, we built a four-language dataset based on GTSinger's speech data, using the International Phonetic Alphabet (IPA) for consistent phonetic representation and applying precise segmentation and calibration for improved quality. In particular, we propose a novel method of decomposing IPA phonemes into letters and diacritics, enabling the model to deeply learn the underlying rules of pronunciation and achieve better generalization. A dynamic IPA adaptation strategy further enables the application of learned phonetic representations to unseen languages. Based on VISinger2, we introduce Transinger, an innovative cross-lingual synthesis framework. Transinger achieves breakthroughs in phoneme representation learning by precisely modeling pronunciation, which effectively enables compositional generalization to unseen languages. It also integrates Conformer and RVQ techniques to optimize information extraction and generation, achieving outstanding cross-lingual synthesis performance. Objective and subjective experiments have confirmed that Transinger significantly outperforms state-of-the-art singing synthesis methods in terms of cross-lingual generalization. These results demonstrate that multilingual aligned representations can markedly enhance model learning efficacy and robustness, even for languages not seen during training. Moreover, the integration of a strategy that splits IPA phonemes into letters and diacritics allows the model to learn pronunciation more effectively, resulting in a qualitative improvement in generalization.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.3390/s25133973 | DOI Listing |