Coal mining is a major industry in Northern Shaanxi, where workers commonly speak the local dialect, known as "Shapu", which has a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the Northern Shaanxi dialect based on the Conformer architecture. To tailor the model to the coal mining context, we built a specialized corpus that reflects the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suited to the dialect, focusing on its distinctive pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect's rapid speech tempo and polyphonic nature, improving recognition performance. To strengthen the decoder's text generation capability, we replaced the Conformer decoder with a Transformer architecture. To reduce the computational demands of the model, we also incorporated Connectionist Temporal Classification (CTC) joint training. Experiments on our self-built speech dataset for the Northern Shaanxi coal mining industry show that the proposed Conformer-Transformer-CTC model reduces the word error rate by 9.2% and 10.3% compared with the standalone Conformer and Transformer models, respectively, confirming the effectiveness of our method. Future work will examine how integrating external language models and extracting pronunciation features of different dialects can further improve dialect speech recognition.
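The two quantities the abstract relies on, the CTC joint training objective and the word error rate, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Hybrid CTC/attention training is commonly written as a weighted sum of the two losses, and the function names and the default `ctc_weight` of 0.3 are assumptions chosen for illustration; the abstract does not report the weight used.

```python
from typing import Sequence


def joint_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC loss and the attention-decoder loss.

    ctc_weight is a hypothetical hyperparameter; the abstract does not give its value.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Substitutions + deletions + insertions, divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `word_error_rate("a b c d".split(), "a x c".split())` counts one substitution and one deletion against four reference words. For Chinese transcripts the same routine can be applied to character sequences instead of words.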
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769510 | PMC |
| http://dx.doi.org/10.3390/s25020341 | DOI Listing |
Int J Audiol
September 2025
Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany.
Objective: Determination of monaural and binaural speech-recognition curves for the Freiburg monosyllabic speech test (FMST) in quiet to update and supplement existing normative data.
Design: Monaural and binaural speech-recognition tests were performed in free field at five speech levels in two anechoic test rooms at two sites (Lübeck and Oldenburg, Germany). For the monaural tests, one ear was occluded with a foam earplug.
Front Artif Intell
August 2025
School of Computation and Communication Science and Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania.
Computer vision has been identified as one of the solutions to bridge communication barriers between speech-impaired populations and those without impairment as most people are unaware of the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and involve more than one frame per sign, remains a challenge.
Zhonghua Jie He He Hu Xi Za Zhi
September 2025
Department of Respiratory and Critical Care Medicine, the First Affiliated Hospital of Guangzhou Medical University, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory He
Cough is a common symptom of many respiratory diseases, and parameters such as frequency, intensity, type and duration play important roles in disease screening, diagnosis and prognosis. Among these, cough frequency is the most widely applied metric. In current clinical practice, cough severity is primarily assessed based on patients' subjective symptom descriptions in combination with semi-structured questionnaires.
Cogn Psychol
September 2025
Graduate School of Engineering, Kochi University of Technology, Kami, Kochi, Japan.
Prior research on global-local processing has focused on hierarchical objects in the visual modality, whereas the real world involves multisensory interactions. The present study investigated whether the simultaneous presentation of auditory stimuli influences the recognition of visually hierarchical objects. We added four types of auditory stimuli to the traditional visual hierarchical letters paradigm: no sound (visual-only), a pure tone, a spoken letter that was congruent with the required response (response-congruent), or a spoken letter that was incongruent with it (response-incongruent).
Nanomicro Lett
September 2025
Nanomaterials & System Lab, Major of Mechatronics Engineering, Faculty of Applied Energy System, Jeju National University, Jeju, 63243, Republic of Korea.
Wearable sensors integrated with deep learning techniques have the potential to revolutionize seamless human-machine interfaces for real-time health monitoring, clinical diagnosis, and robotic applications. Nevertheless, it remains a critical challenge to simultaneously achieve desirable mechanical and electrical performance along with biocompatibility, adhesion, self-healing, and environmental robustness with excellent sensing metrics. Herein, we report a multifunctional, anti-freezing, self-adhesive, and self-healable organogel pressure sensor composed of cobalt nanoparticle encapsulated nitrogen-doped carbon nanotubes (CoN CNT) embedded in a polyvinyl alcohol-gelatin (PVA/GLE) matrix.