Problem: Stroke usually strikes suddenly, and prehospital stroke screening tools rely heavily on medical knowledge and subjective judgment. As an essential aspect of stroke assessment, speech analysis provides a non-invasive and convenient approach to early stroke diagnosis (ESD), offering critical support for timely intervention and treatment. However, real-world speech recordings are often affected by environmental noise, which can significantly reduce the accuracy and reliability of speech-based diagnostic systems.
Objective: This paper aims to investigate the feasibility and effectiveness of ESD based on pathological voice classification and speech enhancement (SE).
Methods: We propose a cascaded ESD framework consisting of an SE module and a recognition module. Sustained vowels (SVs) and spontaneous speech (SS) signals from stroke patients are first denoised by the SEWUNet-based SE model, and the recognition module then diagnoses stroke from the enhanced speech. For SVs, discrete handcrafted features were extracted; owing to their strong interpretability and computational efficiency in pathological voice tasks, five classical machine learning algorithms (KNN, SVM, RF, DT, and AdaBoost) were employed to train both six-vowel and all-vowel recognition models. For SS, we expanded the dataset with four data augmentation techniques and then extracted Mel-spectrogram features to train a CNN-Transformer model. Transfer learning was also introduced by replacing the CNN with a pre-trained ResNet model to further improve performance. All recognition models were trained using five-fold cross-validation, with gender and age incorporated as physiological features. The SE module, based on a single-channel SEWUNet network, included separate enhancement models for SVs and SS. Early stopping was applied during training of the SS and enhancement models to prevent overfitting.
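The five-fold cross-validation mentioned in the Methods can be illustrated with a minimal, standard-library-only sketch. The abstract does not say whether the folds were shuffled, stratified, or grouped by speaker, so the shuffling, the seed, the function name `k_fold_indices`, and the sample count of 23 are all assumptions made for illustration and are not taken from the paper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper: a plain shuffled split, not the authors' protocol.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Illustrative sample count only; each sample is held out exactly once.
folds = list(k_fold_indices(n_samples=23, k=5))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: {len(train_idx)} train / {len(test_idx)} test")
```

Each model would be trained on `train_idx` and scored on `test_idx`, with the five scores averaged. When a dataset holds several recordings per speaker, splitting by speaker instead of by recording is a common way to avoid leakage between folds; whether the paper does so is not stated in the abstract.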
Results: The optimal models for SVs achieved accuracy, sensitivity, specificity, and F1-score all exceeding 90 %, and the two best models for SS surpassed 95 %. The SEWUNet-based enhancement model improved speech quality metrics for both SVs and SS. Moreover, recognition models trained on enhanced speech performed approximately 10 % better than those trained on noisy speech. Finally, we designed a real-time ESD system using the AdaBoost and CNN-Transformer models and evaluated it in clinical trials and WeChat mini-program tests. Among 34 subjects (24 patients and 10 healthy individuals), SVs alone achieved 85.29 % accuracy, with four patients misclassified and one patient undetermined. With the proposed two-stage recognition strategy (SVs followed by SS), the system achieved 100 % overall recognition accuracy.
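The clinical-trial figures above can be checked with a short sketch of the four reported metrics. The confusion counts below are reconstructed from the abstract's numbers rather than taken from a published table: the sketch assumes that the one undetermined patient is counted as an error alongside the four misclassified patients and that all 10 healthy subjects were classified correctly, which is what the 85.29 % figure implies (29 of 34). The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and F1 from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # recall on the patient class
        "specificity": tn / (tn + fp),      # recall on the healthy class
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts reconstructed from the abstract (an assumption, not a reported table).
patients, healthy = 24, 10
missed = 4 + 1  # four misclassified patients plus one undetermined, both counted as errors

# Stage one: sustained vowels only.
stage_one = binary_metrics(tp=patients - missed, fn=missed, tn=healthy, fp=0)
print(f"SV-only accuracy: {stage_one['accuracy']:.2%}")   # 85.29%

# Two-stage strategy: all 34 subjects recognized correctly, as the abstract reports.
two_stage = binary_metrics(tp=patients, fn=0, tn=healthy, fp=0)
print(f"Two-stage accuracy: {two_stage['accuracy']:.2%}")  # 100.00%
```

The other three metrics returned for the SV-only stage follow from the same assumed counts and are not reported in the abstract, so they should be read as derived values only.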
Conclusions: The proposed ESD method combining SE with SVs and SS can serve as an assistive diagnostic tool to help medical professionals and individuals detect and prevent strokes at an earlier stage, reduce workload, and improve identification objectivity. The code and experimental protocol of this paper are available at https://github.com/LiuYingchenseu/ESPVC.
DOI: http://dx.doi.org/10.1016/j.compbiomed.2025.110940
J Voice
September 2025
Bielefeld University, P.O. Box 10 01 31, Bielefeld D-33501, Germany.
To this day, the assessment of human voices remains a challenge due to (i) inconsistencies in subjective ratings and (ii) the lack of objective measurements for the perceptual impressions of voice characteristics. This can lead to significant consequences in applied fields such as speech therapy, where the assessment of voices is crucial for a successful treatment. In this paper, we address the explanation of voice and its characteristics from two different angles: In a first study, 22 speech therapists in training assessed a set of 20 non-pathological voices regarding 20 voice characteristics before and after receiving an expert explanation.
J Voice
September 2025
Department of Clinical Science, Intervention and Technology (CLINTEC), Division of Speech and Language Pathology, Karolinska Institutet, SE-171 76, Stockholm, Sweden.
Objective: Subglottal pressure is a clinically relevant parameter for the assessment of voice disorders and correlates with fundamental frequency (fo) and sound pressure level (SPL). The aim of the current study was to evaluate the use of a visual target for feedback of fo and SPL in subglottal pressure measurements, in habitual voice and at phonation threshold level, with a syllable string and a phrase, in order to improve the reliability of subglottal pressure measurements.
Methods: Data from 12 vocally healthy women (29-61 years) was analyzed.
Laryngoscope
September 2025
UAB Voice Center, Department of Otolaryngology-Head and Neck Surgery, Heersink School of Medicine, Birmingham, Alabama, USA.
Objectives: To examine factors that direct decisions in the treatment of glottic insufficiency and propose a paradigm that may assist in treatment decision-making in glottic insufficiency.
Methods: A retrospective chart review was completed of 73 patients with vocal fold atrophy, presbyphonia, or vocal fold motion impairment, including diagnosis, Voice Handicap Index-10 (VHI-10), Voice Problem Impact Scales (VPIS), Glottal Function Index (GFI), Eating Assessment Tool-10 (EAT-10), Consensus Auditory Perceptual Analysis of Voice (CAPE-V), glottal gap size, stimulability, treatment decisions, and outcomes. Univariate and multivariate logistic regression analyses were performed to identify which variables predicted initial treatment recommendation.
Trop Doct
September 2025
Professor and Head, Department of Dermatology, Venereology and Leprosy, King George's Medical University, Lucknow, Uttar Pradesh, India.
A 56-year-old immunocompetent male from a non-endemic region in India presented with progressive weight loss, hoarseness of voice and widespread cutaneous lesions, including leonine facies, genital nodules and diffuse scaling. Magnetic resonance imaging of the neck revealed oedematous thickening of the false vocal cords, epiglottis and aryepiglottic folds, suggesting laryngeal involvement. All routine investigations were normal.
J Voice
September 2025
Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA.
Background: Strained voice quality, commonly referred to as vocal strain, is a hallmark of functional voice disorders such as muscle tension dysphonia and is often associated with vocal fatigue and laryngeal hyperfunction. Although listeners describe it as excessive vocal effort, strained voice quality frequently overlaps perceptually with breathiness and roughness, complicating reliable assessment. Despite its clinical relevance, no standardized acoustic definition of strained voice quality has been established.