Early stroke diagnosis and evaluation based on pathological voice classification using speech enhancement.

Comput Biol Med

The State Key Laboratory of Digital Medical Engineering, Jiangsu Key Lab of Robot Sensor and Control, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China.

Published: September 2025


Article Abstract

Problem: Stroke usually occurs suddenly. Prehospital stroke screening tools rely heavily on medical knowledge and are subjective. As an essential aspect of stroke assessment, speech analysis provides a non-invasive and convenient approach to early stroke diagnosis (ESD), offering critical support for timely intervention and treatment. However, real-world speech recordings are often affected by environmental noise, which can significantly reduce the accuracy and reliability of speech-based diagnostic systems.

Objective: This paper aims to investigate the feasibility and effectiveness of ESD based on pathological voice classification and speech enhancement (SE).

Methods: We propose a cascaded ESD framework consisting of an SE module and a recognition module. Sustained vowel (SV) and spontaneous speech (SS) signals from stroke patients are denoised by the SEWUNet-based SE model. The recognition module subsequently diagnoses stroke from the enhanced speech. For SVs, discrete handcrafted features were extracted, and five classical machine learning algorithms (KNN, SVM, RF, DT, and AdaBoost), chosen for their strong interpretability and computational efficiency in pathological voice tasks, were employed to train both six-vowel and all-vowel recognition models. For SS, we used four data augmentation techniques to expand the dataset and then extracted Mel-spectrogram features to train a CNN-Transformer model. Additionally, transfer learning was introduced by replacing the CNN with a pre-trained ResNet model to further improve performance. We trained all recognition models using five-fold cross-validation, with gender and age incorporated as physiological features. Based on a single-channel SEWUNet network, the SE module included separate enhancement models for SVs and SS. An early stopping mechanism was adopted during the training of the SS and enhancement models to prevent overfitting.
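The five-fold cross-validation mentioned in the Methods can be illustrated with a minimal, standard-library sketch. It is not taken from the authors' repository: the function names `five_fold_splits` and `with_physiological_features`, the 0/1 gender encoding, and the unstratified shuffling are all assumptions made for illustration.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles sample indices once, then holds out
    each fold in turn while the remaining folds form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


def with_physiological_features(acoustic_features, gender, age):
    """Append gender and age to an acoustic feature vector.

    Assumed encoding: gender as 0 or 1, age in years.
    """
    return list(acoustic_features) + [float(gender), float(age)]


if __name__ == "__main__":
    for fold, (train_idx, test_idx) in enumerate(five_fold_splits(34), start=1):
        print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample appears in exactly one test fold, so every recording is evaluated once by a model that never saw it during training. For a clinical dataset, splitting by subject rather than by recording would be a natural refinement; the abstract does not say which was used.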

Results: The optimal models for SVs achieved accuracy, sensitivity, specificity, and F1-score all exceeding 90%. The two best models for SS surpassed 95%. The SEWUNet-based enhancement model improved speech quality metrics for both SVs and SS. Moreover, the recognition models trained on enhanced speech achieved a performance improvement of approximately 10% compared to those trained on noisy speech. Finally, we designed a real-time ESD system using the AdaBoost and CNN-Transformer models and evaluated it in clinical trials and WeChat mini-program tests. Among 34 subjects (24 patients and 10 healthy individuals), SVs achieved 85.29% accuracy, with four patients misclassified and one patient undetermined. By applying the proposed two-stage recognition strategy (SVs followed by SS), the system achieved 100% overall recognition accuracy.
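The reported clinical-trial accuracy can be checked with a short calculation. The sketch below assumes that all five SV-stage errors (four misclassified, one undetermined) were patients counted as missed and that all 10 healthy subjects were classified correctly, which is consistent with 29 of 34 correct. The function name `classification_metrics` is hypothetical, and any sensitivity, specificity, or F1 value it yields for the trial follows from these assumed counts rather than from figures reported in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Assumed counts for the SV stage: 24 patients (19 detected, 5 missed or
# undetermined) and 10 healthy subjects (all correct).
sv_only = classification_metrics(tp=19, fn=5, tn=10, fp=0)
print(f"SV-only accuracy: {sv_only['accuracy']:.2%}")  # prints "SV-only accuracy: 85.29%"

# After the second (SS) stage, all 34 subjects are reported as correct.
two_stage = classification_metrics(tp=24, fn=0, tn=10, fp=0)
print(f"Two-stage accuracy: {two_stage['accuracy']:.2%}")  # prints "Two-stage accuracy: 100.00%"
```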

Conclusions: The proposed ESD method combining SE with SVs and SS can serve as an assistive diagnostic tool to help medical professionals and individuals detect and prevent strokes at an earlier stage, reduce workload, and improve identification objectivity. The code and experimental protocol of this paper are available at https://github.com/LiuYingchenseu/ESPVC.

Source
http://dx.doi.org/10.1016/j.compbiomed.2025.110940

Similar Publications

Challenges and Limits in Explaining and Acoustic Modeling of Voice Characteristics.

J Voice

September 2025

Bielefeld University, P.O. Box 10 01 31, Bielefeld D-33501, Germany.

To this day, the assessment of human voices remains a challenge due to (i) inconsistencies in subjective ratings and (ii) the lack of objective measurements for the perceptual impressions of voice characteristics. This can lead to significant consequences in applied fields such as speech therapy, where the assessment of voices is crucial for a successful treatment. In this paper, we address the explanation of voice and its characteristics from two different angles: In a first study, 22 speech therapists in training assessed a set of 20 non-pathological voices regarding 20 voice characteristics before and after receiving an expert explanation.

Evaluation of Visual Feedback for f_o and SPL in Subglottal Pressure Measurements: A Methodological Study.

J Voice

September 2025

Department of Clinical Science, Intervention and Technology (CLINTEC), Division of Speech and Language Pathology, Karolinska Institutet, SE-171 76, Stockholm, Sweden.

Objective: Subglottal pressure is a clinically relevant parameter for the assessment of voice disorders and correlates with fundamental frequency (f_o) and sound pressure level (SPL). The aim of the current study was to evaluate the use of a visual target for feedback of f_o and SPL in subglottal pressure measurements, in habitual voice and at phonation threshold level, with a syllable string and a phrase, for the purpose of improving the reliability of subglottal pressure measurements.

Methods: Data from 12 vocally healthy women (29-61 years) were analyzed.

View Article and Find Full Text PDF

Objectives: To examine factors that direct decisions in the treatment of glottic insufficiency and propose a paradigm that may assist in treatment decision-making in glottic insufficiency.

Methods: A retrospective chart review was completed of 73 patients with vocal fold atrophy, presbyphonia, or vocal fold motion impairment, including diagnosis, Voice Handicap Index-10 (VHI-10), Voice Problem Impact Scales (VPIS), Glottal Function Index (GFI), Eating Assessment Tool-10 (EAT-10), Consensus Auditory Perceptual Analysis of Voice (CAPE-V), glottal gap size, stimulability, treatment decisions, and outcomes. Univariate and multivariate logistic regression analyses were performed to identify which variables predicted initial treatment recommendation.

Leonine facies and hoarseness in disseminated histoplasmosis: A diagnostic pitfall.

Trop Doct

September 2025

Professor and Head, Department of Dermatology, Venereology and Leprosy, King George's Medical University, Lucknow, Uttar Pradesh, India.

A 56-year-old immunocompetent male from a non-endemic region in India presented with progressive weight loss, hoarseness of voice and widespread cutaneous lesions, including leonine facies, genital nodules and diffuse scaling. Magnetic resonance imaging of the neck revealed oedematous thickening of the false vocal cords, epiglottis and aryepiglottic folds, suggesting laryngeal involvement. All routine investigations were normal.

View Article and Find Full Text PDF

Background: Strained voice quality, commonly referred to as vocal strain, is a hallmark of functional voice disorders such as muscle tension dysphonia and is often associated with vocal fatigue and laryngeal hyperfunction. Although listeners describe it as excessive vocal effort, strained voice quality frequently overlaps perceptually with breathiness and roughness, complicating reliable assessment. Despite its clinical relevance, no standardized acoustic definition of strained voice quality has been established.
