Useful blunders: Can automated speech recognition errors improve downstream dementia classification?

Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

J Biomed Inform

Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, 55455, MN, USA.

Published: February 2024


Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the "Cookie Theft" picture description task. In particular, we sought to assess whether imperfect ASR-generated transcripts could provide valuable information for distinguishing between language samples from cognitively healthy individuals and those from individuals with Alzheimer's disease (AD).

Methods: We conducted experiments using various ASR models, refining their transcripts with post-editing techniques. Both the imperfect ASR transcripts and manually produced transcripts were used as inputs for downstream dementia classification. We conducted a comprehensive error analysis to compare model performance and to assess the effectiveness of ASR-generated transcripts in dementia classification.
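One common way to quantify how far an ASR transcript departs from a manual one is word error rate (WER). The abstract does not state which error metrics the authors used, so the following is only a minimal sketch of a standard word-level edit-distance WER, not the study's method. The function name and the two example transcripts are hypothetical and were invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: simple lowercasing and whitespace tokenization; the study's
    own text normalization is not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration only.
manual = "the boy is taking cookies from the jar"
asr = "the boy is taken cookie from jar"
print(word_error_rate(manual, asr))  # 2 substitutions + 1 deletion over 8 words
```

A per-sample score of this kind could be compared across the AD and control groups as one simple input to an error analysis.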

Results: Surprisingly, imperfect ASR-generated transcripts outperformed manual transcripts in distinguishing individuals with AD from those without in the "Cookie Theft" task. The ASR-based models surpassed the previous state-of-the-art approach, indicating that ASR errors may contain valuable cues related to dementia. The synergy between ASR and classification models improved overall accuracy in dementia classification.

Conclusion: Imperfect ASR transcripts effectively capture linguistic anomalies linked to dementia, improving accuracy in classification tasks. This synergy between ASR and classification models underscores ASR's potential as a valuable tool in assessing cognitive impairment and related clinical applications.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10922372
DOI: http://dx.doi.org/10.1016/j.jbi.2024.104598

