Article Abstract

Objective: The timely and accurate submission of prehospital electronic medical records is crucial for the efficiency of medical rescue operations. However, the completion rate of these records is often influenced by the professional experience of personnel, training cycles, and environmental conditions. This study proposes integrating noise-robust speech recognition technology with large language models (LLMs) to generate emergency diagnosis summaries. This approach aims to help medical personnel quickly document key patient information, streamlining the emergency response process.

Methods: A joint training model combining speech enhancement and recognition was proposed, incorporating LLMs to generate emergency diagnosis summaries. The model was trained in two rounds using actual ambulance noise data, environmental noise data, and open-source speech datasets. The model optimized the Connectionist Temporal Classification (CTC) and attention losses through deep feature extraction and a selective attention mechanism. The study also analyzed the impact of different prompt designs on the quality of LLM-generated summaries. Tukey HSD and Holm correction methods were employed for multiple comparisons of three subjective evaluation metrics across three prompts and three models, assessing the statistical significance of each factor's influence on the generated summaries.
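The Holm correction mentioned in the Methods can be illustrated with a minimal sketch. This is the standard Holm step-down adjustment written in plain Python, not the authors' analysis code; the function name `holm_adjust` and the example p-values are hypothetical and do not come from the study.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    Illustrative only: the abstract does not publish p-values or code.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for three pairwise comparisons.
    raw = [0.01, 0.04, 0.03]
    print(holm_adjust(raw))
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level (for example 0.05), which controls the family-wise error rate across all comparisons.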

Results: The proposed speech recognition model reduced the character error rate on real-world ambulance noise recordings to 52.92%, outperforming several comparison speech recognition models. Under the Stylized Prompt condition, the Qwen2.5-7B-Instruct model achieved higher accuracy and relevance than the other models in the subjective evaluation. With this configuration, the average completion time for prehospital electronic medical records fell from 20 min to 14 min.
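For readers unfamiliar with the metric reported above, character error rate (CER) is conventionally computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below assumes that conventional definition; the abstract does not state the exact scoring procedure, and the function names and example strings are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two character sequences."""
    # prev[j] is the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1       # reference character missing in hypothesis
            insertion = curr[j - 1] + 1  # extra character in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the study.
    print(character_error_rate("chest pain onset", "chest pane onset"))
```

Because insertions count as errors, CER can exceed 100% when the recognizer outputs many spurious characters, which is worth keeping in mind when comparing systems on very noisy recordings.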

Conclusion: Using noise-robust speech recognition combined with LLMs to generate emergency diagnosis summaries improves efficiency and enhances medical record completion. This approach demonstrates broad application potential in emergencies and could be extended to quality evaluation, disease prediction, and risk assessment.

Source
http://dx.doi.org/10.1016/j.ijmedinf.2025.106029

Publication Analysis

Top Keywords

speech recognition: 20
llms generate: 12
generate emergency: 12
emergency diagnosis: 12
diagnosis summaries: 12
prehospital electronic: 8
electronic medical: 8
medical records: 8
noise-robust speech: 8
ambulance noise: 8

Similar Publications

Objective: Determination of monaural and binaural speech-recognition curves for the Freiburg monosyllabic speech test (FMST) in quiet to update and supplement existing normative data.

Design: Monaural and binaural speech-recognition tests were performed in free field at five speech levels in two anechoic test rooms at two sites (Lübeck and Oldenburg, Germany). For the monaural tests, one ear was occluded with a foam earplug.

Computer vision has been identified as one of the solutions to bridge communication barriers between speech-impaired populations and those without impairment as most people are unaware of the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and involve more than one frame per sign, remains a challenge.

[Cough frequency monitoring: current technologies and clinical research applications].

Zhonghua Jie He He Hu Xi Za Zhi

September 2025

Department of Respiratory and Critical Care Medicine, the First Affiliated Hospital of Guangzhou Medical University, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory He

Cough is a common symptom of many respiratory diseases, and parameters such as frequency, intensity, type and duration play important roles in disease screening, diagnosis and prognosis. Among these, cough frequency is the most widely applied metric. In current clinical practice, cough severity is primarily assessed based on patients' subjective symptom descriptions in combination with semi-structured questionnaires.

Prior research on global-local processing has focused on hierarchical objects in the visual modality, while the real world involves multisensory interactions. The present study investigated whether the simultaneous presentation of auditory stimuli influences the recognition of visually hierarchical objects. We added four types of auditory stimuli to the traditional visual hierarchical letters paradigm: no sound (visual-only), a pure tone, a spoken letter that was congruent with the required response (response-congruent), or a spoken letter that was incongruent with it (response-incongruent).

Deep Learning-Assisted Organogel Pressure Sensor for Alphabet Recognition and Bio-Mechanical Motion Monitoring.

Nanomicro Lett

September 2025

Nanomaterials & System Lab, Major of Mechatronics Engineering, Faculty of Applied Energy System, Jeju National University, Jeju, 63243, Republic of Korea.

Wearable sensors integrated with deep learning techniques have the potential to revolutionize seamless human-machine interfaces for real-time health monitoring, clinical diagnosis, and robotic applications. Nevertheless, it remains a critical challenge to simultaneously achieve desirable mechanical and electrical performance along with biocompatibility, adhesion, self-healing, and environmental robustness with excellent sensing metrics. Herein, we report a multifunctional, anti-freezing, self-adhesive, and self-healable organogel pressure sensor composed of cobalt nanoparticle encapsulated nitrogen-doped carbon nanotubes (CoN CNT) embedded in a polyvinyl alcohol-gelatin (PVA/GLE) matrix.
