Machine learning-assisted wearable sensing systems for speech recognition and interaction.

Nat Commun

Key Laboratory of Optoelectronic Technology & Systems of Ministry of Education, International R & D Center of Micro-nano Systems and New Materials Technology, Chongqing University, Chongqing, 400044, China.

Published: March 2025


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

The human voice stands out for its rich information transmission capabilities. However, voice communication is susceptible to interference from noisy environments and obstacles. Here, we propose a wearable wireless flexible skin-attached acoustic sensor (SAAS) capable of capturing the vibrations of vocal organs and skin movements, thereby enabling voice recognition and human-machine interaction (HMI) in harsh acoustic environments. This system utilizes a piezoelectric micromachined ultrasonic transducer (PMUT), which features high sensitivity (-198 dB), wide bandwidth (10 Hz-20 kHz), and excellent flatness (±0.5 dB). Flexible packaging enhances comfort and adaptability during wear, while integration with the Residual Network (ResNet) architecture significantly improves the classification of laryngeal speech features, achieving an accuracy exceeding 96%. Furthermore, we demonstrate the SAAS's data collection and intelligent classification capabilities in multiple HMI scenarios. Finally, the speech recognition system recognized everyday sentences spoken by participants with an accuracy of 99.8% through a deep learning model. With advantages including a simple fabrication process, stable performance, easy integration, and low cost, SAAS presents a compelling solution for applications in voice control, HMI, and wearable electronics.
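The abstract describes classifying laryngeal speech features with a ResNet. The paper's exact architecture, input representation, and class set are not given here, so the following is only a minimal sketch of the general technique: a small ResNet-style classifier over single-channel spectrogram-like inputs, written in PyTorch. The layer widths, input size (64×128), and class count are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convs plus a skip connection."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the skip path when the shape changes, so it can be added.
        self.skip = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.skip(x))


class SpeechResNet(nn.Module):
    """Toy ResNet classifier for 1-channel spectrogram inputs (illustrative)."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(),
        )
        self.layer1 = ResidualBlock(16, 32, stride=2)
        self.layer2 = ResidualBlock(32, 64, stride=2)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        x = self.layer2(self.layer1(self.stem(x)))
        x = torch.mean(x, dim=(2, 3))  # global average pooling
        return self.head(x)


# Example: a batch of 4 hypothetical sensor spectrograms (64 bins x 128 frames).
model = SpeechResNet(n_classes=10)
logits = model(torch.randn(4, 1, 64, 128))
print(logits.shape)
```

The skip connection is the key design choice: it lets gradients bypass each pair of convolutions, which is what allows deeper networks to train stably and is why ResNet-style models are a common default for this kind of feature classification.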


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894117
DOI: http://dx.doi.org/10.1038/s41467-025-57629-5

Publication Analysis

Top Keywords

speech recognition (8), machine learning-assisted (4), learning-assisted wearable (4), wearable sensing (4), sensing systems (4), systems speech (4), recognition interaction (4), interaction human (4), voice (4), human voice (4)

Similar Publications

Deep Learning-Assisted Organogel Pressure Sensor for Alphabet Recognition and Bio-Mechanical Motion Monitoring.

Nanomicro Lett

September 2025

Nanomaterials & System Lab, Major of Mechatronics Engineering, Faculty of Applied Energy System, Jeju National University, Jeju, 63243, Republic of Korea.

Wearable sensors integrated with deep learning techniques have the potential to revolutionize seamless human-machine interfaces for real-time health monitoring, clinical diagnosis, and robotic applications. Nevertheless, it remains a critical challenge to simultaneously achieve desirable mechanical and electrical performance along with biocompatibility, adhesion, self-healing, and environmental robustness with excellent sensing metrics. Herein, we report a multifunctional, anti-freezing, self-adhesive, and self-healable organogel pressure sensor composed of cobalt nanoparticle encapsulated nitrogen-doped carbon nanotubes (CoN CNT) embedded in a polyvinyl alcohol-gelatin (PVA/GLE) matrix.


Objectives: Alexithymia is characterized by difficulties in identifying and describing one's own emotions. Alexithymia has previously been associated with deficits in the processing of emotional information at both behavioral and neurobiological levels, and some studies have shown elevated levels of alexithymic traits in adults with hearing loss. This explorative study investigated alexithymia in young and adolescent school-age children with hearing aids in relation to (1) a sample of age-matched children with normal hearing, (2) age, (3) hearing thresholds, and (4) vocal emotion recognition.


Objectives: This study aimed to investigate the potential contribution of subtle peripheral auditory dysfunction to listening difficulties (LiD) using a threshold-equalizing noise (TEN) test and distortion-product otoacoustic emissions (DPOAE). We hypothesized that a subset of patients with LiD have undetectable peripheral auditory dysfunction.

Design: This case-control study included 61 patients (12 to 53 years old; male/female, 18/43) in the LiD group and 22 volunteers (12 to 59 years old; male/female, 10/12) in the control group.


Dysphagia lusoria is an uncommon cause of dysphagia with an increasing incidence with age. It is unknown why individuals with dysphagia lusoria typically remain asymptomatic until older adulthood, but some theorize that it could be related to physiologic and anatomical changes that occur with the aging process, such as increased esophageal rigidity and stiffening of vascular walls with atherosclerosis, that make the compression from these congenital aberrations more impactful. While uncommon, it is also likely underrecognized due to its being diagnostically challenging to identify.


While blink analysis was traditionally conducted within vision research, recent studies suggest that blinks might reflect a more general cognitive strategy for resource allocation, including with auditory tasks, but its use within the fields of Audiology or Psychoacoustics remains scarce and its interpretation largely speculative. It is hypothesized that as listening conditions become more difficult, the number of blinks would decrease, especially during stimulus presentation, because it reflects a window of alertness. In experiment 1, 21 participants were presented with 80 sentences at different signal-to-noise ratios (SNRs): 0,  + 7,  + 14 dB and in quiet, in a sound-proof room with gaze and luminance controlled (75 lux).
