An LSTM-based Gesture-to-Speech Recognition System.

Proc (IEEE Int Conf Healthc Inform)

Department of Computer Science and Engineering, Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA.

Published: June 2023



Citations: 20

Article Abstract

Fast and flexible communication options are limited for speech-impaired people. Hand gestures coupled with rapidly generated speech can enable a more natural social dynamic for these individuals, particularly those without the fine motor skills to type reliably on a keyboard or tablet. We created a mobile phone application prototype that generates audible responses associated with trained hand movements, and that collects and organizes accelerometer data for rapid training, allowing tailored models for individuals who may not be able to perform standard movements such as sign language. Six participants performed 11 distinct gestures to produce the dataset. A mobile application was developed that integrates a bidirectional LSTM network trained on these data. Evaluated with nested subject-wise cross-validation, the integrated bidirectional LSTM model achieved an overall recall of 91.8% across the 11 pre-selected hand gestures, rising to 95.8% when two commonly confused gestures were excluded. This prototype is a step toward a mobile phone system capable of capturing new gestures and developing tailored gesture recognition models for speech-impaired populations. Further refinement can enable fast and efficient communication, with the goal of improving social interaction for individuals unable to speak.
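The abstract describes a bidirectional LSTM that classifies windows of accelerometer data into one of 11 gestures. As a rough illustration of the forward pass of such a model (this is not the authors' implementation; the window length, hidden size, and random weights below are arbitrary placeholders), a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_direction(x, Wx, Wh, b):
    """Run an LSTM over x of shape (T, d_in); return the final hidden state (H,)."""
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    for xt in x:
        z = xt @ Wx + h @ Wh + b            # (4H,) stacked gate pre-activations
        i, f, g, o = np.split(z, 4)         # input, forget, candidate, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)          # update cell state
        h = o * np.tanh(c)                  # update hidden state
    return h

def bilstm_logits(x, fwd, bwd, Wout, bout):
    """Bidirectional pass: forward and time-reversed LSTMs, concatenated, then a linear read-out."""
    h_f = lstm_direction(x, *fwd)
    h_b = lstm_direction(x[::-1], *bwd)
    return np.concatenate([h_f, h_b]) @ Wout + bout

rng = np.random.default_rng(0)
T, D, H, C = 50, 3, 16, 11                  # 50 timesteps of 3-axis accelerometer data, 11 gestures
make = lambda: (rng.normal(0, 0.1, (D, 4 * H)),
                rng.normal(0, 0.1, (H, 4 * H)),
                np.zeros(4 * H))
fwd, bwd = make(), make()
Wout = rng.normal(0, 0.1, (2 * H, C))
bout = np.zeros(C)

x = rng.normal(size=(T, D))                 # one (untrained) example gesture window
logits = bilstm_logits(x, fwd, bwd, Wout, bout)
print(logits.shape)                         # (11,): one score per gesture class
```

In the actual system these weights would be learned from the collected training data; frameworks such as Keras or PyTorch provide bidirectional LSTM layers that replace this hand-rolled loop.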


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10894657
DOI: http://dx.doi.org/10.1109/ichi57859.2023.00062

Publication Analysis

Top Keywords

hand gestures: 8
mobile phone: 8
models individuals: 8
integrated bidirectional: 8
bidirectional lstm: 8
gestures: 5
individuals: 5
lstm-based gesture-to-speech: 4
gesture-to-speech recognition: 4
recognition system: 4

Similar Publications

Tool use is a complex motor planning problem. Prior research suggests that planning to use tools involves resolving competition between different tool-related action representations. We therefore reasoned that competition may also be exacerbated with tools for which the motions of the tool and the hand are incongruent.


Deliberate synchronization of speech and gesture: Effects of neurodiversity and development.

Lang Cogn

December 2024

Donders Center for Brain, Cognition, and Behaviour, Radboud University Nijmegen, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands.

The production of speech and gesture is exquisitely temporally coordinated. In autistic individuals, speech-gesture synchrony during spontaneous discourse is disrupted. To evaluate whether this asynchrony reflects motor coordination versus language production processes, the current study examined performed hand movements during speech in youth with autism spectrum disorder (ASD) compared to neurotypical youth.


Speech is the primary form of communication; still, there are people whose hearing or speaking skills are disabled. Communication offers an essential hurdle for people with such an impairment. Sign Languages (SLs) are the natural languages of the Deaf and their primary means of communication.


Gesture encoding in human left precentral gyrus neuronal ensembles.

Commun Biol

August 2025

Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA.

Understanding the cortical activity patterns driving dexterous upper limb motion has the potential to benefit a broad clinical population living with limited mobility through the development of novel brain-computer interface (BCI) technology. The present study examines the activity of ensembles of motor cortical neurons recorded using microelectrode arrays in the dominant hemisphere of two BrainGate clinical trial participants with cervical spinal cord injury as they attempted to perform a set of 48 different hand gestures. Although each participant displayed a unique organization of their respective neural latent spaces, it was possible to achieve classification accuracies of ~70% for all 48 gestures (and ~90% for sets of 10).


This study presents a real-time hand tracking and collision detection system for immersive mixed-reality boxing training on Apple Vision Pro (Apple Inc., Cupertino, CA, USA). Leveraging the device's advanced spatial computing capabilities, this research addresses the limitations of traditional fitness applications that lack precision for technique-based sports like boxing with visual-only hand tracking.
