Efficient Binary Weight Convolutional Network Accelerator for Speech Recognition.

Lunyi Guo, Shining Mu, Yijie Deng, Chaofan Shi, Bo Yan, Zhuoling Xiao

Sensors (Basel)

School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Published: January 2023

Speech recognition has progressed tremendously in the area of artificial intelligence (AI). However, the performance of real-time, offline Chinese speech recognition neural network accelerators for edge AI still needs improvement. This paper proposes a configurable convolutional neural network accelerator based on a lightweight speech recognition model, which dramatically reduces hardware resource consumption while keeping the error rate acceptable. For the convolutional layers, the weights are binarized to reduce the number of model parameters and to improve computational and storage efficiency. A multichannel shared computation (MCSC) architecture is proposed to maximize the reuse of weight and feature map data. A binary weight-sharing processing engine (PE) is designed so that the computation is not limited by the number of multipliers. A custom instruction set, established according to the variable length of the voice input, configures parameters so that the accelerator can adapt to different network structures. Finally, ping-pong storage is used for the input feature map. We implemented the accelerator on a Xilinx ZYNQ XC7Z035 at a working frequency of 150 MHz. The processing times for 2.24 s and 8 s of speech were 69.8 ms and 189.51 ms, respectively, and the convolution performance reached 35.66 GOPS/W. Compared with other computing platforms, the accelerator performs better in terms of energy efficiency, power consumption and hardware resource consumption.
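To illustrate why binarized weights reduce the need for multipliers, the following is a minimal software sketch of a binary-weight 1D convolution. It is not the accelerator's design: the function names `binarize` and `binary_conv1d`, the sign rule, and the per-kernel scale factor (mean absolute weight, a common choice in binary-weight networks) are all assumptions, since the abstract does not state the binarization rule.

```python
def binarize(weights):
    """Map real-valued weights to +1/-1 signs plus one scale factor.

    Assumption: the scale is the mean absolute weight, as in common
    binary-weight schemes; the abstract does not specify the paper's rule.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def binary_conv1d(signal, signs, scale=1.0):
    """Valid-mode 1D correlation whose inner loop only adds or subtracts."""
    k = len(signs)
    out = []
    for start in range(len(signal) - k + 1):
        acc = 0.0
        for offset, s in enumerate(signs):
            x = signal[start + offset]
            # A +1 weight adds the input and a -1 weight subtracts it,
            # so no multiplier is needed per weight.
            acc = acc + x if s > 0 else acc - x
        out.append(scale * acc)  # one multiply per output, for the scale
    return out


signs, scale = binarize([0.5, -0.25, 0.75])
result = binary_conv1d([1.0, 2.0, 3.0, 4.0], signs, scale)
print(signs, scale, result)
```

In this sketch each weight contributes one addition or subtraction, and the only multiplication is the single scale applied per output value, which is the property that makes binary weights attractive for multiplier-constrained hardware.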

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9920974
DOI: http://dx.doi.org/10.3390/s23031530

Similar Publications

Monaural and binaural speech-recognition curves for the Freiburg monosyllabic speech test in quiet.

Int J Audiol

September 2025

Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany.

Hendrik Husstedt, Larissa Warkentin, Florian Denk, Inga Holube

Objective: Determination of monaural and binaural speech-recognition curves for the Freiburg monosyllabic speech test (FMST) in quiet to update and supplement existing normative data.

Design: Monaural and binaural speech-recognition tests were performed in free field at five speech levels in two anechoic test rooms at two sites (Lübeck and Oldenburg, Germany). For the monaural tests, one ear was occluded with a foam earplug.

Efficient spatio-temporal modeling for sign language recognition using CNN and RNN architectures.

Front Artif Intell

August 2025

School of Computation and Communication Science and Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania.

Kasian Myagila, Devotha Godfrey Nyambo, Mussa Ally Dida

Computer vision has been identified as one of the solutions to bridge communication barriers between speech-impaired populations and those without impairment as most people are unaware of the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and involve more than one frame per sign, remains a challenge.

[Cough frequency monitoring: current technologies and clinical research applications].

Zhonghua Jie He He Hu Xi Za Zhi

September 2025

Department of Respiratory and Critical Care Medicine, the First Affiliated Hospital of Guangzhou Medical University, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory He

J X Xie, K F Lai

Cough is a common symptom of many respiratory diseases, and parameters such as frequency, intensity, type and duration play important roles in disease screening, diagnosis and prognosis. Among these, cough frequency is the most widely applied metric. In current clinical practice, cough severity is primarily assessed based on patients' subjective symptom descriptions in combination with semi-structured questionnaires.

Forest before trees? It depends on not only what you see, but also what you hear.

Cogn Psychol

September 2025

Graduate School of Engineering, Kochi University of Technology, Kami, Kochi, Japan.

Xiaoyu Tang, Haoming Liu, Heming Zhang, Yufeng He, Xinzhong Cui

Prior research on global-local processing has focused on hierarchical objects in the visual modality, while the real world involves multisensory interactions. The present study investigated whether the simultaneous presentation of auditory stimuli influences the recognition of visually hierarchical objects. We added four types of auditory stimuli to the traditional visual hierarchical letters paradigm: no sound (visual-only), a pure tone, a spoken letter that was congruent with the required response (response-congruent), or a spoken letter that was incongruent with it (response-incongruent).

Deep Learning-Assisted Organogel Pressure Sensor for Alphabet Recognition and Bio-Mechanical Motion Monitoring.

Nanomicro Lett

September 2025

Nanomaterials & System Lab, Major of Mechatronics Engineering, Faculty of Applied Energy System, Jeju National University, Jeju, 63243, Republic of Korea.

Kusum Sharma, Kousik Bhunia, Subhajit Chatterjee, Muthukumar Perumalsamy, Anandhan Ayyappan Saj

Wearable sensors integrated with deep learning techniques have the potential to revolutionize seamless human-machine interfaces for real-time health monitoring, clinical diagnosis, and robotic applications. Nevertheless, it remains a critical challenge to simultaneously achieve desirable mechanical and electrical performance along with biocompatibility, adhesion, self-healing, and environmental robustness with excellent sensing metrics. Herein, we report a multifunctional, anti-freezing, self-adhesive, and self-healable organogel pressure sensor composed of cobalt nanoparticle encapsulated nitrogen-doped carbon nanotubes (CoN CNT) embedded in a polyvinyl alcohol-gelatin (PVA/GLE) matrix.
