Bangla Speech Emotion Recognition Using Deep Learning-Based Ensemble Learning and Feature Fusion.

J Imaging

Centre for Image and Vision Computing (CIVC), COE for Artificial Intelligence, Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, Cyberjaya 63100, Selangor, Malaysia.

Published: August 2025



Article Abstract

Emotion recognition in speech is essential for enhancing human-computer interaction (HCI) systems. Despite progress in Bangla speech emotion recognition, challenges remain, including low accuracy, speaker dependency, and poor generalization across emotional expressions. Previous approaches often rely on traditional machine learning or basic deep learning models and struggle with robustness and accuracy on noisy or varied data. In this study, we propose a novel multi-stream deep learning feature fusion approach for Bangla speech emotion recognition, addressing the limitations of existing methods. Our approach begins with various data augmentation techniques applied to the training dataset, enhancing the model's robustness and generalization. We then extract a comprehensive set of handcrafted features, including Zero-Crossing Rate (ZCR), chromagram, spectral centroid, spectral roll-off, spectral contrast, spectral flatness, Mel-Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Mel-spectrogram. Although these features are used as 1D numerical vectors, some of them are computed from time-frequency representations (e.g., chromagram, Mel-spectrogram) that can themselves be depicted as images, which is conceptually close to imaging-based analysis. These features capture key characteristics of the speech signal, providing valuable insights into the emotional content. Subsequently, we utilize a multi-stream deep learning architecture to automatically learn complex, hierarchical representations of the speech signal. This architecture consists of three distinct streams: the first stream uses 1D convolutional neural networks (1D CNNs), the second integrates a 1D CNN with Long Short-Term Memory (LSTM), and the third combines 1D CNNs with bidirectional LSTM (Bi-LSTM). These models capture intricate emotional nuances that handcrafted features alone may not fully represent.
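To illustrate the kind of handcrafted features the abstract lists, here is a minimal NumPy sketch of three of them (ZCR, RMS energy, and spectral centroid) computed on overlapping frames. This is not the authors' implementation, which would typically rely on an audio library such as librosa; the frame length, hop size, and the 440 Hz test tone are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D waveform into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes within each frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def spectral_centroid(frames, sr):
    """Magnitude-weighted mean frequency of each Hann-windowed frame."""
    windowed = frames * np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(windowed, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return mag @ freqs / (mag.sum(axis=1) + 1e-10)

# Toy usage: one second of a 440 Hz tone at a 16 kHz sampling rate.
sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
frames = frame_signal(x)
feature_vector = np.array([
    zero_crossing_rate(frames).mean(),    # ~2 * 440 / sr = 0.055
    rms_energy(frames).mean(),            # ~1 / sqrt(2) = 0.707
    spectral_centroid(frames, sr).mean()  # ~440 Hz
])
```

In the paper's pipeline, per-frame statistics like these would be concatenated with the remaining descriptors (chromagram, spectral contrast, MFCCs, etc.) into the 1D feature vector fed to each stream.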
For each of these models, we generate predicted scores and then employ ensemble learning with a soft voting technique to produce the final prediction. This fusion of handcrafted features, deep learning-derived features, and ensemble voting enhances the accuracy and robustness of emotion identification across multiple datasets. Our method demonstrates the effectiveness of combining various learning models to improve emotion recognition in Bangla speech, providing a more comprehensive solution compared with existing methods. We utilize three primary datasets (SUBESCO, BanglaSER, and a merged version of both), as well as two external datasets (RAVDESS and EMODB), to assess the performance of our models. Our method achieves accuracies of 92.90%, 85.20%, 90.63%, 67.71%, and 69.25% for the SUBESCO, BanglaSER, merged SUBESCO and BanglaSER, RAVDESS, and EMODB datasets, respectively. These results demonstrate the effectiveness of combining handcrafted features with deep learning-based features through ensemble learning for robust emotion recognition in Bangla speech.
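The soft-voting step described above can be sketched in a few lines: average the class-probability vectors produced by the three streams, then take the arg-max of the averaged scores. The per-model probabilities below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical class probabilities from each stream for 4 utterances
# over 3 emotion classes (each row sums to 1).
p_cnn    = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.6, 0.3],
                     [0.3, 0.3, 0.4],
                     [0.5, 0.4, 0.1]])
p_lstm   = np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.5, 0.3],
                     [0.2, 0.5, 0.3],
                     [0.3, 0.6, 0.1]])
p_bilstm = np.array([[0.8, 0.1, 0.1],
                     [0.3, 0.4, 0.3],
                     [0.1, 0.2, 0.7],
                     [0.4, 0.5, 0.1]])

# Soft voting: average the predicted scores, then pick the top class.
avg = (p_cnn + p_lstm + p_bilstm) / 3.0
final = np.argmax(avg, axis=1)  # -> array([0, 1, 2, 1])
```

Unlike hard (majority) voting, soft voting lets a stream that is confidently right outvote two streams that are weakly wrong, which is why it is often preferred when calibrated scores are available.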


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387467
DOI: http://dx.doi.org/10.3390/jimaging11080273

Publication Analysis

Top Keywords

emotion recognition (24); bangla speech (20); handcrafted features (16); speech emotion (12); ensemble learning (12); deep learning (12); deep learning-based (8); learning (8); learning feature (8); feature fusion (8)
