Bipolar disorder is one of the most common mood disorders and is characterized by severe, disabling mood swings. Several projects focus on developing decision support systems that monitor and advise both patients and clinicians, and voice monitoring and speech signal analysis can be exploited toward this goal. In this study, an Android application was designed to analyze running speech on a smartphone. The application records audio samples and estimates the speech fundamental frequency, F0, and its changes. F0-related features are computed locally on the smartphone, which offers advantages over remote processing approaches in terms of privacy protection and reduced upload costs; the raw features can then be sent to a central server for further processing. The quality of the audio recordings, the reliability of the algorithm, and the performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results show that the mean F0 of each voiced segment can be estimated reliably, describing prosodic variation across the speech sample, whereas features related to F0 variability within each voiced segment performed poorly. A case study of a patient with bipolar disorder is presented.
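The paper does not include its signal processing code, but the on-device pipeline it describes (detect voiced segments, then compute the mean F0 of each segment) can be illustrated with a short sketch. The snippet below uses a plain autocorrelation pitch estimator; the frame length, hop size, thresholds, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of per-segment mean-F0 estimation via autocorrelation.
# NOT the paper's algorithm; all settings below are illustrative assumptions.
import numpy as np

def frame_f0(frame, sr, f0_min=75.0, f0_max=400.0, voicing_thresh=0.3):
    """Return an F0 estimate (Hz) for one frame, or None if it looks unvoiced."""
    frame = frame - np.mean(frame)
    if np.max(np.abs(frame)) < 1e-4:            # near silence -> treat as unvoiced
        return None
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac /= ac[0]                                  # normalize so ac[0] == 1
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    if ac[lag] < voicing_thresh:                 # weak periodicity -> unvoiced
        return None
    return sr / lag

def segment_mean_f0(signal, sr, frame_ms=40, hop_ms=10):
    """Group consecutive voiced frames into segments; return the mean F0 of each."""
    flen, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    f0_track = [frame_f0(signal[i:i + flen], sr)
                for i in range(0, len(signal) - flen, hop)]
    segments, current = [], []
    for f0 in f0_track:
        if f0 is not None:
            current.append(f0)
        elif current:
            segments.append(float(np.mean(current)))
            current = []
    if current:
        segments.append(float(np.mean(current)))
    return segments
```

Averaging over a whole voiced segment smooths out frame-level estimation errors, which is one plausible reason why per-segment mean F0 is easier to estimate reliably than within-segment variability features.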
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4701269 | PMC
http://dx.doi.org/10.3390/s151128070 | DOI Listing
J Voice
September 2025
RISE-Health, Alameda Prof. Hernâni Monteiro, 4200-319 Porto, Portugal; Department of Otorhinolaryngology, Centro Hospitalar Universitário de São João, Porto, Portugal; Department of Surgery and Physiology, University of Porto - Faculty of Medicine, Alameda Prof. Hernâni Monteiro, 4200-319 Porto
This paper addresses two intertwined challenges that are key to informing signal processing methods for restoring natural (voiced) speech from whispered speech. The first challenge involves characterizing and modeling the evolution of the harmonic phase/magnitude structure of a sequence of individual pitch periods within a voiced region of natural speech comprising sustained or co-articulated vowels. A novel algorithm for segmenting individual pitch pulses is proposed and then used to obtain illustrative results that highlight important differences between sustained and co-articulated vowels and suggest practical synthetic voicing approaches.
Nat Commun
September 2025
Department of Chemical Engineering, Hanyang University, Seoul, Republic of Korea.
Sensorineural hearing loss is the most common form of deafness, typically resulting from the loss of sensory cells on the basilar membrane, which cannot regenerate and thus lose sensitivity to sound vibrations. Here, we report a reconfigurable piezo-ionotropic polymer membrane engineered for biomimetic sustainable multi-resonance acoustic sensing, offering exceptional sensitivity (530 kPa⁻¹) and broadband frequency discrimination (20 Hz to 3300 Hz) while remaining resistant to "dying". The acoustic sensing capability is driven by an ion hitching-in cage effect intrinsic to the ion gel combined with fluorinated polyurethane.
J Biomed Inform
August 2025
Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, 19104, PA, USA; Department of Computer and Information Science, University of Pennsylvania, Levine Hall, 3330 Walnut St, Philadelphia, 19104, PA, USA.
Objective: The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details.
J Commun Disord
August 2025
College of Social Sciences, Arts, and Humanities, Al-Akhawayn University, Morocco.
This is the first comprehensive study to examine the feasibility of using acoustic measures to characterize coarticulatory dynamics in Arabic speakers with Broca's aphasia, addressing a significant gap in the literature and contributing to both universal and culturally specific understandings of coarticulatory timing in aphasia. Five Palestinian Arabic-speaking participants with Broca's aphasia and five control speakers completed a repetition task involving initial fricative-vowel syllables. The analysis, carried out in PRAAT, incorporates both static and dynamic acoustic parameters, including formant values (F2 and F3), transition slopes and variability, Voice Onset Time (VOT), and intensity measures.
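For readers who want to reproduce measures of this kind, a rough sketch using the parselmouth Python interface to Praat is shown below; the study itself used PRAAT directly, and the file name, the 10 ms sampling grid, and the 50 ms "transition window" are placeholders rather than the authors' settings.

```python
# Illustrative sketch: extract F2/F3 tracks from a recorded syllable and fit a
# crude onset slope. Not the study's procedure; settings are assumptions.
import numpy as np
import parselmouth

snd = parselmouth.Sound("fricative_vowel_syllable.wav")  # placeholder file name
formants = snd.to_formant_burg()                          # Burg formant analysis, default settings

times = np.arange(0.0, snd.duration, 0.01)                # sample every 10 ms
f2 = np.array([formants.get_value_at_time(2, t) for t in times])
f3 = np.array([formants.get_value_at_time(3, t) for t in times])

# Crude "transition slope": linear fit to F2 over an assumed 50 ms onset window.
window = times < 0.05
valid = window & ~np.isnan(f2)
if valid.sum() > 1:
    slope_f2 = np.polyfit(times[valid], f2[valid], 1)[0]  # Hz per second
    print(f"F2 onset slope: {slope_f2:.0f} Hz/s")
```

VOT and the study's variability measures would still require manual or semi-automatic annotation of the burst and voicing onset, which this sketch does not attempt.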
JASA Express Lett
August 2025
English Language and Linguistics, University of Glasgow, Glasgow, G12 8QQ, United Kingdom.
Speech synthesis has improved dramatically over recent years, enabled by large datasets and advances in neural network architectures. Little is known, however, about how synthesised speech patterns are realised from a phonetic perspective. By synthesising speech in two languages with differing implementations of stop voicing, we observe that synthesised speech broadly follows the expected patterns for each language, though it partially diverges for specific segments.