Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.

Mathias Aaen, Cathrine Sadolin

J Voice

Complete Vocal Institute, Copenhagen K, Denmark.

Published: May 2024

Background: Timbre is a central quality of singing, yet it remains a complex notion that is poorly understood in psychoacoustic studies. Previous studies note that no single acoustic variable or combination of variables consistently predicts timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and with variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli with specific measures. A recent study demonstrated how the acoustic predictive salience of voice category and voice weight across pitches contributes to timbre assessments and concluded that timbre may be related to as-yet-unknown factor(s). The purpose of this study was to test four different models for assessing timbre: one focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.

Methods: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.
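Fleiss' kappa, the agreement statistic reported for each panel, can be illustrated with a short sketch. This is not the authors' analysis code: the function name `fleiss_kappa`, the category labels, and the toy ratings are hypothetical, and the sketch assumes every stimulus is rated by the same number of assessors, each choosing exactly one category.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-stimulus rating lists.

    ratings: one inner list per stimulus, holding one label per assessor.
    categories: every label an assessor may choose.
    Assumes the same number of assessors (at least 2) for every stimulus.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each stimulus needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing assessor pairs, averaged over stimuli.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy data: three stimuli, three assessors, two hypothetical labels.
    toy = [
        ["dark", "dark", "dark"],
        ["dark", "dark", "light"],
        ["light", "light", "light"],
    ]
    print(round(fleiss_kappa(toy, ["dark", "light"]), 3))
```

Kappa is 1 when all assessors agree on every stimulus and falls to 0 or below when agreement is no better than chance, which is why it is reported alongside accuracy rather than in place of it.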

Results: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed 45% average accuracy overall (SD ± 4%), κ = 0.289 (P < 0.001), compared with 71% average accuracy overall (SD ± 3%), κ = 0.368 (P < 0.001), for the anatomically focused panel. The auditory-anchored model showed 75% average accuracy overall (SD ± 8%), κ = 0.54 (P < 0.001), compared with 83% average accuracy overall and agreement of κ = 0.63 (P < 0.001) for panel 4, the expert listener panel. The highest accuracy and reliability were achieved with the deconstructed timbre model, and providing anchoring improved reliability but with no further increase in accuracy.

Conclusion: Deconstructing timbre into specific parameters improved auditory-perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remains an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions, each related to specific physiology and auditory-perceptual subcategories. Further tests with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli are needed to further test the reliability.

DOI: http://dx.doi.org/10.1016/j.jvoice.2024.03.039

Similar Publications

Perceived vibrato and the singing power ratio explain overall evaluations in opera singing.

Front Psychol

August 2025

Faculty of Environment and Information Studies, Keio University, Kanagawa, Japan.

Haruka Kondo, Sotaro Kondoh, Shinya Fujii

In opera singing competitions, judges use an overall score to evaluate the singers' voices and determine their rankings. This score not only guides the singers' technique and expressiveness but also serves as a crucial indicator that can significantly influence their careers. However, the specific elements captured by this overall score remain unclear.

Micro-variations in timing and loudness affect music-evoked mental imagery.

Sci Rep

August 2025

Sydney Conservatorium of Music, The University of Sydney, Sydney, NSW, Australia.

Ceren Ayyildiz, Andrew J Milne, Muireann Irish, Steffen A Herff

Music can shape the vividness, sentiment, and content of directed mental imagery. Yet, the role of specific musical features in these effects remains elusive. One important aspect of human musical performances is the presence of micro-variations: small deviations in timbre, pitch, and timing, driven by motor and attentional processes.

A re-examination of duplex perception with musical chords.

J Acoust Soc Am

August 2025

School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom.

Brian Roberts

In duplex perception, an acoustic element differing from the others in receiving ear or form (e.g., harmonic complex or sinusoidal) contributes simultaneously to two distinct percepts.

How vocal timbre impacts word identification and listening effort in traffic-shaped noises.

JASA Express Lett

July 2025

Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408,

Tzu-Pei Tsai, Tessa Bent, Malachi Henry

This study investigated how variation in vocal timbre (e.g., neutral and twangy) influences intelligibility and listening effort in traffic-shaped noises.

Natural sounds can be reconstructed from human neuroimaging data using deep neural network representation.

PLoS Biol

July 2025

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan.

Jong-Yun Park, Mitsuaki Tsukamoto, Misato Tanaka, Yukiyasu Kamitani

Reconstruction of perceptual experiences from brain activity offers a unique window into how population neural responses represent sensory information. Although decoding visual content from functional MRI (fMRI) has seen significant success, reconstructing arbitrary sounds remains challenging due to the fine temporal structure of auditory signals and the coarse temporal resolution of fMRI. Drawing on the hierarchical auditory features of deep neural networks (DNNs) with progressively larger time windows and their neural activity correspondence, we introduce a method for sound reconstruction that integrates brain decoding of DNN features and an audio-generative model.
