Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.

Changhwan Go, Young Han Lee, Taewoo Kim, Nam In Park, Chanjun Chun

Sensors (Basel)

Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea.

Published: September 2024

Speaker recognition is a technology that identifies the speaker of an input utterance by extracting speaker-discriminative features from the speech signal. Because speaker recognition is used for system security and authentication, extracting features unique to each speaker is crucial for achieving high recognition rates. Representative approaches either train a classification model or use contrastive learning to learn the relationships between speaker representations, and then use embeddings extracted from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. The approach aims to minimize the similarity to hard negative samples, that is, samples that are genuine negatives but have features extremely similar to those of the positives and can therefore cause misidentification. Specifically, the proposed method trains the model by estimating hard negative samples within each mini-batch during contrastive learning, and then uses a cross-attention mechanism to determine whether a pair of utterances comes from the same speaker. To demonstrate the effectiveness of the proposed method, we compared a deep learning model trained with a conventional loss function used in speaker recognition against a model trained with the proposed method, using the equal error rate (EER) as the objective performance metric. When trained on the VoxCeleb2 dataset, the proposed method achieved an EER of 0.98% on VoxCeleb1-E and 1.84% on VoxCeleb1-H.
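The abstract does not specify how hard negatives are estimated or weighted. As a rough illustration only, the NumPy sketch below shows one common way to emphasize hard negatives in a mini-batch contrastive (InfoNCE-style) loss: each anchor's in-batch negatives are reweighted by exp(beta * similarity), so the most speaker-like impostors dominate the denominator. The function names, the temperature `tau`, the weighting parameter `beta`, and the reweighting scheme itself are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def hard_negative_contrastive_loss(anchors, positives, tau: float = 0.1,
                                   beta: float = 1.0) -> float:
    """Mean InfoNCE-style loss with in-batch hard negatives up-weighted.

    Hypothetical sketch. anchors, positives: (N, D) arrays where row i of both
    comes from speaker i; every other row of `positives` is a negative for
    anchor i. beta = 0 recovers the plain (unweighted) InfoNCE loss.
    """
    a = l2_normalize(np.asarray(anchors, dtype=float))
    p = l2_normalize(np.asarray(positives, dtype=float))
    n = a.shape[0]
    if n < 2:
        raise ValueError("need at least two speakers per mini-batch")

    sim = a @ p.T                 # (N, N) cosine similarities
    logits = sim / tau
    pos = np.diag(logits)         # same-speaker pairs sit on the diagonal

    # Importance weights over negatives only (diagonal is zeroed): more similar
    # ("harder") negatives get larger weights, and each anchor's weights
    # average to 1 over its N - 1 negatives.
    off_diag = ~np.eye(n, dtype=bool)
    hard = np.where(off_diag, np.exp(beta * sim), 0.0)
    weights = (n - 1) * hard / hard.sum(axis=1, keepdims=True)

    # Subtract each row's max logit before exponentiating for numerical stability.
    shift = logits.max(axis=1)
    pos_term = np.exp(pos - shift)
    neg_term = (weights * np.exp(logits - shift[:, None])).sum(axis=1)
    return float(np.mean(-np.log(pos_term / (pos_term + neg_term))))
```

With `beta = 0` every negative gets weight 1 and the loss reduces to standard InfoNCE; increasing `beta` shifts the penalty toward the impostors whose embeddings are closest to the anchor, which is the intuition behind hard negative training.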
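The abstract states only that a cross-attention mechanism decides whether two utterances agree. A minimal sketch of that general idea follows, assuming each utterance is given as frame-level features of shape (frames, dims). The frames of one utterance attend over the frames of the other, the attended output is average-pooled, and its cosine similarity to the first utterance's own average embedding serves as a score, averaged over both directions. The learned query/key/value projections and the scoring head a real model would train are deliberately omitted, and all function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cosine(u: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / max(np.linalg.norm(u) * np.linalg.norm(v), eps))


def cross_attend(query_frames: np.ndarray, context_frames: np.ndarray) -> np.ndarray:
    """Scaled dot-product cross-attention without learned projections.

    query_frames: (T1, D) frames of one utterance.
    context_frames: (T2, D) frames of the other utterance.
    Returns (T1, D): for each query frame, a weighted mix of context frames.
    """
    d = query_frames.shape[1]
    attn = softmax(query_frames @ context_frames.T / np.sqrt(d), axis=1)  # (T1, T2)
    return attn @ context_frames


def pair_score(x: np.ndarray, y: np.ndarray) -> float:
    """Symmetric speaker-agreement score in [-1, 1] for two utterances (hypothetical)."""
    def one_way(a: np.ndarray, b: np.ndarray) -> float:
        own = a.mean(axis=0)                        # average-pooled utterance a
        attended = cross_attend(a, b).mean(axis=0)  # utterance b, as seen from a's frames
        return cosine(own, attended)
    return 0.5 * (one_way(x, y) + one_way(y, x))
```

A trained system would learn the projections and pick a decision threshold on development data; this sketch only shows how information flows between the two utterances before a same-speaker decision is made.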
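The equal error rate used as the evaluation metric is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (target trials rejected). The sketch below estimates it by sweeping every observed score as a decision threshold and taking the point where the two rates are closest. The function name and the accept-if-score-at-least-threshold convention are choices made for this example. An EER of 0.98% corresponds to a return value of about 0.0098.

```python
import numpy as np


def equal_error_rate(scores, labels) -> float:
    """Estimate the EER from trial scores and labels (1 = same speaker, 0 = different).

    A trial is accepted when its score >= threshold. Every distinct score is
    tried as a threshold; the EER is taken where |FAR - FRR| is smallest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    target, impostor = scores[labels], scores[~labels]
    if target.size == 0 or impostor.size == 0:
        raise ValueError("need at least one target and one impostor trial")

    thresholds = np.unique(scores)  # sorted candidate thresholds
    # Rows: thresholds, columns: trials.
    far = (impostor[None, :] >= thresholds[:, None]).mean(axis=1)  # impostors accepted
    frr = (target[None, :] < thresholds[:, None]).mean(axis=1)     # targets rejected
    i = int(np.argmin(np.abs(far - frr)))
    return float(0.5 * (far[i] + frr[i]))
```

For example, `equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])` returns `0.0` because the two classes are perfectly separated. The sweep builds a thresholds-by-trials matrix, so it suits small trial lists; for full VoxCeleb trial lists a sorted ROC computation such as `sklearn.metrics.roc_curve` is more practical, and evaluation toolkits may interpolate the crossing point slightly differently.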

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11478696
DOI: http://dx.doi.org/10.3390/s24196213
