How to identify and fix human annotation errors in speech corpora for human speech generation purposes and more

Abstract:

Numerous speech corpora exist for various purposes. Some corpora are designed for specific topics but can be adapted to other domains. This paper presents methods for identifying and eliminating human annotation errors in speech corpora, focusing on key features relevant to biometric recognition and text-to-speech synthesis. We describe errors that may appear in speech corpora, such as gender mislabeling and speaker misattribution. The presented error detection method analyzes the distributions of biometric verification results for audio samples from the database. Potentially incorrect annotations are then subjected to listening analysis. The method was tested on the CLARIN speech corpus using Phonexia software and improved biometric recognition metrics such as the equal error rate (EER) and the false acceptance and false rejection rates (FAR/FRR). The results indicate that correcting annotation errors can significantly improve the reliability of speech corpora, particularly for language-specific applications where directly suitable corpora are scarce. The presented method is intended to significantly reduce the number of listening analyses needed to identify and remove errors from a speech corpus.
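The idea of screening a corpus by its verification results before any listening takes place can be illustrated with a minimal sketch. It is not the procedure used in this paper and does not call Phonexia software: the function name, the input format (one mean same-speaker verification score per recording), the example scores, and the cutoff of three scaled median absolute deviations are all assumptions made for illustration.

```python
from statistics import median


def flag_suspect_recordings(scores, k=3.0):
    """Return ids of recordings whose same-speaker score is an unusually low outlier.

    scores: dict mapping a recording id to the mean verification score of that
            recording against other recordings with the same speaker label
            (hypothetical input; any verification engine could produce it).
    k:      number of scaled MADs below the median that counts as suspicious
            (assumed value, to be tuned per corpus).
    """
    values = list(scores.values())
    med = median(values)
    # Median absolute deviation: robust to the outliers we are looking for.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All scores are (nearly) identical, so nothing stands out.
        return []
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    threshold = med - k * 1.4826 * mad
    return sorted(rec for rec, s in scores.items() if s < threshold)


# Hypothetical scores for six recordings labeled as the same speaker.
scores = {
    "rec01": 4.1, "rec02": 3.8, "rec03": 4.4,
    "rec04": -2.5, "rec05": 4.0, "rec06": 3.9,
}
suspects = flag_suspect_recordings(scores)
print(suspects)  # only these recordings would be sent to listening analysis
```

Under these assumptions only the flagged recordings need to be listened to, which is how a distribution-based screen can reduce the manual effort; a robust statistic is used here so that the mislabeled samples themselves do not distort the cutoff.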