Overview of Deep Learning Voice Conversion Methods that Disentangle Speaker from Linguistic Content

Abstract:

In voice conversion, speaker identity is the attribute of an utterance that we want to replace with another speaker's identity while keeping the linguistic content of the utterance unchanged. Voice conversion algorithms combine several speech processing components, such as utterance analysis, speaker classifiers, and vocoders. This paper presents an overview of state-of-the-art voice conversion methods that disentangle speaker identity from linguistic content.