Abstract:
This paper presents a method for de-noising human speech using a Wave-U-Net-style architecture, specifically Demucs v3, a model originally designed for separating the sounds of individual instruments in music tracks. This version of the model is a hybrid method: it operates on both the waveform and the spectrogram. Since the previous Demucs version has already been used in de-noising scenarios, a comparative analysis of the two versions was carried out, using the earlier one as the reference. In addition, the effect of the cost function on the training process and on the final result was examined. The model was trained with white noise and with background noise of the kind encountered in everyday life. When trained on the same data, the newer model provides better results in a shorter training time.