Evaluation of Deep Neural Networks-Based Acoustic Modelling Techniques Used for Low-Quality Polish Speech Recordings

Abstract:

In this paper, we describe our work on speech recognition of Polish-language infoline (telephone information line) recordings. One of the difficulties with this kind of recording is the large number of different speakers. In addition, utterances are often short or overlapping, which can lead to weak speaker models and, consequently, poor acoustic modelling. Among the available techniques for creating acoustic models, Deep Neural Networks (DNNs) deserve special attention, as they have proved to be very effective for this purpose. However, most research using such models has been performed for English, while other languages have received much less attention; research devoted to Polish is therefore particularly valuable. Compared with English, Polish is considerably more complex (e.g., free word order and 14 declension forms). This makes recognition more difficult, because a larger lexicon is required and the language model must cover many more distinct word forms for a comparable domain. The aim of this paper is to evaluate acoustic models for Polish based on different DNN architectures, using low-quality speech data.
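As a point of reference, in one common setup (the hybrid DNN-HMM approach) a DNN acoustic model maps a window of acoustic feature frames to posterior probabilities over tied HMM states, which are then divided by state priors to obtain scaled likelihoods for decoding. The sketch below is a generic, untrained, dependency-free illustration of that idea, not one of the architectures evaluated in this paper. The feature dimension (13, MFCC-like), the context of ±2 frames, the two 32-unit ReLU hidden layers, the 8 hypothetical states and the uniform priors are all illustrative assumptions, and every function name is hypothetical.

```python
import math
import random


def splice_frames(features, context):
    """Stack each frame with `context` neighbours on each side (edges padded by repetition)."""
    n = len(features)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to the first/last frame
            window.extend(features[idx])
        spliced.append(window)
    return spliced


def dense(x, weights, bias):
    # Fully connected layer; `weights` holds one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_acoustic_model(n_in, hidden_sizes, n_states, seed=0):
    # Hypothetical feedforward DNN with randomly initialised (untrained) weights.
    rng = random.Random(seed)
    sizes = [n_in] + hidden_sizes + [n_states]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def state_posteriors(model, frame):
    # ReLU on hidden layers, softmax over HMM states on the output layer.
    h = frame
    for i, (w, b) in enumerate(model):
        h = dense(h, w, b)
        if i < len(model) - 1:
            h = relu(h)
    return softmax(h)


def scaled_log_likelihoods(posteriors, priors):
    # Hybrid DNN-HMM scoring: log p(x|s) is approximated, up to a constant, by log p(s|x) - log p(s).
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


# Illustrative run on random "features" (10 frames of 13 coefficients; assumed values).
rng = random.Random(1)
features = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(10)]
spliced = splice_frames(features, context=2)  # 5 frames x 13 = 65 inputs per frame
model = build_acoustic_model(len(spliced[0]), [32, 32], n_states=8)
posteriors = [state_posteriors(model, x) for x in spliced]
priors = [1.0 / 8] * 8  # uniform priors, assumed for illustration only
scores = [scaled_log_likelihoods(p, priors) for p in posteriors]
```

In practice the priors would be estimated from state alignments of the training data, and the weights learned by training; frame splicing is shown because a short, noisy frame carries little information on its own, which is one reason context windows are widely used for telephone-quality speech.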