Abstract:
This paper describes a prosody generation system that combines a rule-based algorithm with a model-based method. The system builds on two techniques: concatenation of diphone waveforms and selection of a prosodic model to control fundamental frequency and duration. Prosody generation is part of the speech control module, which serves as the interface between the output of the linguistic text-processing block and the input of the speech signal generation module. Each voice segment in a word being synthesized is assigned a set of pitch target values. Signal generation follows the prosody phrasing stream, which describes the phrase as a sequence of phoneme codes with assigned duration and fundamental frequency values. The key steps in prosody generation based on Multi-Band Resynthesis OverLap-Add (MBROLA) technology and ways of increasing the naturalness of synthesized speech are also highlighted.
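
As a rough illustration of what such a prosody stream can look like, the sketch below serializes a short phrase into the plain-text phoneme format read by the MBROLA synthesizer: one line per phoneme, giving the phoneme name, its duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The `Segment` class, the helper names, the phoneme codes, and all numeric values are hypothetical; they are not taken from the system described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One entry of the prosody stream (hypothetical structure)."""
    phoneme: str       # phoneme code; valid codes depend on the voice database
    duration_ms: int   # segment duration in milliseconds
    # Pitch targets as (position in % of the duration, F0 in Hz) pairs.
    pitch_targets: List[Tuple[int, int]] = field(default_factory=list)


def to_pho_line(seg: Segment) -> str:
    """Format one segment as 'name duration [position pitch]...'."""
    if seg.duration_ms <= 0:
        raise ValueError("duration must be positive")
    parts = [seg.phoneme, str(seg.duration_ms)]
    for position, f0 in seg.pitch_targets:
        if not 0 <= position <= 100:
            raise ValueError("pitch target position must be within 0..100")
        parts += [str(position), str(f0)]
    return " ".join(parts)


def to_pho(segments: List[Segment]) -> str:
    """Join all segments into one text block, one phoneme per line."""
    return "\n".join(to_pho_line(s) for s in segments) + "\n"


# Illustrative phrase: "_" marks silence; pitch targets are attached only
# to the segments that carry them (values are made up for the example).
stream = [
    Segment("_", 100),
    Segment("h", 70),
    Segment("e", 90, [(0, 120), (100, 130)]),
    Segment("l", 60, [(50, 125)]),
    Segment("o", 140, [(0, 125), (80, 100)]),
    Segment("_", 100),
]

if __name__ == "__main__":
    print(to_pho(stream), end="")
```

Segments without pitch targets carry only a name and a duration, so the pitch contour is determined by the targets attached to neighbouring segments.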