Building a Tunisian Dialect into Modern Standard Arabic Parallel Corpus For a Phrase-based Machine Translation

Abstract:

The purpose of this paper is to build a system capable of translating the dialect of Tunisia's capital into Modern Standard Arabic (MSA). Such a tool can have an impact in various domains, such as translating social network user interactions, subtitles of Tunisian movies, and books written in their authors' local dialect. Since the Tunisian dialect, like the other Arabic dialects, lacks such resources, we started by building a parallel corpus of 5000 sentences. These sentences were extracted from different sources accessible on the web, specifically Facebook and YouTube, and then manually translated into MSA. This resource was then used to apply the statistical approach, which is based on creating a Language Model (LM) and a Translation Model (TM). These two models are then used by the decoder to choose the best translation for an input sentence. The results were promising: we achieved a BLEU score of 49.90.
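
As a rough illustration of how a decoder can combine the two models, the minimal sketch below scores each candidate translation by the sum of its TM and LM log-probabilities and keeps the highest-scoring one. It is not the system described in the paper: the function name `best_translation`, the placeholder sentences, and all probability values are hypothetical and chosen only for illustration. A real phrase-based decoder searches over phrase segmentations and usually adds further weighted features.

```python
import math


def best_translation(source, candidates, tm_prob, lm_prob):
    """Pick the candidate maximizing log P_TM(source | cand) + log P_LM(cand).

    tm_prob maps (source, candidate) pairs to translation-model probabilities.
    lm_prob maps candidates to language-model probabilities.
    Both tables are hypothetical stand-ins for trained models.
    """
    def score(cand):
        return math.log(tm_prob[(source, cand)]) + math.log(lm_prob[cand])

    return max(candidates, key=score)


# Toy, made-up values for illustration only (not taken from the paper).
source = "dialect_sentence"
candidates = ["msa_candidate_a", "msa_candidate_b"]
tm_prob = {
    (source, "msa_candidate_a"): 0.6,
    (source, "msa_candidate_b"): 0.5,
}
lm_prob = {
    "msa_candidate_a": 0.01,
    "msa_candidate_b": 0.05,
}

best = best_translation(source, candidates, tm_prob, lm_prob)
print(best)
```

In this toy setting the TM slightly prefers the first candidate, but the LM judges the second one far more fluent, so the combined score selects the second candidate.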
