Research and production of message corpora for the multilinguisation of SMS-based e-commerce sites, initially in Arabic

Abstract:

In this paper, we present our research within the framework of the CATS project (Classified Ads through SMS). CATS is a system for managing small classified ads in Arabic, posted by SMS, for buying and selling (cars, real estate, ...), currently deployed in Jordan by the operator FastLink. To extend this system in breadth by adapting it to other languages (French, English) and in length by applying it to other sectors (employment, marriage, household appliances, mobile phone trade, yellow pages, ...), we face the difficulty of finding or building SMS corpora that are functionally equivalent to a real, natural Arabic corpus. Does a simple translation of this starting corpus yield a corpus of the same type (real, natural)? In this paper, we present an answer to this question and a solution for one case of multilinguisation of SMS-based e-commerce sites, initially in Arabic.