Acquiring Query Translations from Wikipedia for Cross-Language Information Retrieval

Abstract:

The multilingual encyclopedia Wikipedia has become a very useful resource for building and enriching linguistic resources such as dictionaries and ontologies. In this study, we investigate the use of Wikipedia for query translation in Cross-Language Information Retrieval, and apply the approach to the Arabic-English language pair. All possible translation candidates are extracted from the titles of Wikipedia articles by following the inter-language links between Arabic and English articles; we refer to this as direct translation. In addition, other links, such as Arabic-to-French and French-to-English, are exploited to obtain a transitive translation. If no translation can be found for the entire query, light stemming can be applied and the query can be segmented into multiple tokens. Monolingual and cross-lingual retrieval systems were evaluated using three weighting schemes of the Lucene search engine (default, Tf-Idf and BM25). In addition, the performance of the proposed translation method was compared with that of Google Translate and MyMemory.
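The lookup order described in the abstract (direct link first, then a pivot through French, then a token-level fallback) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three dictionaries are hypothetical toy stand-ins for inter-language link tables, the function names `translate_title` and `translate_query` are invented for illustration, segmentation is assumed to be a simple whitespace split, and the stemming step is omitted.

```python
from typing import Dict, List, Optional

# Hypothetical toy link tables standing in for Wikipedia inter-language links.
# The entries are illustrative only and are not data from the study.
AR_EN: Dict[str, str] = {
    "استرجاع المعلومات": "Information retrieval",
    "ترجمة": "Translation",
    "ويكيبيديا": "Wikipedia",
}
AR_FR: Dict[str, str] = {"محرك بحث": "Moteur de recherche"}
FR_EN: Dict[str, str] = {"Moteur de recherche": "Search engine"}


def translate_title(term: str) -> Optional[str]:
    """Translate one Arabic title: direct link first, then via French."""
    direct = AR_EN.get(term)
    if direct is not None:
        return direct  # direct translation (Arabic -> English)
    pivot = AR_FR.get(term)
    if pivot is not None:
        return FR_EN.get(pivot)  # transitive translation (Arabic -> French -> English)
    return None


def translate_query(query: str) -> List[str]:
    """Translate the whole query, falling back to token-by-token lookup."""
    whole = translate_title(query)
    if whole is not None:
        return [whole]
    # Assumed fallback: split on whitespace and translate each token separately.
    translations: List[str] = []
    for token in query.split():
        translated = translate_title(token)
        if translated is not None:
            translations.append(translated)
    return translations
```

In this sketch a query that matches an article title is translated as a unit, which preserves multi-word expressions; only when that fails is it broken into tokens, at the cost of losing phrase-level meaning.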