Knowledge Acquisition from Comparable Corpora for Cross-Language Information Retrieval

Abstract:

This paper exploits comparable corpora for cross-language information retrieval. Large text corpora are a crucial resource for knowledge acquisition: they can enrich multilingual lexicons and thesauri and help cross the language barrier in information retrieval. The paper presents an approach for learning bilingual terminology from comparable corpora in order to translate terms from the source language to the target language and retrieve documents across languages. A linear combination of three translation sources is proposed: the bilingual terminology extracted from comparable corpora, a bilingual dictionary, and a transliteration model. An application to the Japanese-English language pair shows that the proposed combination yields better translations and that effective information retrieval can be achieved across languages.
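The linear combination mentioned in the abstract can be pictured with a minimal sketch. The abstract gives neither the weights nor the scoring functions, so everything below is an assumption made for illustration: the function name `combine_scores`, the weights (0.5, 0.3, 0.2), and the toy candidate scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def combine_scores(corpus, dictionary, translit, weights=(0.5, 0.3, 0.2)):
    """Linearly combine translation-candidate scores from three sources.

    Each source maps a target-language candidate to a score in [0, 1].
    The weights are illustrative assumptions, not values from the paper.
    """
    combined = defaultdict(float)
    for weight, source in zip(weights, (corpus, dictionary, translit)):
        for candidate, score in source.items():
            # A candidate missing from a source contributes 0 for that source.
            combined[candidate] += weight * score
    # Rank candidates from highest to lowest combined score.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores for the English candidates of one Japanese source term (hypothetical data).
corpus_scores = {"computer": 0.8, "calculator": 0.4}
dictionary_scores = {"computer": 1.0}
translit_scores = {"computer": 0.9}

ranked = combine_scores(corpus_scores, dictionary_scores, translit_scores)
best_translation = ranked[0][0]
```

In a sketch like this, a candidate supported by several sources rises above one supported by a single source, which is the intuition behind combining the resources; how the actual weights are set is not stated in the abstract.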

9th IBIMA Conference
