An Algorithm for Extracting the Root of Arabic Words

Abstract:

Stemming is one of many tools used in information retrieval (IR) to combat the vocabulary mismatch problem, in which query words do not match document words. Arabic does not fit the usual mold: most stemming research on other languages relies only on removing prefixes and suffixes from a word, whereas Arabic words contain infixes as well. In this paper we introduce a root-based algorithm that handles prefixes, suffixes, and infixes according to the morphological pattern of the word. We use the term stemming to cover the elimination of all kinds of affixes, including infixes.
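
The abstract does not spell out the procedure, so the following Python sketch only illustrates the general idea of pattern-based root extraction; it should not be read as the algorithm introduced in this paper. It assumes the traditional convention in which the letters ف, ع, and ل mark the root positions in a morphological pattern, and every other letter of the pattern is an affix that must match the word literally. The names `PATTERNS`, `PREFIXES`, `match_pattern`, and `extract_root`, as well as the tiny pattern and prefix lists, are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: pattern-based root extraction for Arabic words.
# The pattern and prefix lists below are small assumed examples, not the
# lists used by the paper.

# Letters that stand for root positions in a morphological pattern.
ROOT_SLOTS = "فعل"

# A few common patterns; non-slot letters are prefixes or infixes.
PATTERNS = ["فاعل", "مفعول", "فعال", "مفعل", "فعل"]

# A minimal prefix list (the definite article and the conjunction "and").
PREFIXES = ["ال", "و"]


def match_pattern(word, pattern):
    """Return the root letters if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)      # this position holds a root letter
        elif w != p:
            return None         # affix letter of the pattern does not match
    return "".join(root)


def extract_root(word):
    """Try the word as given, then with one known prefix removed."""
    candidates = [word]
    for prefix in PREFIXES:
        # Keep at least three letters so a triliteral root can remain.
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            candidates.append(word[len(prefix):])
    for candidate in candidates:
        for pattern in PATTERNS:
            root = match_pattern(candidate, pattern)
            if root is not None:
                return root
    return None


if __name__ == "__main__":
    for w in ["كاتب", "مكتوب", "كتاب", "الكاتب"]:
        print(w, extract_root(w))
```

In this sketch, the words كاتب (writer), مكتوب (written), كتاب (book), and الكاتب (the writer) all reduce to the root كتب, even though the second and third differ from it by an infix rather than only by a prefix or suffix. A realistic system would need far larger pattern and affix lists, as well as handling for weak letters and suffixes, none of which is attempted here.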