Abstract:
Stemming is one of many tools used in information retrieval (IR) to combat the vocabulary mismatch problem, in which query words do not match document words. Arabic does not fit the usual mold: most stemming research on other languages relies only on removing prefixes and suffixes from the word, whereas Arabic words also contain infixes. In this paper we introduce a root-based stemming algorithm that removes all three kinds of affixes (prefixes, suffixes, and infixes) according to the morphological pattern of the word.
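To make the idea of pattern-driven infix removal concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes tiny, hypothetical lists of prefixes, suffixes, and morphological patterns, and it uses the conventional letters ف, ع, ل as placeholders for the three root consonants. The names `PREFIXES`, `SUFFIXES`, `PATTERNS`, `match_pattern`, and `extract_root` are illustrative assumptions.

```python
# Hypothetical, deliberately tiny affix and pattern lists (illustration only).
PREFIXES = ["ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ة"]
PATTERNS = ["فاعل", "مفعول", "فعال", "فعل"]

# In a pattern, these three letters mark the root-consonant positions;
# every other letter is an affix letter that must match literally.
ROOT_SLOTS = "فعل"


def match_pattern(word, pattern):
    """Return the root letters if word fits pattern, otherwise None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)   # root consonant: keep it
        elif w != p:
            return None      # affix letter does not match the pattern
    return "".join(root)


def extract_root(word):
    """Strip at most one prefix and one suffix, then remove infixes by pattern."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return word  # no pattern matched: return the stripped word unchanged


print(extract_root("كاتب"))    # pattern فاعل, infix ا removed
print(extract_root("مكتوب"))   # pattern مفعول, prefix م and infix و removed
print(extract_root("الكاتب"))  # definite article stripped first
```

All three calls yield the root كتب. A practical stemmer would need far larger affix and pattern inventories, handling of weak and doubled roots, and a way to choose among competing pattern matches, none of which this sketch attempts.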