NAFIS: A Gold Standard Corpus for Arabic Stemmers Evaluation

Abstract:

Arabic stemming, an important pre-processing task in Arabic natural language processing services and applications, suffers from two serious deficiencies: "unique stemming solution" and "stemmers' performance inconsistency". These defects are mainly caused by the absence of a gold standard corpus. Defined as a collection of texts stored in an electronic format, selected to be representative of a particular language, collection, or genre, manually annotated, and enriched with additional linguistic information, such a corpus is used in stemmer benchmarking work. This paper presents NAFIS (Normalized Arabic Fragments for Inestimable Stemming), an Arabic stemming gold standard corpus. We describe the methodology used to build NAFIS and use it as an evaluation corpus in a benchmarking exercise.
