Arabic Stop Words: Towards a Generalisation and Standardisation

Abstract:

Language resources are an essential component in all natural language processing (NLP) applications. Language resources are divided into written and oral resources. In the current work, we are interested in written resources. There exist different types of written resources:
• Character: Graphic symbol used as a unit in writing.
• Lexicon: The set of words of a given language.
• Dictionary: List of terms, usually arranged in alphabetical order, providing definitions, explanatory information or descriptive data for each item.
• Glossary: Specialized list of terms relating to a particular field of study or interest, which may contain explanatory or descriptive information on the items listed. Example: glossary of terms employed in the standardization of geographical names.
• Corpus: Set of documents grouped for a certain goal.
• etc.

 
