Improving Arabic Named Entity Recognition by Global Features and Triggers

Abstract:

Proper nouns are commonly introduced in a special fashion within text, especially in official news articles. When first introduced, they are accompanied by triggers, e.g., titles with person names. However, triggers are rarely used when the same entity is referred to again within the same article. Exploiting this phenomenon, this study aims to improve proper noun identification and semantic classification in Arabic. A particular challenge in written Arabic is the lack of orthographic features that distinguish proper nouns. Adopting a corpus-based approach, we build a classification model from a training corpus and apply the resulting model in the testing phase. In addition to gazetteer and POS tagging features, the system employs a list of triggers for each named entity class and uses the class of other occurrences of the current word within the same article. Evaluation showed that these added features increased performance by 3.2% on F-measure.
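The two added feature types can be illustrated with a minimal sketch. This is not the system described here; it only shows the idea under stated assumptions. The trigger lists, the function names, and the lowercase English placeholder tokens (standing in for Arabic words, which carry no capitalization cue) are all hypothetical, and the class suggested by a preceding trigger is used as a stand-in for whatever class the real system assigns to the other occurrences of a word.

```python
from collections import Counter

# Hypothetical trigger lists per entity class (illustrative only, not from the paper).
TRIGGERS = {
    "PER": {"president", "dr", "mr"},
    "LOC": {"city", "province"},
    "ORG": {"ministry", "company"},
}


def trigger_class(tokens, i):
    """Return the class whose trigger list contains the token before position i, else None."""
    if i == 0:
        return None
    prev = tokens[i - 1]
    for cls, words in TRIGGERS.items():
        if prev in words:
            return cls
    return None


def global_class(tokens, local_classes, i):
    """Return the most common class given to OTHER occurrences of tokens[i] in the article."""
    votes = Counter(
        cls
        for j, (tok, cls) in enumerate(zip(tokens, local_classes))
        if j != i and tok == tokens[i] and cls is not None
    )
    return votes.most_common(1)[0][0] if votes else None


def extract_features(tokens):
    """Build one feature dict per token of a single article."""
    # Assumption: the trigger-based class stands in for the per-occurrence class.
    local = [trigger_class(tokens, i) for i in range(len(tokens))]
    return [
        {
            "word": tok,
            "trigger_class": local[i],
            "global_class": global_class(tokens, local, i),
        }
        for i, tok in enumerate(tokens)
    ]
```

For the article `["president", "abbas", "visited", "cairo", "and", "abbas", "spoke"]`, the first "abbas" receives a trigger feature from the preceding title, while the second "abbas", which has no trigger of its own, receives the class through the global feature. In a full system these dictionaries would be combined with gazetteer and POS features before training the classifier.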
