Automatic Text Classification using Naïve Bayesian Algorithm on Arabic Language

Abstract:

Text classification is a supervised learning technique: a classifier is learned from labeled training data and then used to assign class labels to the remaining, unlabeled text automatically.

In this research we present an effective approach to text classification for Arabic text collections. Our approach uses a probabilistic framework, applying the naïve Bayesian algorithm to classify Arabic text.
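As an illustration of the general technique only, a naïve Bayesian text classifier can be sketched as follows. The abstract does not state which variant, tokenization, or smoothing was used, so the multinomial model, the whitespace tokenization, and the add-one (Laplace) smoothing here are assumptions, and the function names train_naive_bayes and classify are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Estimate log priors and smoothed log likelihoods from labeled texts.

    Assumptions (not from the paper): multinomial model, whitespace
    tokenization, add-one (Laplace) smoothing.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    log_prior = {}
    log_likelihood = {}
    for label, n_docs in class_counts.items():
        log_prior[label] = math.log(n_docs / len(documents))
        # Denominator: all tokens seen in this class plus one per vocabulary word.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        log_likelihood[label] = {
            word: math.log((word_counts[label][word] + 1) / denom)
            for word in vocabulary
        }
    return {"vocabulary": vocabulary,
            "log_prior": log_prior,
            "log_likelihood": log_likelihood}


def classify(model, text):
    """Return the class with the highest posterior log score for the text."""
    # Words never seen in training are ignored (an assumption of this sketch).
    tokens = [t for t in text.split() if t in model["vocabulary"]]
    best_label, best_score = None, -math.inf
    for label, prior in model["log_prior"].items():
        score = prior + sum(model["log_likelihood"][label][t] for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage with two made-up training texts (not data from the paper).
model = train_naive_bayes(
    ["مباراة كرة فريق", "اقتصاد سوق بنك"],
    ["Sports", "Economy"],
)
predicted = classify(model, "فريق كرة")
```

Working with sums of log probabilities rather than products avoids numerical underflow on long documents, which is the usual design choice for this kind of classifier.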

Our data set consists of 600 documents distributed equally over 6 classes (Architecture, Economy, Health and Medicine, Politics, Science, and Sports), so that each class contains 100 documents. We used 25% of the documents in each class to train our system; the remaining 75% were classified automatically by the naïve Bayesian classifier.
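The per-class split described above can be sketched as follows. The abstract does not say how the training documents were selected within each class, so the seeded random shuffle is an assumption, and split_per_class and the placeholder document identifiers are hypothetical.

```python
import random


def split_per_class(docs_by_class, train_fraction=0.25, seed=0):
    """Split each class separately into training and test (doc, label) pairs.

    Assumption (not from the paper): documents are chosen by a seeded
    random shuffle within each class.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, docs in docs_by_class.items():
        shuffled = list(docs)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train.extend((doc, label) for doc in shuffled[:cut])
        test.extend((doc, label) for doc in shuffled[cut:])
    return train, test


# Placeholder identifiers standing in for the 100 documents of each class.
classes = ["Architecture", "Economy", "Health and Medicine",
           "Politics", "Science", "Sports"]
corpus = {c: [f"{c}-{i}" for i in range(100)] for c in classes}
train_set, test_set = split_per_class(corpus)
```

With 100 documents per class and a training fraction of 25%, this yields 25 training and 75 test documents per class, that is, 150 and 450 documents in total.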

We evaluated the performance of our system using the accuracy measure for several numbers of test documents. The accuracy varies from one class to another (from 41% to 100%). The overall average accuracy achieved by our system over all classes is 57.19%, and the best result was achieved for the first class, where it reached 88.38%.
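One way such per-class and average figures could be computed is sketched below. The abstract does not give the exact definition of the accuracy measure, so two assumptions are made here: the accuracy of a class is the fraction of its test documents that receive the correct label, and the overall figure is the unweighted mean over classes. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of each class's test documents that were labeled correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


def average_accuracy(accuracy_by_class):
    """Unweighted mean of the per-class accuracies."""
    return sum(accuracy_by_class.values()) / len(accuracy_by_class)


# Toy labels for illustration only (not results from the paper).
true_labels = ["Sports", "Sports", "Economy", "Economy"]
predicted_labels = ["Sports", "Economy", "Economy", "Economy"]
by_class = per_class_accuracy(true_labels, predicted_labels)
overall = average_accuracy(by_class)
```

Because every class has the same number of test documents in this study, the unweighted mean over classes coincides with the fraction of all test documents classified correctly.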