Impact of Semantic and Optimization Approach on Arabic Texts Clustering

Abstract:

The automatic classification of terms in Arabic texts has become necessary because of the volume of Arabic text exchanged and stored electronically. This work is a comparative study of unsupervised classification (clustering) methods: k-means, PAM, and hierarchical classification. The comparison is carried out on Arabic texts, a choice motivated by the language's very specific morphosyntactic characteristics and by the limited amount of prior work on it. We make two contributions: a semantic approach to term extraction, and an approach for determining the optimal value of K, both intended to improve clustering quality. Two kinds of criteria (internal and stability) were used to evaluate and compare the algorithms on the clustering of Arabic text terms. Our evaluations show that each algorithm has its limitations and benefits.
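As an illustration of choosing K with an internal criterion, the following minimal sketch runs k-means for several candidate values of K and keeps the one with the highest mean silhouette width. It is not the method used in this work: the silhouette index, the farthest-first initialization, the function names, and the toy 2-D points (standing in for term vectors) are all assumptions made for the example.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption): repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette width: higher means tighter, better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

def best_k(points, k_values):
    """Return the K with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (20, 0), (20, 1), (21, 0), (21, 1)]
chosen_k, scores = best_k(toy_points, range(2, 6))
```

On real data the points would be term vectors, and a stability criterion could be added by repeating the procedure on perturbed samples and comparing the resulting partitions.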
