Abstract:
The Tafsir al-Tabari corpus is regarded as one of the most highly esteemed Islamic corpora. Its importance lies in its deep interpretation (Tafsir) of the Holy Quran. The corpus contains rich knowledge that is often used by researchers and Islamic communities. Normalizing the Tafsir al-Tabari corpus can therefore improve information retrieval and text analysis. In this context, we aim to develop and evaluate a standardized Arabic version of Tafsir al-Tabari using an XML-based standard, the Text Encoding Initiative (TEI). To achieve this, we first propose an encoding model based on TEI and adapted to the structure of the Tafsir text. We then develop and test an automatic encoding tool that generates the standardized version of Tafsir al-Tabari from our model. The results obtained are encouraging, despite some problems related to exceptional cases.