Rule Storage for an Efficient Rule Based Inconsistency Check

Abstract:

Data inconsistency is a key source of data quality problems. Rule-based methods are a major means of inconsistency checking, and association rules have been used for this purpose. Time efficiency is critical for online checking. In this paper we use a prefix tree (trie) to store and retrieve rules efficiently when making predictions on a dirty dataset. With this structure, inconsistent values are identified in large, high-dimensional datasets using a large ruleset, at lower complexity than existing methods.
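
To make the storage idea concrete, the following is a minimal sketch of how association rules might be kept in a prefix tree and retrieved for a single record. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the names RuleTrie, insert, matching_rules and inconsistencies are hypothetical, rules are assumed to have the form "set of (attribute, value) items implies one (attribute, value) item", and all attribute names and values are assumed to be strings so that items can be sorted into a canonical order.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps an (attribute, value) item to the child node reached by that item.
    children: dict = field(default_factory=dict)
    # Consequents of every rule whose antecedent ends at this node,
    # stored as (attribute, predicted_value, confidence).
    consequents: list = field(default_factory=list)


class RuleTrie:
    """Hypothetical prefix tree keyed on sorted antecedent items."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, antecedent, consequent, confidence):
        # Sorting gives each antecedent one canonical path, so rules that
        # share leading items also share nodes.
        node = self.root
        for item in sorted(antecedent):
            node = node.children.setdefault(item, TrieNode())
        attr, value = consequent
        node.consequents.append((attr, value, confidence))

    def matching_rules(self, record):
        # Collect consequents of all rules whose antecedent is contained
        # in the record, following only branches the record supports.
        items = sorted(record.items())
        found = []

        def walk(node, start):
            found.extend(node.consequents)
            for i in range(start, len(items)):
                child = node.children.get(items[i])
                if child is not None:
                    walk(child, i + 1)

        walk(self.root, 0)
        return found

    def inconsistencies(self, record):
        # A value is flagged when a matching rule predicts a different
        # value for an attribute that is present in the record.
        return [
            (attr, record[attr], value, conf)
            for attr, value, conf in self.matching_rules(record)
            if attr in record and record[attr] != value
        ]


# Illustrative data only; these rules are invented for the example.
trie = RuleTrie()
trie.insert([("city", "Paris")], ("country", "France"), 0.98)
trie.insert([("city", "Paris"), ("zip", "75001")], ("region", "Ile-de-France"), 0.95)

record = {"city": "Paris", "zip": "75001", "country": "Germany", "region": "Ile-de-France"}
print(trie.inconsistencies(record))
```

In this sketch the lookup visits only the branches whose items occur in the record, so rules with unrelated antecedents are never compared against it, which is the intuition behind using a trie rather than a flat scan of the ruleset.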