Abstract:
Information systems allow companies and organizations to collect large volumes of operational and transactional data. Data warehousing provides tools and techniques for analysing these data and deriving information at a level of abstraction suitable for supporting decision processes. However, fundamental choices have to be made when configuring a data warehouse and importing data: the user has to explicitly tell the system (i) which data have to be analysed, (ii) which attributes have to be treated as measures and which as dimensions in the data warehouse, (iii) to what level each dimension should be generalized, and finally (iv) what the quality and reliability of the data warehouse and the related decision process are. Since these choices are not trivial for users (especially for complex databases), and since they can heavily influence the effectiveness of the data warehouse, in this paper we propose a methodology based on a set of statistical indexes that allows one to derive information on data quality. We are currently testing our methodology on a set of real-world operational databases and studying the preliminary results.