Abstract:
Researchers and practitioners from various fields strive to achieve and maintain high standards of information quality for their projects or organizational units. Since information (data) is widely seen as a valuable asset, the literature offers a number of frameworks and guidelines for assessing its quality. Implementing such methods can provide the business with an overview of information quality aspects. In practice, a variety of (automated) calculations are often in place to aid the assessment of information quality, and their results are presented to a consumer, such as the responsible data analyst. Nonetheless, in most cases it is not apparent which results have the most significant effect on the overall data value and hence on the business. In this paper, we propose a method for quantifying these impacts by developing an overall measurement based on a multivariate analysis. The approach uses fuzzy logic to construct an approximation of the quality indicator from a set of functions that stem from intuitive "if-then" formulations describing data quality. Then, we apply regression techniques to select the if-then rules that best explain the data quality. These can subsequently be used to assess new data on the grounds of plausible and tested rules. This paper presents the idea and theoretical foundations, provides an example, and illustrates the practical implications of the method. It thus takes a first step towards an automated aid to quantify the risk that results from low levels of information quality. The proposed approach can help prioritize quality issues and hence set the scope of data quality improvement initiatives for the business.
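As a rough illustration of the idea, and not the paper's actual implementation, the following Python sketch turns a few if-then formulations into fuzzy firing strengths and then ranks the rules by how well each one explains a quality score. The membership functions, their thresholds, the field names (completeness, duplicate_rate), the toy records, and the use of a separate one-variable least-squares fit per rule are all assumptions made for this example; the regression-based selection described above may well be multivariate and differ in detail.

```python
def ramp_up(x, lo, hi):
    """Fuzzy membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Fuzzy membership falling linearly from 1 at `lo` to 0 at `hi`."""
    return 1.0 - ramp_up(x, lo, hi)


# Hypothetical rule antecedents; each maps a record to a strength in [0, 1].
# The conjunction uses min() as the fuzzy AND (a common choice of t-norm).
RULES = {
    "completeness is low": lambda r: ramp_down(r["completeness"], 0.5, 0.9),
    "duplicate rate is high": lambda r: ramp_up(r["duplicate_rate"], 0.0, 0.2),
    "completeness is low AND duplicate rate is high": lambda r: min(
        ramp_down(r["completeness"], 0.5, 0.9),
        ramp_up(r["duplicate_rate"], 0.0, 0.2),
    ),
}


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def rank_rules(records, scores):
    """Rank rules by residual sum of squares (smaller = better fit)."""
    results = []
    for name, rule in RULES.items():
        strengths = [rule(r) for r in records]
        a, b, rss = fit_line(strengths, scores)
        results.append((rss, name, a, b))
    return sorted(results)


# Toy data, invented purely for illustration.
records = [
    {"completeness": 0.95, "duplicate_rate": 0.00},
    {"completeness": 0.80, "duplicate_rate": 0.05},
    {"completeness": 0.60, "duplicate_rate": 0.10},
    {"completeness": 0.90, "duplicate_rate": 0.20},
    {"completeness": 0.70, "duplicate_rate": 0.15},
]
scores = [1.0, 0.75, 0.5, 0.0, 0.25]  # assumed expert-assigned quality

ranked = rank_rules(records, scores)
for rss, name, a, b in ranked:
    print(f"{name}: quality ~ {a:.2f} + {b:.2f} * strength (RSS={rss:.3f})")
```

In this toy setting, the rule with the smallest residual would be the one kept for assessing new records; a sparse multivariate regression over all rule strengths would be a natural replacement for the per-rule fit used here.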