A Graph-based Model for Big Data Warehouses Governance

Abstract:

Data governance becomes an increasingly difficult challenge in the context of Big Data, which increases the heterogeneity and variety of data sources. Data governance covers several subareas, namely Metadata Management and Data Quality, and can be extended to include Data Profiling. These areas offer a set of processes to support and manage the complexity of integrating different data sources. The challenge lies not only in the increased data volume, but also in the growing number of business processes within an organization, which require additional knowledge about this data. The objective of this work is to propose a graph-based model for the cataloging and governance of a Big Data Warehouse, in order to make its content available and to show, among other things, how it evolves over time. The proposed model also supports the tagging and lineage of data sources and the storage of the collected metadata in a graph database.
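
To make the idea of a graph-based catalog more concrete, the following is a minimal in-memory sketch, not the model proposed in this work. It assumes that catalogued objects are nodes carrying tags and free-form metadata, and that lineage is a directed "derived from" edge between nodes. All class, method, and node names (CatalogGraph, add_lineage, crm_export, and so on) are hypothetical and chosen for illustration only. A real deployment would store the same nodes and edges in a graph database rather than in Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogNode:
    """A catalogued object. The fields are illustrative assumptions."""
    name: str
    kind: str  # hypothetical labels such as "source" or "table"
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)


class CatalogGraph:
    """Toy catalog: nodes plus directed 'derived from' lineage edges."""

    def __init__(self):
        self.nodes = {}
        # Maps a node name to the names of its direct parents.
        self.derived_from = defaultdict(set)

    def add_node(self, name, kind, **metadata):
        self.nodes[name] = CatalogNode(name, kind, metadata=dict(metadata))

    def tag(self, name, label):
        self.nodes[name].tags.add(label)

    def add_lineage(self, child, parent):
        # Both endpoints must already be catalogued.
        for endpoint in (child, parent):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.derived_from[child].add(parent)

    def upstream(self, name):
        """Return the names of all direct and indirect ancestors of a node."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for parent in self.derived_from.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def find_by_tag(self, label):
        """Return the sorted names of all nodes carrying the given tag."""
        return sorted(n.name for n in self.nodes.values() if label in n.tags)


# Usage with made-up node names.
catalog = CatalogGraph()
catalog.add_node("crm_export", "source", format="csv")
catalog.add_node("customers_staging", "table")
catalog.add_node("customers_dw", "table")
catalog.add_lineage("customers_staging", "crm_export")
catalog.add_lineage("customers_dw", "customers_staging")
catalog.tag("crm_export", "personal-data")

print(catalog.upstream("customers_dw"))
print(catalog.find_by_tag("personal-data"))
```

In this sketch, a lineage query is a graph traversal over the "derived from" edges, which is the kind of question a graph representation answers naturally. Tracking how the catalog evolves over time is not shown here; one possible approach, again an assumption, would be to attach timestamps to nodes and edges.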