Abstract:
This paper presents an approach to implementing a framework for validating XML-based datasets. The Extensible Markup Language (XML) is a subset of SGML whose goal is to enable generic SGML to be served, received and processed on the Web [1]. In some cases, a well-formed XML document must also be validated against additional constraints. The authors discuss an approach to creating a standardised framework that addresses issues in data integration and in the Extract, Transform, Load (ETL) process, i.e. the procedure of copying data from one or more sources into a destination system [2]. The framework covers both syntax and content validation, based on an external repository of validation rules.
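As a rough illustration of the two validation stages mentioned above, the following minimal Python sketch first checks that a document is well-formed (syntax validation) and then applies content rules kept outside the validation logic. The rule format (`required`, `pattern`, `min`), the `RULES` dictionary standing in for the external rules repository, and the `validate` function are hypothetical assumptions, not the framework's actual design.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an external validation rules repository:
# maps a child-element path to its content constraints. In a real deployment
# these rules would be loaded from a separate store, not hard-coded.
RULES = {
    "id": {"required": True, "pattern": r"\d+"},
    "quantity": {"required": True, "min": 1},
}


def validate(xml_text, rules):
    """Return a list of error messages; an empty list means the document passed."""
    # Stage 1: syntax validation - the document must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    # Stage 2: content validation against each externally supplied rule.
    for path, rule in rules.items():
        node = root.find(path)
        if node is None or node.text is None:
            if rule.get("required"):
                errors.append(f"{path}: missing")
            continue
        value = node.text.strip()
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{path}: does not match {rule['pattern']}")
        if "min" in rule:
            try:
                if float(value) < rule["min"]:
                    errors.append(f"{path}: below minimum {rule['min']}")
            except ValueError:
                errors.append(f"{path}: not numeric")
    return errors
```

For example, `validate("<order><id>42</id><quantity>0</quantity></order>", RULES)` passes the syntax stage but fails the content stage with `["quantity: below minimum 1"]`. Keeping the rules as data rather than code is what would allow them to be changed without redeploying the validator.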
The paper also describes a proposed implementation of the framework using state-of-the-art technologies. Interoperability mechanisms deployed on cloud infrastructure allowed the authors to provide an easily configurable and scalable environment for big data processing. The proposal covers data extraction from both homogeneous and heterogeneous data sources. The presented solution has been implemented and evaluated on an ETL process carried out during a migration between production workflow systems.