Evolution of Data Architectures: From Data Warehouses to Lakehouse and Beyond

Abstract:

This paper examines the evolution of data architectures through the lens of enterprise architecture, focusing on the Data Warehouse (DWH), Data Lake (DL), and Data Lakehouse (DLH). It positions data architecture as a core element of enterprise architecture and outlines how frameworks such as Zachman, TOGAF, and the Gartner EA approach frame governance, integration, and strategic alignment of the data layer. Methodologically, the study is a literature-based analysis and synthesis whose objective is to describe the evolution of data architectures and provide a comparative view. The paper characterizes the DWH as a structured, schema-on-write, multi-layered repository (stage/core/mart) with strong data quality but slower onboarding; the DL as an object-storage approach emphasizing schema-on-read and ELT, offering high flexibility but carrying risks around metadata/catalog management and transactional guarantees; and the DLH as a hybrid that adds a metadata and transactional (ACID) layer while preserving DL flexibility, at the cost of higher architectural complexity and skill demands. The paper also contextualizes related notions (modern cloud DWH, data fabric, data mesh) and presents a comparative table summarizing the trade-offs. Overall, the contribution is a practitioner-oriented synthesis that clarifies where each paradigm fits and emphasizes that governance and metadata stewardship are central to avoiding “data swamps” and sustaining value creation.