From ETL to Modern Data Stack: A Systematic Review and Strategic Implementation Framework
Chayma Tlemcani, Abou Zakaria Faroukhi, Youssef Gahi

Organizations are under constant pressure to replace traditional ETL pipelines and monolithic warehouses with dynamic, cloud-native, modular architectures. Despite rapid adoption, the Modern Data Stack (MDS) literature remains fragmented: lakehouse, mesh, and fabric paradigms are studied in isolation, and no prior review has linked component-level decisions to organizational maturity. This systematic review addresses that gap. Following PRISMA 2020 guidelines across six databases, 650 records were screened and 141 studies were retained for thematic synthesis. The corpus was peer-reviewed, apart from a small number of primary-source technical preprints. Screening was performed primarily by one reviewer; a 10% double-screened sample yielded Cohen’s κ = 0.81. Six functional layers (ingestion, storage, transformation, orchestration, analytics, and observability/governance) and four dominant architectural patterns (cloud warehouse, lakehouse, data mesh, and data fabric) were identified. Components were evaluated against five criteria (scalability, cost efficiency, vendor neutrality, learning curve, and business impact) across three organizational archetypes (startup, SME, and enterprise). A three-phase maturity model (Foundation, Extension, Consolidation), a five-stage iterative implementation cycle, and a RACI governance matrix constitute the resulting strategic framework. Governance emerged as simultaneously the least-adopted layer in the corpus and the most consequential for long-run adoption success. The framework is propositional; empirical validation through an expert Delphi study, a multi-case longitudinal analysis, and an AHP-based practitioner survey is planned as future work.