DOI: 10.3390/make8070174 ISSN: 2504-4990

A Cognitive Lakehouse Framework with Transformer-Driven Analytics and Autonomous Decision Intelligence for Real-Time Enterprise Systems

Santosh Reddy Addula, Deepak Kumar, Guna Sekhar Sajja, Steven Hallman, Alan Dennis

The rapid evolution of data-driven enterprises demands scalable, intelligent systems capable of managing large volumes of heterogeneous data in real time. However, traditional systems lack a holistic approach to distributed data engineering, real-time analytics, and intelligent decision-making. To address these limitations, this paper proposes a Cognitive Lakehouse Framework that integrates distributed data processing, transformer-based deep learning, real-time analytics, and autonomous decision intelligence. Data are ingested from high-velocity, heterogeneous streams using Apache Kafka. They are then processed under a hybrid batch/streaming paradigm, implemented with Apache Spark and Apache Flink, which provides low latency and scalability. For storage, a unified lakehouse layer is built on Delta Lake and Apache Iceberg, both of which support ACID transactions and schema evolution. Transformer-based Deep Learning (DL) models capture temporal dependencies for predictive analytics, anomaly detection, and adaptive learning. Model lifecycle management is handled by MLflow, while ClickHouse and Apache Druid serve real-time analytics. The architecture follows a microservices, event-driven design deployed on Kubernetes, with workflows automated by Apache Airflow. Data quality, security, and compliance are provided by governance layers consisting of Apache Ranger and Apache Atlas. Performance is assessed using the TPC-H and TPC-DS benchmarks together with real-time stream data, measuring latency, throughput, and accuracy. Experimental results show an accuracy of 98.5%, a query response time of 120 ms, a peak throughput of 85,000 records/s, and an end-to-end latency of 95 ms.
