DOI: 10.3390/computers15070402 ISSN: 2073-431X

A Hybrid Multi-Level Computational Framework for Latent Risk Modeling from Tabular Data

Bigul Mukhametzhanova, Akgul Naizagarayeva, Gulbakyt Ansabekova, Shynar Turmaganbetova, Yermek Sarsikeyev, Akmaral Kassymova, Azamat Dnekeshev, Pavel Dunayev, Zhanat Manbetova

This study presents a hybrid artificial intelligence system for latent cardiovascular risk stratification based on publicly available clinical and laboratory data. The system integrates data preprocessing, auxiliary target modeling, latent phenotyping using UMAP and Gaussian mixture models, fuzzy logic-based risk integration, and multilevel predictive modeling. Its key contribution is the construction of a proxy target that reflects latent risk progression by combining phenotypic structure, probabilistic indicators, and mortality-related anchor points. Experimental evaluation was conducted on the NHANES dataset. The final analytical cohort included 78,822 adult participants, and the modeling set was divided into training, validation, and test subsets using a stratified 70/15/15 design. On the test set, the proposed PhaseFuzzy Hybrid model achieved an accuracy of 0.8390, a balanced accuracy of 0.7302, an F1-score of 0.5225, an MCC of 0.4203, an ROC-AUC of 0.8489, a PR-AUC of 0.5014, and a best LogLoss value of 0.4290. The latent phenotyping step also demonstrated acceptable internal validity, with a silhouette coefficient of 0.4138 and a confidence of 0.8800. The results demonstrate that the proposed framework identifies hidden cardiometabolic risk factors and provides an interpretable, scalable, and calibration-aware approach to latent cardiometabolic risk stratification and population-level screening.
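The stratified 70/15/15 design and the threshold-based metrics reported above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names `stratified_split` and `binary_metrics` are hypothetical, the labels are toy data, and nothing here reproduces the NHANES results. It only shows how a stratified split preserves class proportions and why accuracy, balanced accuracy, F1, and MCC can diverge when the positive class is a minority.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions.

    Hypothetical helper: each class is shuffled and cut separately,
    so every subset keeps roughly the same positive rate.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


def binary_metrics(y_true, y_pred):
    """Compute accuracy, balanced accuracy, F1, and MCC from hard 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy, imbalanced labels (20% positive); purely illustrative.
    labels = [1] * 200 + [0] * 800
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))

    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

With an imbalanced outcome, accuracy is dominated by the majority class, whereas balanced accuracy averages sensitivity and specificity and MCC uses all four confusion-matrix cells. This is one reason an abstract may report all of these together rather than accuracy alone.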
